Data Fitting Exercises

Assessment

This exercise is part of your assessment. When you have finished, please save and turn in your .ipynb file. To save, use **File > Save and Checkpoint** in the Jupyter menu. Before saving and uploading, be sure to check that your notebook runs as expected from a fresh start (using **Kernel > Restart and Run All** from the Jupyter menu).

3. Non-Linear Least Squares

In Exercise 1, you used scipy.stats.linregress to perform a linear regression to some experimental kinetics data, modelled using the linear form of the Van't Hoff equation.

In Exercise 2, you learned how to solve the same problem by writing two functions: a model function, and an error function; then finding an optimised set of model parameters that minimise your errors, using scipy.optimize.minimize.

The second approach is not restricted to fitting linear equations, and can generally be used to fit any model function to some reference dataset. In this, third, exercise, you will use the same technique to fit a non-linear model describing some flash photolysis data.

Flash Photolysis

Flash photolysis is a technique for investigating fast photochemical reactions, by exposing the reactant to very brief and intense flashes of light, and then spectroscopically analysing the resulting products. The data collected give a measure of fluorescence intensity with respect to time. The time scale is usually of the order of picoseconds, while the measure of intensity is usually the photon count.

The equation that governs a fluorescence lifetime is a biexponential — it depends on two exponential terms, and for this reason cannot be converted to a linear form by taking logs.

\begin{equation} I \propto A_1\mathrm{e}^{-k_{1}t} + A_2\mathrm{e}^{-k_{2}t}\tag{1}. \end{equation}

The pre-exponential factors $A_1$ and $A_2$ govern the contribution of each exponential decay (green and red line) to the overall signal. $t$ is the time, and $k_{1}$ and $k_{2}$ are the respective radiative rate constants for each lifetime.

In a bi-exponential plot such as the one displayed in the Figure below, there are three key features. The blue curve is the flash or pump signal. It is a very brief flash from the light source and will always show up in the signal. The green curve is the exponential decay signal from a short fluorescence lifetime. The red curve is the exponential decay signal from a longer fluorescence lifetime.

<img src="figures/bi-exponential_plot.png", width=300>

The radiative rate constant for each component is related to the natural lifetime, $\tau_r$, by

\begin{equation}\tau_r = \frac{1}{k_r}\end{equation}

where the natural lifetime may be thought of as the average amount of time a molecule will remain in its excited state before it decays. By calculating $k_1$ and $k_2$, one will be able to determine the natural lifetime as

\begin{equation}k_r = k_1 + k_2\end{equation}

For an given experiment, the parameters $A_1$, $k_1$, $A_2$, and $k_2$ can be obtained by fitting Equation 1 to the experimental data. The fitted values of $k_1$ and $k_2$ can then be used to calculate the natural lifetime.

Performing a least-squares fit to flash photolysis data

The file data/flash_photolysis.dat, stored in a public folder on ScienceData, contains a set of experimental data from a flash photolysis experiment. This file looks like


Time (ps)  Signal (counts)

0.0  -0.002740138

0.2  0.001982640

0.4  0.014379181

0.6  0.010439134

0.8  -0.014530330

1.0  0.009097397

1.2  -0.000735803

1.4  0.011072823

1.6  -0.004858392

…
By copying and editing the appropriate code from Exercise 2, or writing it from scratch, read in the experimental data from [`data/flash_photolysis.dat`](https://sciencedata.dk/public/6e3ed434c0fa43df906ce2b6d1ba9fc6/chem_data_analysis_jupyter/data/flash_photolysis.dat) and plot it.

To skip the first line in the data file, you can use
`np.loadtxt('https://sciencedata.dk/public/6e3ed434c0fa43df906ce2b6d1ba9fc6/chem_data_analysis_jupyter/data/flash_photolysis.dat', skiprows=1)`
In [ ]:
 

Your plot should show an initial fast rise in intensity, from zero, followed by a slower decay curve, that is approximately exponential.

By copying and editing the appropriate code from Exercise 2, or writing it from scratch, define a function model_function() that takes a set of parameters $P$ and the time $t$ as inputs, and returns the intensity as a function of time, according to equation 1.

Test that your model function can approximate the experimental data by plotting the original data, and your model intensity using
$A_1=0.5$
$A_2=0.5$
$k_1=5\times10^{-4}$
$k_2=5\times10^{-4}$
In [ ]:
 
By copying and editing the appropriate code from Exercise 2, or writing it from scratch, write an error function called error_function() that returns the sum of squared errors between your predicted intensity and the experimental intensity.

Use the `minimize()` function from `scipy.optimize` to find the optimal parameter set to describe the experimental data.

Hints

  • Equation 1 only describes the fluorescence decay. It does not include a description of the initial increase in intensity. When you perform your fit, you can limit your fitting data to the decay portion by slicing from the 200th element onwards; e.g. data[200:].
  • If you do not specify any additional options for minimize, the optimisation algorithm can try any values for the model parameters. Because the intensity model contains terms that look like $\mathrm{e}^{-x}$, if the algorithm tries negative numbers for $k_1$ or $k_2$ this can produce a result too large for the computer to describe, and give an error. In reality, we know that $k_1$ and $k_2$ must be positive. We can include this information in the function minimisation by including an additional setting that places bounds on the possible values of $P$:
minimize(  , bounds=((0,1), (0,1), (0,np.inf), (0,np.inf)) )

The bounds argument takes a list of pairs of numbers, that specify the minimum and maximum allowed values to check. If $P$ is ordered as $(A_1, A_2, k_1, k_2)$, this example specifies that $A_1$ and $A_2$ must be between 0 and 1 (we know that the maximum intensity at $t=0$ is equal to 1), and that $k_1$ and $k_2$ must be between 0 and (positive) infinity.

  • The minimize() function works by trying different parameter values until the result of the test function falls below a certain convergence tolerance. The default convergence settings for minimize() are not small enough to guarantee that you find the optimal set of parameter values. You can specify a lower tolerance setting by using the tol keyword:
minimize( , tol=1e-20)
In [ ]:
 
Plot the experimental data and your fitted intensity function.

Using your fitted parameters, calculate the natural lifetime.
In [ ]: