Data Analysis

One of the methods we use to characterise our CryoBricks is to transform them into E. coli and measure the specific growth rate (µ) of the resulting strains at different temperatures. These specific growth rates are estimated from growth curves measured at those temperatures. Because reliably establishing such a curve takes a large amount of data, especially for multiple strains of E. coli equipped with different CryoBricks, we decided to develop an algorithm for faster and easier analysis. It was developed for MATLAB, but can also be used with free alternatives such as GNU Octave. The most recently published version of the package containing the algorithm is available here.

First, the data are log-transformed to linearize the exponential growth phase. After this transformation (provided the natural logarithm is used), µ is the slope of the steepest section of the curve. For every window of four consecutive data points, a first-degree polynomial (y = a * x + b) is fitted, and the window with the highest slope coefficient (a) is selected as the growth curve's log phase. µ is equal to the value of this coefficient.
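
The procedure described above can be sketched in a few lines. The sketch below is written in Python rather than MATLAB and is not taken from our package; the function names (`fit_line`, `max_growth_rate`) and the synthetic data are assumptions made purely for illustration, and the outlier screen is left out.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def max_growth_rate(times, densities, window=4):
    """Return (mu, start_index) of the steepest window on the ln-transformed curve."""
    log_od = [math.log(d) for d in densities]  # natural log, so the slope equals mu
    best_mu, best_start = -math.inf, None
    for start in range(len(times) - window + 1):
        a, _ = fit_line(times[start:start + window], log_od[start:start + window])
        if a > best_mu:
            best_mu, best_start = a, start
    return best_mu, best_start


# Synthetic curve (made-up data): lag phase, log phase with mu = 0.5 per hour, plateau
times = list(range(10))  # hours
log_values = [0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.0, 2.0, 2.0]
densities = [0.01 * math.exp(v) for v in log_values]

mu, start = max_growth_rate(times, densities)
print(f"mu = {mu:.3f} per hour, log phase starts at index {start}")
```

On this noise-free curve the selected window lies entirely inside the log phase, so the estimate should match the slope used to generate the data.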

Some additional steps are taken; for example, a simple outlier screen is performed. For more information, we refer you to the code.

As mentioned before, each method has its pros and cons. Our algorithm, in its simplicity, is rather fast and reasonably effective. It is not hindered by the constraints and requirements that need to be fulfilled for curve fitting. Because each window contains only four data points, however, it can be sensitive to outliers that are not removed by the simple screen, or to measurement noise in general. Increasing the window size decreases this sensitivity, but increases the chance of the algorithm being forced to include data points that are not part of the log phase. A window size of four seems to work best; this was determined by trial and error, by benchmarking some of the data by hand, and by having the algorithm display its output visually to allow for easy detection of mistakes.
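
The cost of a larger window can be illustrated with a small experiment. This is again a Python sketch under assumed conditions, not code from our package: the helper `steepest_slope` and the synthetic curve, whose log phase spans only five points with a true µ of 0.5 per hour, are invented for the example.

```python
import math


def steepest_slope(times, densities, window):
    """Largest least-squares slope of ln(density) over any run of `window` points."""
    log_od = [math.log(d) for d in densities]
    best = -math.inf
    for start in range(len(times) - window + 1):
        xs = times[start:start + window]
        ys = log_od[start:start + window]
        mean_x = sum(xs) / window
        mean_y = sum(ys) / window
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        best = max(best, sxy / sxx)
    return best


# Synthetic curve (made-up data): short log phase between a lag phase and a plateau
times = list(range(10))  # hours
log_values = [0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.0, 2.0, 2.0]
densities = [0.01 * math.exp(v) for v in log_values]

# Compare the estimate for several window sizes
slopes = {w: steepest_slope(times, densities, w) for w in (4, 6, 8)}
for w, mu in slopes.items():
    print(f"window = {w}: mu = {mu:.3f} per hour")
```

With a window of four the estimate recovers the true slope, while windows of six and eight are forced to include lag or plateau points and therefore underestimate µ. The example is noise-free, so it shows only one side of the trade-off; it says nothing about the outlier sensitivity of small windows.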

It is by no means foolproof, however. Particularly strange results are produced when the algorithm is run on data with no real log phase to speak of, such as the data measured at 4 and 8 °C, which show no growth. Instead of recognizing random variation in such data as noise, as a human would, it will find the steepest section of noise and call it a log phase. Making the algorithm more 'intelligent' is time-consuming, and may very well end up costing you more time than using the algorithm saves, not to mention the fact that the more complex something becomes, the more likely it is that something goes wrong with it.
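
Short of making the algorithm itself more intelligent, one inexpensive option is a crude check that runs before the analysis. The Python sketch below is a suggestion only and is not part of our package: the function name `shows_growth`, the two-fold threshold, the three-reading averages and the example data are all assumptions chosen for illustration.

```python
import math


def shows_growth(densities, min_fold=2.0, n_edge=3):
    """Crude pre-check (assumed heuristic): the mean of the last `n_edge` readings
    must be at least `min_fold` times the mean of the first `n_edge` readings."""
    first = sum(densities[:n_edge]) / n_edge
    last = sum(densities[-n_edge:]) / n_edge
    return last / first >= min_fold


# Made-up readings: a flat, noisy curve and a curve with a clear log phase
flat = [0.050, 0.052, 0.049, 0.050, 0.051, 0.053, 0.050, 0.049, 0.051, 0.050]
growing = [0.01 * math.exp(v) for v in [0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.0, 2.0, 2.0]]

for name, curve in (("flat", flat), ("growing", growing)):
    verdict = "analyse" if shows_growth(curve) else "skip: no real log phase"
    print(f"{name}: {verdict}")
```

The flat curve contains a run of four rising readings that a steepest-window search would report as a log phase, but it fails this check and would be skipped. A threshold like this needs tuning per experiment and does not replace looking at the plotted output.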

In conclusion, we recommend that everyone think carefully about how they analyse their data. You will always benefit from selecting the right tool for the job, and we feel that our algorithm is the right tool for analysing our growth data.