Team:Grenoble/Projet/Modelling/Stochastic

From 2011.igem.org

Grenoble 2011, Mercuro-Coli iGEM

Modelling - Stochastic

With the model developed so far, we have studied the deterministic behavior of our system. All the parameters are precisely known and the solution obtained is always the same whatever the number of simulation is. However, in the switching area, the choice between one of the two states is randomly made by bacteria. We therefore need to adapt the model so as to take into account the randomness of the choice of the state.

Sensitivity to noise

We have started our study by simply using the deterministic model, which we have transformed into a stochastic differential equation system: the gradient of IPTG and the homogenous concentration of aTc are modeled by a normal distribution with, for standard deviation, a predetermined percentage of the distribution average.


Figure 1: Logarithmic gradient of IPTG and aTc repartition on the plate with deterministic modelling
Figure 2: Logarithmic gradient of IPTG and aTc repartition on the plate with random deterministic modelling
Figure 3: Concentration of both repressor on the plate with deterministic modelling.
Figure 4: Concentration of both repressor on the plate with random deterministic modelling.

As can be seen from the figures above, the presence of fluctuations affect the quality of the switch in the case of the random model. This is why it is important to take into account the stochastic aspect (random and probalistic studies) of the system. We have run the stochastic simulations by means of a commonly-used algorithm, the Gillespie algorithm, which we describe below.

Gillespie algorithm

During a chemical reaction, the molecules move at random in the medium obeying brownian motion, and reactions happen randomly in the medium. On a macroscopic scale, the reactions can be seen as deterministic, and the statistical properties are summarized by constants in classical ODEs.

Reactions in the cell happen at random

However, at a cell's scale, the influence of the randomness of the reactions is no longer negligible, especially for biosensors. For an efficient measurement biosensor systems must provide the expected precision of the measure.

In his algorithm Gillespie uses propensity theory to describe the behaviour of such a medium. Each reaction occuring in the cell, like synthesis or degradation of a protein, has a certain propensity (see it as the probability that the event occurs in a coming amount of time).

A random number is generated and sets the reaction to occur according to their propensities

The propensities are calculated at each time . Then, one random number is generated and its value will set the reaction that will occur for the current time . Of course, a reaction with a relatively high propensity will have more chance to occur than another one.

After this step the concentrations are changed according to the reaction, and the same process is repeated until the ending time of the simulation is reached.

This is a basic description of the algorithm, but following these steps we can get a curve of the evolution of a molecule in the cell that takes into account the randomness of reactions occuring in the cell.

Figure 1: Result of one Gillespie run on our model (TetR nb of molecules)

Of course, the calculation being partially based on a random selection of the reactions, each run of the algorithm produces a different curve. On the image below all the curves have been generated using the exact same initial conditions and parameters.

Figure 2: Result of several Gillespie runs on our model (TetR nb of molecules). Curves are all similar

However, the output depends on the parameters and equations of the system, and for each run the results are very similar (except if the probability density function is bi-modal of course. In the case of our system for example, at the interface each bacterium has a chance to switch into one way or another. In this case the output can be either around 0 (LacI pathway chosen) or higher (TetR way chosen)

Reference : Daniel T. Gillespie (1977). "Exact Stochastic Simulation of Coupled Chemical Reactions". The Journal of Physical Chemistry 81 (25): 2340–2361

Daniel T. Gillespie (1976). "A General Method for Numerically Simulating the Stochastic Time Evolution of Coupled Chemical Reactions". Journal of Computational Physics 22 (4): 403–434

Our genetical network

  1. Propensity functions and the Gillespie Algorithm
  2. To simplify the computations, our stochastic model only describes the stochastic behaviour of the Toggle switch genetic network. The toggle switch is the core of our system, it is the most sensitive part of the network and sets the precision, the behaviour and the limits of our system.

    The propensity functions used in our models are derived from the ODEs we have already written for deterministic modelling :

    Chemical reaction Propensity
    $ \phi \longrightarrow TetR $ $ \frac{k_{pLac}P_{Lac total}}{1+(\frac{LacI}{1 + \frac{IPTG}{K_{LacI - IPTG}}})^{n_{n_pLac}}} $
    $ TetR \longrightarrow \phi$ $ \delta_{TetR} TetR $
    $ \phi \longrightarrow LacI $ $ \frac{k_{pTet}P_{Tet total}}{1+(\frac{TetR}{1 + \frac{aTc}{K_{TetR - aTc}}})^{n_{n_pTet}}} $
    $ LacI \longrightarrow \phi$ $ \delta_{LacI} LacI $

    The parameters are the same as those used in ODEs and can be found on the parameter page. In our Matlab code the propensities are computed at each time step in the file Stochastic_model.m.

  3. Runs and statistical properties
  4. In a Gillespie simulation, the information brought by a few instances of the Gillespie simulation is enough to get the general behaviour of the genetical system. However, it is not sufficient when the information that we want is the mean or the variance of the concentrations in each species.

    In this case a great number of runs is necessary to have a correct estimate of the expected values. This is why we had to write a Matlab code to iterate a great number of runs. This part of the code can be found in the file Main_gillespie.m.

    Once the runs are computed through Gillespie algorithm, we have to extract the information from them. In this purpose we wrote the Hist.m, test.m and Dynamicdistros.m files. These files are specific to our system, we hope they can give an idea of how to analyse the results obtained via our Gillespie code, but they are not as easily understandable as other files.

    Once the datasets are obtained we have to extract its statistical properties. Refer to the next section for more information.

IMPORTANT NOTE:

We tried to write a Matlab code that is as easily adaptable to any other system as possible. However, because of the lack of time and the great amount of work it requires, we could not build a completely generic MATLAB function handler for Gillespie simulations. We provide the source codes here of the Matlab stochastic scripts for our simulations and tried to comment them as much as possible. Note that, if you want to adapt our code to a completely different system, only the Stochastic_model.m and parameters.mat files need to be changed, but a good understanding of the whole code is necessary.

Stochastic Modelling allows us to take into account the random aspect of chemical reaction and also to perform stability study of our device.

Mean, standard deviation and statistical properties

To determine the number of bacteria and the minimal step for the IPTG gradient we get, we need to compute the mean and the standard deviation of each of the two species of the toggle switch.

$X_{LacI}$ and $X_{TetR}$ are here the matrices representing the two toggle switch states in each bacterium on the whole plate in time. On each point of the plate are wells containing a great number of bacteria. Each toggle state in each bacterium is a random variable. The X matrices are therefore matrices of $ nb_{cells} \times nb_{cell/well} \times time points$ random variables.

Figure 1: Design of the Matrices representing the TetR and LacI concentrations in each bacterium on the plate

Thanks to stochastic modelling we can obtain the mean and variance in each of the $nb_{wells}$ wells on the plate. (These values have to be estimated from the data we obtain with Gillespie, and as in biological results, we need an important dataset to make an analysis that is statistically reliable.)

Here the mean value of concentrations of LacI and TetR in all of the bacteria in each channel is computed and we expect to get the behaviour on the figure below.

Figure 1: Expected behaviour of the means of TetR and LacI over the plate

To design our final device we need to know the width of the interface between the two states of the toggle switch. We also need to know the number of bacteria needed in the wells to have a proper measurement.

The interface which will be the colored part of our plate will turn red when populations in way 1 and populations in way 2 are in presence on the same point on the plate. We then need to know the statistical properties of the $(X_{LacI} X_{TetR})_{well}$ random variable.

Figure 2: Expected behaviour of the means of $TetR \times LacI$ over the plate

($µ_{X_{LacI}X_{TetR}}(well)$ is of course not continuous but discrete, we just want to highlight the deviation problem caused by $σ_{X_{LacI}X_{TetR}}(well)$)

We want to know $µ_{X_{LacI}X_{TetR}}(well)$ and $σ_{X_{LacI}X_{TetR}}(well)$ to obtain respectively :

  1. The width of the "gaussian" function of $µ_{X_{LacI}X_{TetR}}(well)$ to set the minimal definition (the ΔIPTG between wells) of our final device
  2. The minimum number of bacteria we want in the wells.
  3. We have $nb_{cell/well}$ independant random variables with the same probability density function. According to central limit theorem, the mean of $X_{LacI}X_{TetR} = (X_{LacI}X_{TetR cell1} + X_{LacI}X_{TetR cell2} + ... + X_{LacI}X_{TetR celln}) / n$ is $µ_{X_{LacI}X_{TetR}}(well)$ and its standard deviation is $σ_{X_{LacI}X_{TetR}}(well)/ \sqrt{n}$. The width of the gaussian is therefore easily calculable ($µ_{X_{LacI}X_{TetR}}(well) = µ_{X_{LacI}}(well) + µ_{X_{TetR}}(well)$), but it's not that easy for the standard deviation $σ_{X_{LacI}X_{TetR}}(well)$

    If we consider $X_{LacI}$ and $X_{TetR}$ are two correlated random variables, with known mean and variance ($µ_{LacI}$, $µ_{TetR}$, $σ_{LacI}$, $σ_{TetR}$) $$ Var(X_{LacI}X_{TetR}) = E[(X_{LacI}X_{TetR} - E[(X_{LacI}X_{TetR}])^{2}] $$ $$ = E[(X_{LacI}X_{TetR})^{2} -2E[X_{LacI}X_{TetR}]X_{LacI}X_{TetR} + E[X_{LacI}X_{TetR}]^{2}] $$ $$ = E[(X_{LacI}X_{TetR})] -2(µ_{LacI}µ_{TetR})^{2} + µ_{LacI}µ_{TetR} $$

    and $E[(X_{LacI}X_{TetR})]$ is not reductible to function of $µ_{LacI}$ and $µ_{TetR}$. To obtain this value we then needed to create a composite random variable x3 wich will be calculated during each run of our MATLAB stochastic algorithm (see previous section). $$ x_{3}= x_{LacI} \times x_{TetR} $$ We can thus get the variance and mean of $X_{LacI}X_{TetR}$ ($X_{3}$) with our data. If we want to get an error inferior to 10% for example around the $µ_{X_{LacI}X_{TetR}}(well)$ curve, the number of bacteria n needed will be n so that: $$\frac{\sigma_{LacI \times TetR_{cell}}}{\mu_{LacI \times TetR_{cell}}\sqrt{n_{cells}}} \leq 10%$$ With such a precision we can then calculate the IPTG definition between the wells.

    This statistical approach of stochastic modelling allows us to get severals specificities of our device.