Team:NTNU Trondheim/Modeling

From 2011.igem.org

(Difference between revisions)
(Stochastic Differential equations)
(The Models)
 
(137 intermediate revisions not shown)
Line 2: Line 2:
 +
= Modeling =
 +
To describe and understand the biological reactions and processes as the bacteria turns red under stress, we developed multiple mathematical and statistical models. As the most basic model we used a system of ordinary differential equations (ODE). This is a fully deterministic model, describing the change in concentration for all molecules involved.
 +
Taking into account that the number of molecules in the cell, tends to be low and their movement random, one would have to use a model that allow randomness. This will result in a system of stochastic differential equations (SDE), which describe the underlying model from an observed pattern (random motion) in the number of molecules over time, given a concentration of ppGpp. A realization of such a model can be found under in figure in section for Results. Investigation of those models is done using Monte Carlo simulation in matlab and Dizzy.
 +
== Model introduction ==
-
= Modeling =
+
At the heart of the modeling lies biological consistency and data integration. The modeling will be focused on interpretation simplicity and data consistency. That is to develop models that can be easily interpreted by biologist and mathematicians, but also describe what is observed at the laboratory.  
-
To describe and understand the biological reactions and process, as the bacteria turns red under stress we developed multiple mathematical and statistical models. As the most basic  model we used a system om ordinary differential equations (ODE), this is a fully deterministic model, describing the change in concentration for all molecules involved. The process for a bacteria to go from normal to glowing red is involved, and one can think of this as a stochastic process, that at each step, the process can ether succeed or fail, with a given probability. This gives rise to a Bayesian model.
+
-
Combining these two models will result in a system of stochastic differential equations (SDE), which will be solved using numerical algorithms. As a last model we will explore the relationship between variants of stress and fluoricene intensity using (non)-conventional regression.
+
The two main ways to model biological systems are deterministic and stochastic. In this project we will attempt to approach the problems in both ways. Using a deterministic model, with fixed parameters, and a stochastic model to integrate data more dynamically.
-
== Model Introduction ==
+
=== Process description ===
-
At the heart of the modeling lies biological consistency and data integration. The modeling will be focused on interpretation simplicity and data consistency. That is to develop models that can be easily interpreted by biologist and mathematicians, but the models should also strive to describe that which is observed at the laboratory.
+
-
The two main ways to model biological systems. One way is deterministic, the other is stochastic. In this project we will attempt to approach to problems in both ways. Using a deterministic model, with fixed parameters, and a stochastic model to integrate data more dynamically.
+
As described in the [[Team:NTNU_Trondheim/Project|introduction]], ppGpp will repress the production of LacI and LacI represses the production of mCherry. These are the dominating processes. In short, when ppGpp are not present there will be little mCherry and when ppGpp are present the level of mCherry will be substantially higher.
 +
 
 +
In addition to these processes there are additional processes which might be of importance. First of all ppGpp affects the RNAP (RNA polymerase) and can therefore affect the production of mCherry as well as the production of LacI. As described in the introduction LacI is expected to be heavily downregulated by ppGpp. The production of mCherry can also in some cases lead to stress and therefore more ppGpp which in turn leads to more mCherry, in other words a positive feedback loop. These effects are assumed to be small, but might still affect the outcome.
 +
 
 +
[[File:ModelSetup.png]]
 +
 
 +
Figure 1:''The processes involved in our system''
== The Models==
== The Models==
-
Four basic models where constructed, which will be described below.
 
-
[[File:ModelOvervew.png]]
+
Two basic models where constructed, which will be described below.  
-
[[File:Eq1.png]]
+
 
 +
 
 +
 
=== Systems of ODE ===
=== Systems of ODE ===
-
Models based on Ordinary Differential Equations (ODEs) are one of the most used methods of describing genetic cicuits, the different processes and reactions taking place in the cell are described by a set of coupled differential equations. This method might give both qualitative and quantitative information about the system and can therefore be very useful. It is however dependent on accurate kinetic reaction parameters and in some cases one also has to take into account the stochastic nature of genetic circuits. The equations are then solved either in Dizzy or Matlab.
 
-
=== Bayesian Hierarchy  ===
+
Models based on Ordinary Differential Equations (ODEs) are one of the most used methods of describing genetic circuits, while the different processes and reactions taking place in the cell are described by a set of coupled differential equations. This method might give both qualitative and quantitative information about the system and can therefore be very useful. It is however dependent on accurate kinetic reaction parameters and in some cases one also has to take into account the stochastic nature of genetic circuits. The equations are then solved either in Dizzy or Matlab.
-
As shown in the [[Team:NTNU_Trondheim#Introduction|introduction]], the process before the bacteria turns red is a linear chain of events. This can be modeled as a stochastic process, and solved with Monte Carlo methods. This is a way of including stochastic elements and data integration. Setting the probability of a transition to be successful to be Beta distributed, with hyper parameters ("known constants"), &#x03B1;<sub>i</sub> and &#x03B2;<sub>i</sub>, the the probability of total success, that the start input will transcend and cause the output (observable), is the product of all the beta distributed variables, which itself is Beta distributed whit parameter  &#x03B1; =  &#x2211; &#x03B1;<sub>i</sub>-q+1 and &#x03B2; =  &#x2211; &#x03B2;<sub>i</sub>-q+1, where q is the number of parameters.
+
-
Having done some previous test at the lab, we wish to extend this in a probabilistic context. To integrate this data and the model assumptions, we can get a better picture of what is happening in the cell, and how likely it is to succeed  in each trail.  
+
====Basic Model ====
 +
 
 +
Based on the model above the most important processes are, the symbols to the left of the equation denotes the kinetic parameter for the process:
 +
 
 +
[[File:basic_processes.png]]
 +
 
 +
The first process describes ppGpp attaching to the RNA polymerase, '''RNA<sub>p</sub><sup>A</sup>''' denotes the active part which is not repressed by ppGpp. '''RNA<sub>p</sub><sup>R</sup>''' is the number of repressed RNA polymerase molecules. This process can also be reversed. The transcription of mRNA is described in one step. '''D<sub>0</sub><sup>1</sup>''' is the promoter determining the production of LacI mRNA and '''M<sup>1</sup>''' is LacI mRNA. The LacI mRNA in turn leads to production of LacI transcription factor denoted as '''TF'''. Both mRNA and LacI is degraded as well. LacI will then inhibit mCherry:
 +
 
 +
[[File:basic_processes2.png]]
 +
 
 +
[[File:basic_processes3.png]]
 +
 
 +
Where '''D<sub>1</sub><sup>2</sup>''' denotes promotors inhibited by LacI transcription factor. &xi; is a measure of how strongly ppGpp affects mCherry directly, if 1 it is unaffected if 0 it is affected as strongly as LacI.
 +
 
 +
====Steady State====
 +
 
 +
Since it is difficult to find accurate parameters for all the processes involved and since it is often only the concentrations in steady state (long after the stress was first induced) that can be measured, all quantities involved were assumed to be constant. This assumption simplifies the system greatly and reduced the problem to these two equations:
 +
 
 +
[[File:ss1.png]]
 +
 
 +
[[File:ss2.png]]
 +
 
 +
Where C<sup>i</sup>=&alpha;<sub>M</sub><sup>i</sup>&alpha;<sub>p</sub><sup>i</sup>RNAp<sup>T</sup>/&gamma;<sub>M</sub><sup>i</sup>&gamma;<sub>p</sub><sup>i</sup>, RNAp<sup>T</sup> is the total number of RNA-polymerase molecules in the cell and K<sup>i</sup>=q<sup>i</sup>/k<sup>i</sup>.
 +
To check that the equations for steady state are correct a comparison was made between the numerical solution in Dizzy and the analytical solution for different levels of ppGpp. The comparison shown in the figure below shows that there is excellent agreement.
 +
 
 +
[[File:Comparison.png|730x440px|Comparison between numerical simulation and analytical solution in steady state for different values of ppGpp]]
 +
 
 +
Figure 2: ''Comparison between numerical simulation and analytical solution in steady state for different values of ppGpp''
=== Stochastic Differential equations ===
=== Stochastic Differential equations ===
-
In the cell, all movement of the molecules is random, this gives rise to the stochasticity observed in gene expression experiment. In this section we outline two methods where the randomness is accuared for; the Gillespie algorithm and the approximation  &#x3c4; - leap algorithm.
 
 +
In the cell, all movement of the molecules is random, this gives rise to the stochasticity observed in gene expression experiment. In this section we outline two methods where the randomness is accounted for; the Gillespie algorithm and the approximation  &#x3c4; - leap algorithm.
 +
When the number of participating molecules are low (which do happen in the cell), then stochasticity really matters. If there are many molecules, then the behavior of the reactions goes "smooth" and looks like an ODE. However when to number of reactants gets small, the number of reactions in a small time frame varies, this gives rise to the irregularity seen in the time series for the number of created products in the cell.   
 +
 +
====Gillespie ====
 +
 +
The Gillespie algorithm is a model that simulates the number of reactions and the time between the reactions exactly (under some assumptions). The most important assumptions is that the concentration of the reactants is distributed uniformly in the cell, the other important assumption is that the time between the reactions occur is markovian, that is it has no memory of how long the previous time step was.
 +
 +
An improved version of the original algorithm, the Gibson-Bruck algorithm, was used with Dizzy.
====Tau - leap Algorithm ====
====Tau - leap Algorithm ====
 +
Instead of calculating the time between each reaction, we can fix a time frame and estimate how many reactions occur in that time frame.
-
====Gilespie ====
+
It can be shown that setting the production and destruction rate from the ODE in as a rate parameter in an Poisson distribution, will give the same result. This is a nice way of reconfiguring the tau - leap such that it gives a clear and precise relation to the ODE models, and mathematical consistency.
-
=== Regression Models ===
+
This approach was used to create a framework for a stochastic simulation using the computer language C++, there where an attempt to make this in a fully functional open source program. However this attempt stranded due to lack of time.
 +
==Results==
 +
This section describes the findings from the theoretical model presented in the equations above.
-
&#8747;
+
As seen in figure 3 the level of LacI is reduced when ppGpp increases in the cell, which in turn leads to increased level of mCherry. The level of LacI reaches a saturation point, since there will be some production of LacI due to leakage, this sets a limit from above for mCherry. The stochastic realization fluctuates around the deterministic results, with the distributions shown in figure 4.
-
== Model Validation ==
+
[[File:time_evolution.png|730x440px|The number of molecules of the proteins for the stochastic and  the deterministic models are shown for different values of ppGpp]]
-
No data yet.
+
-
== References ==
+
Figure 3:''The number of molecules of the proteins for the stochastic and  the deterministic models are shown for different values of ppGpp''
 +
All the distributions are unimodal having only one peak, peaked around the steady state value from the deterministic model. The distributions for ppGpp=630 and ppGpp=1030 have heavy tails which indicates large deviations from the mean value, while the distributions for ppGpp=230 and ppGpp=1430 are more symmetric, resembling the normal distribution.
 +
 +
[[File:distributions.png|730x440px]]
 +
 +
Figure 4:''The distribution of mCherry for t=2000s to t=10000s for different values of ppGpp. The distributions was calculated from the realizations in figure 3''
 +
 +
As shown in figure 5 the variance increases linearly with increasing level of ppGpp, while the Fano factor (&sigma;<sup>2</sup>/&mu;) is almost constant around 1.5 .
 +
 +
 +
[[File:variance.png|730x440px]]
 +
 +
Figure 5: ''Variance and Fano factor for increasing values of ppGpp''
 +
 +
== Model Validation ==
 +
To see if the results agree with the theoretical model is vital in all parts of science. In this section we will give a short reasoning of why our models seems to fit the observed data. Here we will consider our model in two distinct parts, one deterministic model and one stochastic model, both of equally importance.
 +
Because of the limited amount of data currently available it is diffucult to draw conclusions. In the modeling the level of ppGpp was assumed known, however we do not know at present the relationship between a stress level and the number of ppGpp and the relationship between the number of mCherry molecules and the flourescence level is also unknown.
 +
However there are certain indications that the model predicts the behaviour of our system. For the deterministic model, we saw a significant increase in the concentration of mCherry as ppGpp was assumed present. This is consistent with the observation, described in figure 1 at [[Team:NTNU_Trondheim/Stress_Sensor#Stress sensor characterization|Stress sensor characterization]].
 +
Using the distributions from figure 4, we can compare these to the spread in the flowcytometry, and see a close similarity in form of shape. Assuming scale is irrelevant (i.e. that there is a linear relationship between number of mCherry molecules and fluorescence count) one can come to a conclusion that our model seems to describe the actual biological system.
{{:Team:NTNU_Trondheim/NTNU_footer}}
{{:Team:NTNU_Trondheim/NTNU_footer}}

Latest revision as of 08:28, 21 September 2011



Modeling

To describe and understand the biological reactions and processes as the bacteria turns red under stress, we developed multiple mathematical and statistical models. As the most basic model we used a system of ordinary differential equations (ODE). This is a fully deterministic model, describing the change in concentration for all molecules involved.

Taking into account that the number of molecules in the cell, tends to be low and their movement random, one would have to use a model that allow randomness. This will result in a system of stochastic differential equations (SDE), which describe the underlying model from an observed pattern (random motion) in the number of molecules over time, given a concentration of ppGpp. A realization of such a model can be found under in figure in section for Results. Investigation of those models is done using Monte Carlo simulation in matlab and Dizzy.

Model introduction

At the heart of the modeling lies biological consistency and data integration. The modeling will be focused on interpretation simplicity and data consistency. That is to develop models that can be easily interpreted by biologist and mathematicians, but also describe what is observed at the laboratory.

The two main ways to model biological systems are deterministic and stochastic. In this project we will attempt to approach the problems in both ways. Using a deterministic model, with fixed parameters, and a stochastic model to integrate data more dynamically.

Process description

As described in the introduction, ppGpp will repress the production of LacI and LacI represses the production of mCherry. These are the dominating processes. In short, when ppGpp are not present there will be little mCherry and when ppGpp are present the level of mCherry will be substantially higher.

In addition to these processes there are additional processes which might be of importance. First of all ppGpp affects the RNAP (RNA polymerase) and can therefore affect the production of mCherry as well as the production of LacI. As described in the introduction LacI is expected to be heavily downregulated by ppGpp. The production of mCherry can also in some cases lead to stress and therefore more ppGpp which in turn leads to more mCherry, in other words a positive feedback loop. These effects are assumed to be small, but might still affect the outcome.

ModelSetup.png

Figure 1:The processes involved in our system

The Models

Two basic models where constructed, which will be described below.


Systems of ODE

Models based on Ordinary Differential Equations (ODEs) are one of the most used methods of describing genetic circuits, while the different processes and reactions taking place in the cell are described by a set of coupled differential equations. This method might give both qualitative and quantitative information about the system and can therefore be very useful. It is however dependent on accurate kinetic reaction parameters and in some cases one also has to take into account the stochastic nature of genetic circuits. The equations are then solved either in Dizzy or Matlab.

Basic Model

Based on the model above the most important processes are, the symbols to the left of the equation denotes the kinetic parameter for the process:

Basic processes.png

The first process describes ppGpp attaching to the RNA polymerase, RNApA denotes the active part which is not repressed by ppGpp. RNApR is the number of repressed RNA polymerase molecules. This process can also be reversed. The transcription of mRNA is described in one step. D01 is the promoter determining the production of LacI mRNA and M1 is LacI mRNA. The LacI mRNA in turn leads to production of LacI transcription factor denoted as TF. Both mRNA and LacI is degraded as well. LacI will then inhibit mCherry:

Basic processes2.png

Basic processes3.png

Where D12 denotes promotors inhibited by LacI transcription factor. ξ is a measure of how strongly ppGpp affects mCherry directly, if 1 it is unaffected if 0 it is affected as strongly as LacI.

Steady State

Since it is difficult to find accurate parameters for all the processes involved and since it is often only the concentrations in steady state (long after the stress was first induced) that can be measured, all quantities involved were assumed to be constant. This assumption simplifies the system greatly and reduced the problem to these two equations:

Ss1.png

Ss2.png

Where CiMiαpiRNApTMiγpi, RNApT is the total number of RNA-polymerase molecules in the cell and Ki=qi/ki. To check that the equations for steady state are correct a comparison was made between the numerical solution in Dizzy and the analytical solution for different levels of ppGpp. The comparison shown in the figure below shows that there is excellent agreement.

Comparison between numerical simulation and analytical solution in steady state for different values of ppGpp

Figure 2: Comparison between numerical simulation and analytical solution in steady state for different values of ppGpp

Stochastic Differential equations

In the cell, all movement of the molecules is random, this gives rise to the stochasticity observed in gene expression experiment. In this section we outline two methods where the randomness is accounted for; the Gillespie algorithm and the approximation τ - leap algorithm.

When the number of participating molecules are low (which do happen in the cell), then stochasticity really matters. If there are many molecules, then the behavior of the reactions goes "smooth" and looks like an ODE. However when to number of reactants gets small, the number of reactions in a small time frame varies, this gives rise to the irregularity seen in the time series for the number of created products in the cell.

Gillespie

The Gillespie algorithm is a model that simulates the number of reactions and the time between the reactions exactly (under some assumptions). The most important assumptions is that the concentration of the reactants is distributed uniformly in the cell, the other important assumption is that the time between the reactions occur is markovian, that is it has no memory of how long the previous time step was.

An improved version of the original algorithm, the Gibson-Bruck algorithm, was used with Dizzy.

Tau - leap Algorithm

Instead of calculating the time between each reaction, we can fix a time frame and estimate how many reactions occur in that time frame.

It can be shown that setting the production and destruction rate from the ODE in as a rate parameter in an Poisson distribution, will give the same result. This is a nice way of reconfiguring the tau - leap such that it gives a clear and precise relation to the ODE models, and mathematical consistency.

This approach was used to create a framework for a stochastic simulation using the computer language C++, there where an attempt to make this in a fully functional open source program. However this attempt stranded due to lack of time.

Results

This section describes the findings from the theoretical model presented in the equations above.

As seen in figure 3 the level of LacI is reduced when ppGpp increases in the cell, which in turn leads to increased level of mCherry. The level of LacI reaches a saturation point, since there will be some production of LacI due to leakage, this sets a limit from above for mCherry. The stochastic realization fluctuates around the deterministic results, with the distributions shown in figure 4.

The number of molecules of the proteins for the stochastic and  the deterministic models are shown for different values of ppGpp

Figure 3:The number of molecules of the proteins for the stochastic and the deterministic models are shown for different values of ppGpp

All the distributions are unimodal having only one peak, peaked around the steady state value from the deterministic model. The distributions for ppGpp=630 and ppGpp=1030 have heavy tails which indicates large deviations from the mean value, while the distributions for ppGpp=230 and ppGpp=1430 are more symmetric, resembling the normal distribution.

Distributions.png

Figure 4:The distribution of mCherry for t=2000s to t=10000s for different values of ppGpp. The distributions was calculated from the realizations in figure 3

As shown in figure 5 the variance increases linearly with increasing level of ppGpp, while the Fano factor (σ2/μ) is almost constant around 1.5 .


Variance.png

Figure 5: Variance and Fano factor for increasing values of ppGpp

Model Validation

To see if the results agree with the theoretical model is vital in all parts of science. In this section we will give a short reasoning of why our models seems to fit the observed data. Here we will consider our model in two distinct parts, one deterministic model and one stochastic model, both of equally importance.

Because of the limited amount of data currently available it is diffucult to draw conclusions. In the modeling the level of ppGpp was assumed known, however we do not know at present the relationship between a stress level and the number of ppGpp and the relationship between the number of mCherry molecules and the flourescence level is also unknown.

However there are certain indications that the model predicts the behaviour of our system. For the deterministic model, we saw a significant increase in the concentration of mCherry as ppGpp was assumed present. This is consistent with the observation, described in figure 1 at Stress sensor characterization.

Using the distributions from figure 4, we can compare these to the spread in the flowcytometry, and see a close similarity in form of shape. Assuming scale is irrelevant (i.e. that there is a linear relationship between number of mCherry molecules and fluorescence count) one can come to a conclusion that our model seems to describe the actual biological system.