Team:ULB-Brussels/modeling/30

From 2011.igem.org

Modelling: Transcriptional interference

In this section, we will study the transcriptional interference between the 2 functional units.
Schematic view of the different genes and the motion of RNA-polymerase molecules. The RNA-polymerase molecule on the right will meet the 4th molecule before finishing the transcription of flp. This represents a very simple example of transcriptional interference. $N_0$: size (nt) of the gam, bet and exo genes; $N_1-N_0$: size (nt) of the flp gene.

As explained above, the pINDEL plasmid is constructed such that in the presence of arabinose and at $30^\circ$C, the transcription from the pBAD promoter inhibits the transcription of the flp gene by transcriptional interference.

We have modeled transcriptional interference as collisions between RNA-polymerases (for a review, see Shearwin et al., 2003, Trends in Genetics). The purpose is to simulate the movements of the RNA-polymerases along the gam, bet and exo genes and the flp gene, and thereby estimate the interference efficiency.

The RNA-polymerase molecules bind to the promoters located at position 0 (for pBAD) and at position $N_1$ (for the pR promoter). The gene encoding FLP is transcribed only if the RNA-polymerase molecules that have initiated transcription at $N_1$ do not meet other RNA-polymerase molecules between $N_1$ and $N_0$, and remain bound to the DNA over that stretch.

The way the program works:

We enter:
  • The size of the two transcriptional units
  • The frequency at which RNA-polymerases bind to the 2 promoters (i.e. the strength of each promoter): $T_{pBAD}^{-1}$ and $T_{pR}^{-1}$
  • The elongation rate of the RNA-polymerase
  • The number of events to be simulated ($N_{pR}$), i.e. the number of RNA-polymerases that will bind to the pR promoter

Then, it calculates the ratio between the number of RNA-polymerases that reach $N_0$ starting from pR ($N_1$) and the number of RNA-polymerases that have initiated transcription at pR ($N_1$).

We also make the following approximations:

  • The elongation rate of RNA-polymerase is the same and constant for the 2 transcriptional units
  • The probability that RNA-polymerases prematurely terminate transcription is memoryless.
  • When two RNA-polymerases come in collision with each other, they prematurely terminate transcription and will not interfere with the next round of transcription

$N_{pR}$ represents the number of RNA-polymerase molecules that bind to the pR promoter in the time interval [$0$, $N_{pR} \cdot T_{pR}$] and $N_{pBAD} = N_{pR} \cdot T_{pR} / T_{pBAD}$ the number of those that bind to the pBAD promoter. For both promoters, the program randomly generates times corresponding to RNA-polymerase binding and transcription initiation in the time interval [$0$, $N_{pR} \cdot T_{pR}$].
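The generation of binding times can be sketched as follows. This is a minimal illustration, not the team's program: the function name `draw_binding_times` is hypothetical, and drawing the times uniformly over the window is an assumption, since the text only says they are generated randomly.

```python
import random


def draw_binding_times(n_pr, t_pr, t_pbad, seed=0):
    """Draw initiation times for both promoters in [0, n_pr * t_pr].

    Assumption: times are drawn uniformly and independently over the window.
    """
    rng = random.Random(seed)
    window = n_pr * t_pr                      # length of the simulated interval (s)
    n_pbad = round(n_pr * t_pr / t_pbad)      # N_pBAD = N_pR * T_pR / T_pBAD
    pr_times = sorted(rng.uniform(0.0, window) for _ in range(n_pr))
    pbad_times = sorted(rng.uniform(0.0, window) for _ in range(n_pbad))
    return pr_times, pbad_times


# Example with the binding periods quoted on this page (240 s for pR, 40 s for pBAD).
pr_times, pbad_times = draw_binding_times(n_pr=1000, t_pr=240.0, t_pbad=40.0)
print(len(pr_times), len(pbad_times))
```

With these periods, pBAD receives six times as many binding events as pR over the same window.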

In a first step, if we neglect the risks of premature termination and interference, and knowing the RNA-polymerase elongation rate, we can predict the positions of the polymerases along the pINDEL DNA as a function of time.
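As a sketch of this first step, suppose a polymerase initiates at pBAD at time $t_a$ and moves from $0$ towards $N_1$, while another initiates at pR at time $t_b$ and moves from $N_1$ towards $0$, both at the constant elongation rate $v$. The symbols $t_a$, $t_b$, $v$ and $x$ are notation introduced here for illustration. Their positions are then
\[ x_{pBAD}(t) = v\,(t - t_a), \qquad x_{pR}(t) = N_1 - v\,(t - t_b) \]
Setting $x_{pBAD}(t^*) = x_{pR}(t^*)$ gives the time and place at which the two would meet:
\[ t^* = \frac{t_a + t_b}{2} + \frac{N_1}{2v}, \qquad x^* = \frac{N_1 + v\,(t_b - t_a)}{2} \]
Such a meeting falls inside the flp gene when $N_0 \leq x^* \leq N_1$.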

In a second step, we will take into account the risk of premature termination.

To this end, let us make some calculations based on the above approximations.

We know that the chance that an RNA-polymerase molecule prematurely terminates transcription does not change with the number of nucleotides already read. If we define the random variable $N$ as the number of nucleotides read by the polymerase, we can say that $N$ follows an exponential distribution with parameter $\lambda$ ($N \sim $ Exp$(\lambda)$). Indeed, the probability of reading, for example, $20$ more nucleotides, knowing that the polymerase has already read a certain number of them, does not depend on that number. $N$ is therefore memoryless, and thus follows an exponential distribution.

We still have to determine the parameter of that exponential distribution. We estimated that there is only a small chance that the polymerase transcribes the whole plasmid at once. By taking $\lambda = 3.5 \times 10^{-4}$ per nucleotide, the chance of reading and transcribing all the nucleotides of the plasmid is close enough to zero. We now have the distribution of $N$: \[F(n) = P(N \leq n) = \left\{ \begin{array}{l l} 1-e^{-0.00035\, n} & \quad \mbox{if $n \geq 0$}\\ 0 & \quad \mbox{if $n < 0$}\\ \end{array} \right. \]

We can generate random numbers following this distribution to simulate $N$ by applying $F^{-1}$ to a $[0,1]$-uniform random variable. Indeed, the new random variable then follows the law of which $F$ is the distribution function. Let us show this. Let $X$ be a $[0,1]$-uniform random variable. We have that \[P(X \leq x) = \left\{ \begin{array}{l l} 0 & \quad \mbox{if $x < 0$}\\ x & \quad \mbox{if $0 \leq x \leq 1$}\\ 1 & \quad \mbox{if $x > 1$}\\ \end{array} \right. \]

Now, let $F$ be a given continuous, strictly increasing distribution function, and define the random variable $Y=F^{-1}(X)$. Since $F$ is increasing and $0 \leq F(y) \leq 1$, we have: \begin{equation} P(Y \leq y) = P(F^{-1}(X) \leq y) = P(X \leq F(y)) = F(y) \end{equation} Thus $Y$ follows the law of which $F$ is the distribution function, as announced.
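For the exponential law, the inverse is $F^{-1}(u) = -\ln(1-u)/\lambda$, so the sampling step can be sketched in a few lines. This is an illustrative sketch only; the function name `sample_read_length` and the seed are choices made here, not part of the original program.

```python
import math
import random

LAMBDA = 3.5e-4  # termination parameter per nucleotide, value taken from the text


def sample_read_length(rng, lam=LAMBDA):
    """Inverse-transform sampling: apply F^{-1}(u) = -ln(1 - u) / lam to a uniform u."""
    u = rng.random()              # uniform on [0, 1), so 1 - u is never 0
    return -math.log(1.0 - u) / lam


rng = random.Random(1)
samples = [sample_read_length(rng) for _ in range(100_000)]
mean_length = sum(samples) / len(samples)
# The sample mean should be close to the theoretical mean 1 / lam.
print(mean_length, 1.0 / LAMBDA)
```

Comparing the sample mean with $1/\lambda$ is a quick sanity check that the generated values follow the intended distribution.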

In a last step, we take into account the assumption that when 2 RNA-polymerases collide with each other, they prematurely terminate transcription and will not interfere with the next round of transcription. In order to calculate the efficiency of transcriptional interference, we simply have to count the number of RNA-polymerases that bind at $N_1$ and reach $N_0$, and compare this number to the number of RNA-polymerases that bind at $N_1$.
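The three steps can be combined into a small simulation. The sketch below is not the team's program and is not expected to reproduce its reported figure. It rests on several assumptions made here: binding times are uniform over the window, the elongation rate `v` is set to a mid-range 50 nt/s, pBAD polymerases may run up to $N_1$, and pR polymerases stop once they reach $N_0$. The function name `simulate` and all parameter names are hypothetical.

```python
import bisect
import math
import random


def simulate(n0=2067, n1=2067 + 1272, t_pbad=40.0, t_pr=240.0,
             v=50.0, lam=3.5e-4, n_pr=2000, seed=0):
    """Return the fraction of pR polymerases that reach N_0 (assumption-laden sketch)."""
    rng = random.Random(seed)
    window = n_pr * t_pr
    n_pbad = round(window / t_pbad)

    def read_length():
        # Inverse-transform sample of the exponential termination law.
        return -math.log(1.0 - rng.random()) / lam

    # Each polymerase: (start time, maximal distance it can travel).
    pbad = sorted((rng.uniform(0, window), min(read_length(), n1))
                  for _ in range(n_pbad))
    pr = [(rng.uniform(0, window), min(read_length(), n1 - n0))
          for _ in range(n_pr)]
    pbad_starts = [a for a, _ in pbad]

    # Two polymerases can only meet if their start times differ by at most n1 / v.
    span = n1 / v
    events = []
    for j, (b, reach_j) in enumerate(pr):
        lo = bisect.bisect_left(pbad_starts, b - span)
        hi = bisect.bisect_right(pbad_starts, b + span)
        for i in range(lo, hi):
            a, reach_i = pbad[i]
            t_meet = 0.5 * (a + b) + 0.5 * span
            x_meet = v * (t_meet - a)          # meeting position measured from pBAD
            # Keep the event only if both polymerases are still elongating there.
            if x_meet <= reach_i and (n1 - x_meet) <= reach_j:
                events.append((t_meet, i, j))

    # Process candidate collisions in time order; each polymerase collides at most once.
    events.sort()
    pbad_hit = [False] * n_pbad
    pr_hit = [False] * n_pr
    for _, i, j in events:
        if not pbad_hit[i] and not pr_hit[j]:
            pbad_hit[i] = pr_hit[j] = True

    reached = sum(1 for j, (_, reach_j) in enumerate(pr)
                  if not pr_hit[j] and reach_j >= n1 - n0)
    return reached / n_pr


print(f"fraction of pR polymerases reaching N_0: {simulate():.2f}")
```

Because all polymerases move at the same speed, polymerases travelling in the same direction never overtake each other, so only head-on pairs need to be examined. Sorting the candidate meetings by time ensures that a polymerase removed by an earlier collision cannot take part in a later one.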

We know the sizes of the 2 transcriptional units:

  • 2067 nt for gam+bet+exo
  • 1272 nt for flp

We also have an estimate of the elongation rate: 24-79 nt/s. The frequencies at which RNA-polymerases bind to the promoters can be estimated from the time it takes for a protein to be produced ($1/240 \mbox{ s}^{-1}$ for pR and $1/40 \mbox{ s}^{-1}$ for pBAD).

The simulation finally shows that, due to transcriptional interference, only 63% of flp is transcribed. This seems coherent with the fact that at $30^\circ$C flp is already produced at 10% of its maximal rate.
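For orientation, and assuming the exponential termination law with $\lambda = 3.5 \times 10^{-4}$ applies along the 1272 nt of flp, the probability that a polymerase starting at pR is not stopped by premature termination alone is
\[ P(N \geq N_1 - N_0) = e^{-\lambda (N_1 - N_0)} = e^{-0.00035 \times 1272} \approx 0.64 \]
Under that assumption, collisions can only lower the expected ratio below this value.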
