Latest revision as of 23:30, 28 October 2011

Software - Joint codon optimisation algorithm

We wanted the option of expressing the genes responsible for auxin production in both B. subtilis and E. coli. To achieve this, we decided to jointly codon optimise the IaaM and IaaH coding sequences. We could not find any software to do this, so we wrote our own.

1. Background

The genetic code is redundant, which means that multiple codons can encode the same amino acid. Synonymous codons are decoded at different speeds, depending on the circumstances. Codon usage also varies between species. Together, these observations mean that sequence optimisation can be used to tune protein expression levels. It is tempting to think that one could codon optimise a sequence by using only an organism’s preferred codon for each amino acid. This is commonly referred to as the "one amino acid-one codon" method. Unfortunately, it does not work. Recent optimisation studies^[1] have highlighted the importance of maintaining a diverse codon population in a given coding sequence. That said, the inclusion of ‘rare codons’ has also been shown to dramatically reduce protein expression in E. coli.^[2]
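To illustrate why the "one amino acid-one codon" method removes codon diversity, the following minimal Python sketch encodes a short peptide using only the most frequent codon for each amino acid. It is not the R script described on this page. The usage values, the peptide and the function name are placeholders invented for illustration and are not taken from the Codon Usage Database.

```python
# Hypothetical codon usage values (per thousand codons) for three amino acids.
# These numbers are illustrative placeholders, not real database entries.
USAGE = {
    "L": {"CTG": 50.0, "TTA": 14.0, "TTG": 13.0, "CTT": 11.0, "CTC": 11.0, "CTA": 4.0},
    "K": {"AAA": 34.0, "AAG": 11.0},
    "M": {"ATG": 27.0},
}


def one_codon_per_amino_acid(protein, usage):
    """Encode every residue with the single most frequent codon for that residue."""
    return [max(usage[aa], key=usage[aa].get) for aa in protein]


# A made-up six-residue peptide: Met-Leu-Lys-Leu-Leu-Lys.
codons = one_codon_per_amino_acid("MLKLLK", USAGE)

# Every leucine receives the same codon, so only one codon per amino acid is used.
distinct_codons = len(set(codons))
```

In this toy case all three leucines are encoded by the same codon, even though six synonymous codons are available, which is the loss of diversity that the studies cited above warn against.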

2. Solution

In our approach to codon optimisation, we attempted to maintain codon diversity while simultaneously limiting rare codon inclusion. This was achieved by weighting codon selection using bias tables obtained from the Codon Usage Database^[3]. Joint optimisation was facilitated by combining the bias tables of E. coli and B. subtilis. Following the generation of a seed sequence, rare codons were stochastically pruned over successive cycles to iteratively optimise the sequence.
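The steps described above (combine the bias tables, draw a weighted seed sequence, then repeatedly prune rare codons) can be sketched as follows. This is a Python illustration written under stated assumptions, not the team's R script: the tables hold made-up synonymous-usage fractions, the tables are combined by simple averaging, a codon counts as "rare" when its combined fraction falls below a hypothetical threshold of 0.10, and pruning replaces a rare codon by re-sampling from the same weighted distribution. All function names are invented for this example.

```python
import random

# Hypothetical synonymous-usage fractions for each organism (placeholders only).
ECOLI = {
    "L": {"CTG": 0.50, "TTA": 0.13, "TTG": 0.13, "CTT": 0.10, "CTC": 0.10, "CTA": 0.04},
    "K": {"AAA": 0.76, "AAG": 0.24},
    "M": {"ATG": 1.0},
}
BSUB = {
    "L": {"CTG": 0.24, "TTA": 0.21, "TTG": 0.16, "CTT": 0.23, "CTC": 0.11, "CTA": 0.05},
    "K": {"AAA": 0.70, "AAG": 0.30},
    "M": {"ATG": 1.0},
}


def combine_tables(a, b):
    """Average two bias tables codon by codon (an assumed way of combining them)."""
    return {aa: {c: (a[aa][c] + b[aa][c]) / 2 for c in a[aa]} for aa in a}


def sample_codon(weights, rng):
    """Draw one codon at random, weighted by its usage fraction."""
    codons = list(weights)
    return rng.choices(codons, weights=[weights[c] for c in codons], k=1)[0]


def seed_sequence(protein, table, rng):
    """Generate the initial sequence by weighted random codon selection."""
    return [sample_codon(table[aa], rng) for aa in protein]


def prune_rare(codons, protein, table, rng, threshold):
    """One pruning cycle: re-sample every codon whose fraction is below threshold."""
    pruned = []
    for aa, codon in zip(protein, codons):
        if table[aa][codon] < threshold:
            codon = sample_codon(table[aa], rng)
        pruned.append(codon)
    return pruned


def optimise(protein, table, cycles=100, threshold=0.10, seed=0):
    """Seed a sequence, then apply the stochastic pruning step for a number of cycles."""
    rng = random.Random(seed)
    codons = seed_sequence(protein, table, rng)
    for _ in range(cycles):
        codons = prune_rare(codons, protein, table, rng, threshold)
    return "".join(codons)


joint_table = combine_tables(ECOLI, BSUB)
example_dna = optimise("MLKLLK", joint_table)
print(example_dna)
```

Because non-rare codons are left untouched and rare ones are re-drawn from the full weighted distribution, the sequence keeps a mixture of synonymous codons while rare codons become progressively less likely to survive. Passing a single organism's table instead of `joint_table` corresponds to single-organism optimisation in this sketch.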

3. In-silico testing

To test our codon optimisation algorithm, we ran the protein coding sequence for Dendra2 (BBa_K515007) through our software. The DNA sequences generated in the first 100 cycles were then fed into Genscript’s Codon Adaptation Index (CAI) analyser. This online tool measures the suitability of a sequence for expression in E. coli. Genscript claims that its own codon optimisation can generate sequences with a CAI > 0.8. We were able to match this for both single and joint optimisation.

Figure 1: A Codon Adaptation Index (CAI) of 1.0 is deemed ideal. A CAI of >0.8 is rated as good for expression. Sequences for the 'Control' series (green line) were generated by randomly selecting codons. Sequences optimised for E. coli (blue line) and joint optimised for E. coli and B. subtilis (red line) were generated using our optimisation algorithm. (Data generated by Imperial College London iGEM team 2011).

The data shown in Figure 1 indicate that our algorithm can generate highly optimised protein coding sequences for expression in E. coli. In addition, joint optimisation does not reduce the CAI (for E. coli) below the threshold of 0.8.
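For readers unfamiliar with the metric, the CAI is usually defined as the geometric mean of the relative adaptiveness of each codon in a sequence, where a codon's relative adaptiveness is its frequency divided by the frequency of the most used synonymous codon in a reference set. The Python sketch below follows that general definition. It is an illustration only and is not claimed to reproduce Genscript's analyser, whose reference set and implementation details are not described here. The reference fractions and function names are hypothetical.

```python
import math

# Hypothetical reference usage fractions (placeholders, not real E. coli data).
# Amino acids with a single codon are left out, so their codons are not scored.
REFERENCE = {
    "L": {"CTG": 0.50, "TTA": 0.13, "TTG": 0.13, "CTT": 0.10, "CTC": 0.10, "CTA": 0.04},
    "K": {"AAA": 0.75, "AAG": 0.25},
}


def relative_adaptiveness(reference):
    """Map each codon to its frequency divided by the best synonymous frequency."""
    weights = {}
    for codon_freqs in reference.values():
        best = max(codon_freqs.values())
        for codon, freq in codon_freqs.items():
            weights[codon] = freq / best
    return weights


def cai(dna, reference):
    """Geometric mean of relative adaptiveness over all scored codons in dna."""
    weights = relative_adaptiveness(reference)
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    scored = [c for c in codons if c in weights]
    if not scored:
        raise ValueError("sequence contains no codons present in the reference table")
    return math.exp(sum(math.log(weights[c]) for c in scored) / len(scored))


best_case = cai("ATGCTGAAACTG", REFERENCE)   # only the most frequent codons
mixed_case = cai("ATGCTGAAG", REFERENCE)     # one less frequent lysine codon
```

A sequence built entirely from the most frequent codons scores 1.0 under this definition, and each less frequent codon pulls the geometric mean down, which is why a single very rare codon has a visible effect on the score.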

4. Future work

Recent studies^[4] have suggested that, rather than favouring highly used codons, it is preferable to select the subset of codons whose accompanying tRNAs are the most frequently charged during amino acid starvation. Once these codons have been tabulated for both E. coli and B. subtilis, these data could be incorporated into our program.

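5. R code

The R script used for codon optimisation can be downloaded from https://static.igem.org/mediawiki/2011/f/f5/CodonOptimisationScript.zip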

6. References

[1] Menzella HG (2011) Comparison of two codon optimization strategies to enhance recombinant protein production in Escherichia coli. Microb Cell Fact 10: 15.
[2] Kane JF (1995) Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Current Opinion in Biotechnology 6(5): 494-500. doi:10.1016/0958-1669(95)80082-4
[3] Nakamura Y, Gojobori T and Ikemura T (2000) Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res 28: 292.
[4] Welch M, Govindarajan S, Ness JE, Villalobos A, Gurney A, et al. (2009) Design parameters to control synthetic gene expression in Escherichia coli. PLoS ONE 4(9): e7002. doi:10.1371/journal.pone.0007002

@@ Line 4: / Line 4: @@
 <hr style="color:#BDCBBD; height:3px;" />
-<h1>Software - Joint Codon Optimisation Algorithm</h1>
+<h1>Software - Joint codon optimisation algorithm</h1>
-<p><b>We wanted the flexibility to express the genes responsible for auxin production in both <i>B.subtilis</i> and <i>E.coli</i>. To achieve this, we decided to joint codon optimise the IaaM and IaaH coding sequences. Since, we could not find any software for this, we wrote our own.</b></p>
+<p><b>We wanted the option of expressing the genes responsible for auxin production in both <i>B. subtilis</i> and <i>E. coli</i>. To achieve this, we decided to joint codon optimise the <i>IaaM</i> and <i>IaaH</i> coding sequences. We could not find any software to do this so wrote our own.</b></p>
+<br/>
 <hr style="color:#BDCBBD; height:3px;" />
-<h2>Background:</h2>
+<h2>1. Background</h2>
-<p>The genetic code is redundant which means that multiple codons can encode the same amino acid. Synonymous codons are circumstantially decoded by the cellular machinery at different speeds. This phenomenon means that it is possible to increase protein yields by optimising codon usage.
+<p>The genetic code is redundant which means that multiple codons can encode the same amino acid. Synonymous codons are circumstantially decoded at different speeds. Codon usage also varies between different species. This phenomenon means that sequence optimisation can be used to tune protein expression levels.
 It is tempting to think that one could codon optimise a sequence by selectively using an organism’s preferred codons. This is commonly referred to as the "one amino acid-one codon" method. Unfortunately it does not work.
-Recent optimisation studies have highlighted the importance of maintaining a diverse codon population in a given coding sequence. This said, the inclusion of ‘rare codons’ has also been shown to dramatically reduce protein expression E.coli.</p>
+Recent optimisation studies<sup>[1]</sup> have highlighted the importance of maintaining a diverse codon population in a given coding sequence. This said, the inclusion of ‘rare codons’ has also been shown to dramatically reduce protein expression <i>E. coli.</i><sup>[2]</sup></p>
-<h2>Solution:</h2>
+<h2>2. Solution</h2>
-<p>In our approach to codon optimisation, we attempted to maintain codon diversity while simultaneously limiting rare codon inclusion. This was achieved by randomly selecting codons weighted by codon usage bias and then pruning the sequence of rare codons. The codon bias table used for this process was generated by combining those of E.coli and B.subtilis. Rare codon pruning was achieved by re-sampling synonymous codons to maintain sequence diversity. The script used for codon optimisation was written in R and can be downloaded below.</p>
+<p>In our approach to codon optimisation, we attempted to maintain codon diversity while simultaneously limiting rare codon inclusion. This was achieved by weighting codon selection using bias tables obtained from the <i>Codon Usage Database</i><sup>[3]</sup>. Joint optimisation was facilitated by combining the bias tables of <i>E. coli</i> and <i>B. subtilis</i>. Following the generation of a <i>seed-sequence</i>, the stochastic pruning of rare codons was used to iteratively optimise the sequence.</p>
-<h2>In-silico Testing:</h2>
+<h2>3. In-silico testing</h2>
-<p>To test our codon optimisation software, we ran the protein Dendra2 (BBa_K515007) through our software. The resultant DNA sequences were then fed into Genscript’s Codon Adaptation Index (CAI) analyser. This online tool measures the suitability of a sequence for expression in E.coli. Genscript’s own codon optimisation claims to be able to generate sequences with a CAI > 0.8. We were able to match this.</p>
+<p>To test our codon optimisation algorithm, we ran the protein coding sequence for Dendra2 (BBa_K515007) through our software. The DNA sequences generated in the first 100 cycles were then fed into <a href="http://www.genscript.com/cgi-bin/tools/rare_codon_analysis"> <i>Genscript’s Codon Adaptation Index (CAI) analyser</i> </a>. This online tool measures the suitability of a sequence for expression in <i>E. coli</i>. Genscript’s own codon optimisation claims to be able to generate sequences with a CAI > 0.8. We were able to match this for both single and joint optimisation.</p>
 <div class="imgbox" style="width:900px;float:middle;margin-top:20px;"/>
 <img class="border" src="https://static.igem.org/mediawiki/2011/8/81/ICL_2011_CodonOptimisation.png" width="870px"/>
-<p><i>Figure 1:  (Data generated by Imperial College iGEM team 2011.)</i></p>
+<p><i>Figure 1: A Codon Adaptation Index (CAI) of 1.0 is deemed ideal. A CAI of >0.8 is rated as good for expression. Sequences for the 'Control' series (green line) were generated by randomly selecting codons. Sequences optimised for '<i>'E. coli'</i>' (blue line) and joint optimised for '<i>'E. coli'</i>' and '<i>'B. subtilis'</i>' (red line) were generated using our optimisation algorithm. (Data generated by Imperial College London iGEM team 2011).</i></p>
 </div>
+<p>The data shown in <i>Figure 1</i> indicates that our algorithm can generate highly optimised protein sequences for expression in <i>E. coli</i>. In addition, joint optimisation does not reduce the CAI (for <i>E. coli</i>) below the threshold of 0.8.</p>
-<h2>Future Work:</h2>
+<h2>4. Future work</h2>
-<p>Recent work has suggested that rather than using codon frequency tables, it is better to use codons that are read by a subset of tRNAs that the most frequently charged during amino acid starvation.
+<p>Recent studies<sup>[4]</sup> have suggested that rather than favouring highly used codons, it is preferable to  select a subset for which the accompanying tRNAs are the most frequently charged during amino acid starvation. Once these codons have been tabulated for both <i>E. coli</i> and <i>B. subtilis</i> this data could be incorporated into our program.</p>
-http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007002
-Once enough data is available for both E.coli and B.subtilis this data could be incorporated into the program.</p>
+<h2>5. R code</h2>
+<div class="thelanguage">
+<p><a href="https://static.igem.org/mediawiki/2011/f/f5/CodonOptimisationScript.zip"><img src="https://static.igem.org/mediawiki/2011/8/8c/ICL_DownloadIcon.png" width="180px" /></a></p>
+</div>
+<h2>6. References</h2>
+<p>
+[1] Menzella HG (2011) Comparison of two codon optimization strategies to enhance recombinant protein production in <i>Escherichia coli</i>. <i>Microb Cell Fact</i> <b>10:</b> 15.
+<br>
+[2] Kane JF (1995) Effects of rare codon clusters on high-level expression of heterologous proteins in <i>Escherichia coli</i>. <i>Current Opinion in Biotechnology</i> <b>6(5):</b> 494-500, ISSN 0958-1669, 10.1016/0958-1669(95)80082-4.
+(http://www.sciencedirect.com/science/article/pii/0958166995800824)
+<br>
+[3] Nakamura Y, Gojobori T and Ikemura T (2000) Codon usage tabulated from international DNA sequence databases: status for the year 2000. <i>Nucleic Acids Res</i> <b>28:</b> 292.
+<br>
+[4] Welch M, Govindarajan S, Ness JE, Villalobos A, Gurney A, et al. (2009) Design parameters to control synthetic gene expression in <i>Escherichia coli</i>. <i>PLoS ONE</i> <b>4(9):</b> e7002. doi:10.1371/journal.pone.0007002
+<br>
-<h2>References:</h2>
+</p>
-http://www.microbialcellfactories.com/content/10/1/15/abstract
 </body>
 </html>

Team:Imperial College London/Software

From 2011.igem.org