Team:Edinburgh/Genetic Instability

From 2011.igem.org


Latest revision as of 18:00, 21 September 2011

Genetic Instability Tool

Our project involves certain sequences of DNA being present in several copies on a plasmid. For example, to display three different enzymes on a cell via Ice Nucleation Protein (INP) requires three copies of the INP gene in the plasmid.

But there exists this thing called "genetic instability", and we wanted to know whether this made the whole project infeasible. (As it turns out, it doesn't.)

What is genetic instability?

When a lot of similar DNA sequences are introduced into a cell, this can potentially lead to those pieces of DNA undergoing recombination and thus rearranging the DNA. This will not be a problem in our lab strain of E. coli, JM109, as it lacks the RecA recombinase enzyme which causes this to happen. However, industry requires hardy strains of E. coli, which must possess recombinases to deal with their high-stress working environment.

So it initially seems as if the systems we are investigating could never be used in industry. But in fact there is a solution!

If only we had several different (as far apart as possible) DNA sequences coding for the same protein, the project would be saved! According to [http://www.pnas.org/content/82/14/4768.full.pdf Watt et al. (1985)], even a small number of mismatches drastically reduces recombination frequency.

Enter Edinburgh's Genetic Stability Tool! Using DNA's natural property of redundancy, it will find a number of different DNA sequences for you! And they will code for the same protein! Superb!

So, how does it work?

Possible modes

Random generation

This is a very crude mode, in which a random sequence of codons coding for the same amino acids is generated. For example, if you had:

atgaaaaagtctttagtcctcaaagcctctgtagccgttgctaccctcgttccgatgctgtctttcgct...
(the leader sequence for pVIII),

the program would translate it into an amino acid sequence, i.e.:

MKKSLVLKASVAVATLVPMLSFA...

and then find a random codon for each amino acid, put them together, and spit out the result.

If this sounds too simple to produce really good results... you're right!
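The random mode described above can be sketched in a few lines of Python. This is only an illustration, not the code shipped in gen_stab.py: the names `translate` and `random_recode` are made up for this example, and the codon table is the standard genetic code written out in T, C, A, G order.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Reverse lookup: amino acid -> every codon that encodes it.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate a DNA string (length a multiple of 3) into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def random_recode(dna, rng=random):
    """Hypothetical helper: pick a random synonymous codon for each amino acid."""
    return "".join(rng.choice(AA_TO_CODONS[aa]) for aa in translate(dna)).lower()


# The pVIII leader sequence from the example above.
leader = "atgaaaaagtctttagtcctcaaagcctctgtagccgttgctaccctcgttccgatgctgtctttcgct"
print(translate(leader))      # MKKSLVLKASVAVATLVPMLSFA
print(random_recode(leader))  # a different DNA sequence each run, same protein
```

Because every codon is drawn at random, nothing stops the sketch from picking the original codon again, which is one reason this mode gives poor results.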

Best codon

In this mode, the tool would choose the best codon for every amino acid. What is the 'best codon'? It's the synonymous codon that differs from the original codon at the most base positions.

For example, Leucine can be coded by TTA, TTG, CTT, CTC, CTA, or CTG. If the original sequence uses TTA, then the best codon method would choose CTT, CTC or CTG, each of which differs from TTA at two positions (TTG and CTA differ at only one). There are often several equally 'best' codons, so this method would be improved if it looked at the wider context...
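A minimal sketch of the 'best codon' rule, assuming the standard genetic code and counting differing base positions (the Hamming distance). The function names `hamming` and `best_codons` are invented for this example and are not taken from the tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_codons(codon):
    """Hypothetical helper: synonymous codons furthest from the original codon."""
    codon = codon.upper()
    amino_acid = CODON_TO_AA[codon]
    synonyms = [c for c, aa in CODON_TO_AA.items() if aa == amino_acid]
    best = max(hamming(codon, c) for c in synonyms)
    return sorted(c for c in synonyms if hamming(codon, c) == best)


print(best_codons("TTA"))  # ['CTC', 'CTG', 'CTT']
```

For an amino acid with a single codon, such as Methionine (ATG), the sketch simply returns the original codon, since there is nothing else to choose from.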

Genetic algorithm

This is a very powerful algorithm, inspired by computer scientists observing biology. It represents candidate solutions to a problem as strings of characters (the 'DNA'). Each solution is an 'individual', many individuals make up a 'generation', and individuals 'cross over' and 'mutate' their 'genes' at some randomised rate. The fittest individuals get into the next generation.

A genetic algorithm is evolution in silico! It's perfect for use in Biology!

We can use this algorithm to generate DNA sequences which are as far from the original as possible. In this mode, the tool also avoids heavy use of rare codons. It can also return more than one DNA sequence, and these are far from each other as well.
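To make the idea concrete, here is a toy genetic algorithm in Python. It is a sketch under stated assumptions, not the implementation in gen_stab.py: the fitness function (bases changed, minus a penalty per rare codon), the `RARE_CODONS` set, the parameter defaults and all function names are illustrative choices. It also returns only one sequence and does not model the distance between several returned sequences.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)

# Assumption: a small illustrative set of codons often described as rare in E. coli.
RARE_CODONS = {"AGA", "AGG", "ATA", "CTA", "CGA", "CCC"}


def split_codons(dna):
    """Cut a DNA string (length a multiple of 3) into a list of codons."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def fitness(individual, original, rare_penalty=2):
    """Bases that differ from the original, minus a penalty for each rare codon."""
    distance = sum(x != y for a, b in zip(individual, original) for x, y in zip(a, b))
    rare = sum(codon in RARE_CODONS for codon in individual)
    return distance - rare_penalty * rare


def mutate(individual, rate, rng):
    """Swap each codon for a random synonymous codon with probability `rate`."""
    return [rng.choice(AA_TO_CODONS[CODON_TO_AA[c]]) if rng.random() < rate else c
            for c in individual]


def crossover(mother, father, rng):
    """Single-point crossover on a codon boundary, so the protein is unchanged."""
    point = rng.randrange(1, len(mother)) if len(mother) > 1 else 0
    return mother[:point] + father[point:]


def evolve(dna, population_size=50, generations=100, mutation_rate=0.05, seed=None):
    """Return one recoded sequence with high fitness relative to `dna`."""
    rng = random.Random(seed)
    original = split_codons(dna)
    # Start from fully randomised synonymous versions of the original.
    population = [mutate(original, 1.0, rng) for _ in range(population_size)]
    for _ in range(generations):
        # The fitter half survives unchanged and breeds the other half.
        population.sort(key=lambda ind: fitness(ind, original), reverse=True)
        parents = population[:population_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                           mutation_rate, rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    best = max(population, key=lambda ind: fitness(ind, original))
    return "".join(best).lower()


leader = "atgaaaaagtctttagtcctcaaagcctctgtagccgttgctaccctcgttccgatgctgtctttcgct"
print(evolve(leader, seed=1))
```

Mutation and crossover only ever exchange whole synonymous codons, so every individual in every generation still codes for the original protein; the algorithm is free to optimise distance and codon choice without checking the translation.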

What follows are the first 200 bases of two synthetic versions of [http://partsregistry.org/Part:BBa_K265008 BBa_K265008], both dissimilar to it and to each other. They also make little use of rare codons. These sequences were produced by the genetic algorithm.

Alpha    1  atgaccctggacaaagcgctggtgctgcgtacctgcgccaacaacatggc   50
            |||||||||||.|||||.|||||.||||||||.|||||.|||||||||||
Beta     1  atgaccctggataaagctctggtcctgcgtacgtgcgcgaacaacatggc   50

Alpha   51  agatcactgcggtctgatttggccggcctctggcaccgtagagtcccgtt  100
            .||.||||||||.|||||.||||||||....|||||.||.||.||.||||
Beta    51  tgaccactgcggcctgatctggccggcaagcggcactgttgaatctcgtt  100

Alpha  101  actggcagagcactcgtcgtcatgaaaacggtctggttggcctgctgtgg  150
            |||||||....||.||||||||.|||||||||||.||.||.|||||.|||
Beta   101  actggcaatcgacccgtcgtcacgaaaacggtcttgtgggtctgctctgg  150

Alpha  151  ggcgcgggtacttcggccttcttatctgtccatgctgacgcgcgttggat  200
            ||.||.||.||....||.|||.|.||.||.|||||.||.|||||.|||||
Beta   151  ggtgcaggcaccagcgcattcctgtcggttcatgcggatgcgcgctggat  200
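The match lines in the alignment above ('|' for identical bases, '.' for mismatches) are easy to recompute. The following Python sketch does so for the first 50 bases of Alpha and Beta; the function names are invented for this example, and it assumes the two sequences are already aligned without gaps.

```python
def match_line(a, b):
    """Return '|' where two equal-length sequences agree and '.' where they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return "".join("|" if x == y else "." for x, y in zip(a.lower(), b.lower()))


def percent_identity(a, b):
    """Percentage of positions at which the two sequences carry the same base."""
    line = match_line(a, b)
    return 100.0 * line.count("|") / len(line)


alpha = "atgaccctggacaaagcgctggtgctgcgtacctgcgccaacaacatggc"
beta = "atgaccctggataaagctctggtcctgcgtacgtgcgcgaacaacatggc"
print(match_line(alpha, beta))
print(percent_identity(alpha, beta))  # 90.0
```

A lower percentage means fewer long stretches of identical sequence between the two copies.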

Conclusion

Using this tool we have shown that systems requiring multiple copies of a coding sequence can still be suitable for industrial use; recombination can be avoided by synthesising multiple different DNA sequences that code for the same protein.

Download

You can download the all-new all-shiny tool here:

[https://2011.igem.org/File:Edinburgh_gen_stab_0.1_Alpha_Americano.zip Team Edinburgh Genetic Stability Tool, v.0.1 Alpha Americano]


To use, simply unpack the archive and run gen_stab.py in Python. Any questions?

  • Drop us a line on [http://www.twitter.com/iGEMEdinburgh Twitter], or
  • E-mail: L (dot) Kopec (at) sms (dot) ed (dot) ac (dot) uk

References

  • Watt VM, Ingles CJ, Urdea MS, Rutter WJ (1985) [http://www.pnas.org/content/82/14/4768.full.pdf Homology requirements for recombination in Escherichia coli]. Proc. Natl. Acad. Sci. USA 82: 4768-4772.