Team:Edinburgh/Genetic Instability
From 2011.igem.org
Revision as of 20:07, 20 July 2011
OK, so apparently there exists this thing called genetic instability, and we need to counter it...
What is genetic instability?
When many similar DNA sequences are introduced into a cell, they can undergo homologous recombination with one another, rearranging the DNA. This is not a problem in our lab strain of E. coli, JM109, as it lacks the RecA recombinase enzyme which causes this to happen. However, industry requires hardy strains of E. coli, which must possess recombinases to cope with their high-stress working environment.
But there is a solution!
If only we had many different (as far apart as possible) DNA sequences coding for the same protein, the world would be saved! According to [http://www.pnas.org/content/82/14/4768.full.pdf Watt et al. (1985)], even a small number of mismatches drastically reduces recombination.
Enter Team Synergy's genetic stability tool! Using the natural redundancy of the genetic code (most amino acids can be encoded by more than one codon), it will find a number of different DNA sequences for you! And they will code for the same protein! Superb!
So, how does it work?
Possible modes
Random generation
This is a very crude mode, in which a random sequence of codons coding for the same amino acid sequence is generated. For example, if you had:
atgaaaaagtctttagtcctcaaagcctctgtagccgttgctaccctcgttccgatgctgtctttcgct...
(the leader sequence for pVIII),
the program would translate it into an amino acid sequence, i.e.:
MKKSLVLKASVAVATLVPMLSFA...
and then find a random codon for each amino acid, put them together, and spit out the result.
If this sounds too simple to produce really good results... you're right!
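The random mode described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the names `translate` and `random_recode` are made up for this example, and the standard genetic code table is assumed.

```python
import random
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Reverse lookup: amino acid -> list of codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate DNA (any case) into one-letter amino acids, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def random_recode(dna, rng=random):
    """Pick a random synonymous codon for every amino acid (hypothetical helper)."""
    return "".join(rng.choice(SYNONYMS[aa]) for aa in translate(dna))


# The pVIII leader sequence quoted above.
leader = "atgaaaaagtctttagtcctcaaagcctctgtagccgttgctaccctcgttccgatgctgtctttcgct"
print(translate(leader))
print(random_recode(leader))
```

Because each codon is drawn independently, some positions will by chance come out identical to the original, which is one reason this mode is crude.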
Best codon
In this mode, the tool chooses the best codon for each amino acid. What is the 'best codon'? It's the synonymous codon that differs from the original codon at the most base positions.
For example, Leucine can be coded by TTA, TTG, CTT, CTC, CTA, or CTG. If the original sequence uses TTA, then the best codon method would choose CTT, CTC or CTG, each of which differs from TTA at two positions (TTG and CTA differ at only one). There are often several equally 'best' codons, so this method would be improved if it looked at the wider context...
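As a rough sketch of the 'best codon' idea (again just an illustration with made-up names, assuming the standard genetic code), the candidates can be found by counting differing positions between the original codon and each synonymous codon:

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_codons(codon):
    """All synonymous codons that are maximally different from `codon` (hypothetical helper)."""
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    candidates = [c for c, a in CODON_TABLE.items() if a == amino_acid]
    top = max(hamming(codon, c) for c in candidates)
    return sorted(c for c in candidates if hamming(codon, c) == top)


print(best_codons("TTA"))  # the Leucine example from the text
```

For an amino acid with a single codon, such as Methionine (ATG), the only candidate is the original codon itself, so nothing can be changed at that position.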
Genetic algorithm
This is a very powerful algorithm, inspired by computer scientists observing biology. It represents a problem as a string of characters (the 'DNA'). Candidate solutions are 'individuals'; there are many of them in a 'generation', and they 'cross over' and 'mutate' their 'genes' at some randomised rate. The most 'fit' individuals get into the next generation.
A genetic algorithm is evolution in silico! It's perfect for use in Biology!
We can use this algorithm to generate DNA sequences which are as far from the original as possible. In this mode, the tool also avoids heavy use of rare codons. It can also return more than one DNA sequence, with the returned sequences far from each other as well.
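To make the idea concrete, here is a minimal sketch of such a genetic algorithm in Python. It is not the tool's real implementation. The function names, the population size, the mutation rate, the penalty weight and the small `RARE` set of E. coli rare codons are all assumptions chosen for illustration. For brevity it returns a single sequence and leaves out the step that keeps several results far from each other.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Illustrative set of codons treated as rare (an assumption, not the tool's list).
RARE = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}


def split_codons(dna):
    """Split DNA into upper-case codons, dropping a trailing partial codon."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna):
    return "".join(CODON_TABLE[c] for c in split_codons(dna))


def fitness(individual, original, rare_penalty=2.0):
    """Base differences from the original, minus a penalty for each rare codon."""
    distance = sum(a != b for x, y in zip(individual, original) for a, b in zip(x, y))
    return distance - rare_penalty * sum(c in RARE for c in individual)


def mutate(individual, rate, rng):
    """Swap each codon for a random synonymous codon with probability `rate`."""
    return [rng.choice(SYNONYMS[CODON_TABLE[c]]) if rng.random() < rate else c
            for c in individual]


def crossover(a, b, rng):
    """Single-point crossover at a codon boundary, so the protein is unchanged."""
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(dna, pop_size=40, generations=60, rate=0.05, seed=0):
    rng = random.Random(seed)
    original = split_codons(dna)
    # Start from fully random synonymous recodings of the original.
    population = [mutate(original, 1.0, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, original), reverse=True)
        parents = population[:pop_size // 2]  # the fittest half survives
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return "".join(max(population, key=lambda ind: fitness(ind, original)))


leader = "atgaaaaagtctttagtcctcaaagcctctgtagccgttgctaccctcgttccgatgctgtctttcgct"
print(evolve(leader))
```

Mutation and crossover only ever exchange synonymous codons at matching positions, so every individual in every generation still codes for the original protein.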
What follows are 200 bases of two synthetic versions of <partinfo>BBa_K265008</partinfo> which are both dissimilar to it, as well as being dissimilar to each other. They also avoid frequent use of rare codons. These sequences were produced by the genetic algorithm.
Alpha   1 atgaccctggacaaagcgctggtgctgcgtacctgcgccaacaacatggc 50
          |||||||||||.|||||.|||||.||||||||.|||||.|||||||||||
Beta    1 atgaccctggataaagctctggtcctgcgtacgtgcgcgaacaacatggc 50

Alpha  51 agatcactgcggtctgatttggccggcctctggcaccgtagagtcccgtt 100
          .||.||||||||.|||||.||||||||....|||||.||.||.||.||||
Beta   51 tgaccactgcggcctgatctggccggcaagcggcactgttgaatctcgtt 100

Alpha 101 actggcagagcactcgtcgtcatgaaaacggtctggttggcctgctgtgg 150
          |||||||....||.||||||||.|||||||||||.||.||.|||||.|||
Beta  101 actggcaatcgacccgtcgtcacgaaaacggtcttgtgggtctgctctgg 150

Alpha 151 ggcgcgggtacttcggccttcttatctgtccatgctgacgcgcgttggat 200
          ||.||.||.||....||.|||.|.||.||.|||||.||.|||||.|||||
Beta  151 ggtgcaggcaccagcgcattcctgtcggttcatgcggatgcgcgctggat 200
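The '|' and '.' row in the alignment above marks matching and mismatching bases. A small sketch of how such a row, and the fraction of identical bases, could be computed (the helper names are made up for this example, and the two sequences are assumed to be the same length with no gaps):

```python
def match_line(a, b):
    """Return '|' where two equal-length sequences agree and '.' where they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return "".join("|" if x == y else "." for x, y in zip(a.lower(), b.lower()))


def identity(a, b):
    """Fraction of positions at which the two sequences are identical."""
    line = match_line(a, b)
    return line.count("|") / len(line)


# First 50 bases of the Alpha and Beta sequences shown above.
alpha = "atgaccctggacaaagcgctggtgctgcgtacctgcgccaacaacatggc"
beta = "atgaccctggataaagctctggtcctgcgtacgtgcgcgaacaacatggc"

print(match_line(alpha, beta))
print(identity(alpha, beta))
```

For these first 50 bases there are 5 mismatches, i.e. 90% identity between the two synthetic versions.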
References
- Watt VM, Ingles CJ, Urdea MS, Rutter WJ (1985) [http://www.pnas.org/content/82/14/4768.full.pdf Homology requirements for recombination in Escherichia coli]. Proc. Natl. Acad. Sci. USA 82: 4768-4772.