Team:UNAM-Genomics Mexico/Project/deltaggibbs
From 2011.igem.org
ΔG for RBS
First, the adaptiveness of each of the 64 codons in the target organism is calculated. A genetic code table is then generated in which a codon is marked “allowed” when its adaptiveness is higher than a threshold level and “prohibited” otherwise. A separate table is generated for each organism and each threshold.
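As a rough illustration of this step, here is a minimal sketch in Python (not the Perl used by the project). It assumes that "adaptiveness" means the relative adaptiveness used in the codon adaptation index, that is, a codon's count divided by the count of the most frequent synonymous codon. The usage counts, the threshold value and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def relative_adaptiveness(codon_counts):
    """Return {codon: w}, where w = count / highest count among synonymous codons."""
    best = {}
    for codon, aa in GENETIC_CODE.items():
        best[aa] = max(best.get(aa, 0), codon_counts.get(codon, 0))
    return {
        codon: (codon_counts.get(codon, 0) / best[aa] if best[aa] else 0.0)
        for codon, aa in GENETIC_CODE.items()
    }

def allowed_codon_table(codon_counts, threshold):
    """Map each amino acid to the codons whose adaptiveness exceeds the threshold."""
    w = relative_adaptiveness(codon_counts)
    table = {}
    for codon, aa in GENETIC_CODE.items():
        table.setdefault(aa, [])
        if w[codon] > threshold:
            table[aa].append(codon)
    return table

# Hypothetical counts, given only for the lysine and glutamate codons.
example_counts = {"AAA": 80, "AAG": 20, "GAA": 60, "GAG": 45}
table = allowed_codon_table(example_counts, threshold=0.5)
print(table["K"])  # AAG has w = 0.25, so it is prohibited
print(table["E"])  # GAG has w = 0.75, so it is allowed
```

Raising the threshold shrinks the set of allowed codons, which is why a different table is needed for each organism and each threshold.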
A Perl subroutine receives as input the amino acid sequence of the gene to optimize. It extracts the first seven amino acids and, using the genetic code table, generates all the nucleotide sequences that encode these seven amino acids using only codons with adaptiveness above the threshold. These sequences are printed into a temporary file.
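The enumeration amounts to a Cartesian product over the allowed codons of each position. The following Python sketch shows the idea; the table of allowed codons, the example protein and the function name are hypothetical, and the project's own Perl subroutine may differ.

```python
from itertools import product

# Hypothetical table of allowed codons per amino acid (one-letter codes).
ALLOWED = {
    "M": ["ATG"],
    "K": ["AAA"],
    "E": ["GAA", "GAG"],
    "L": ["CTG"],
    "A": ["GCG", "GCC"],
    "S": ["AGC", "TCT"],
}

def n_terminal_variants(protein, allowed, n_residues=7):
    """Return every DNA sequence that encodes the first n_residues
    of the protein using only allowed codons."""
    head = protein[:n_residues]
    choices = [allowed[aa] for aa in head]
    return ["".join(codons) for codons in product(*choices)]

variants = n_terminal_variants("MKELASKLLE", ALLOWED)
print(len(variants))  # 1 * 1 * 2 * 1 * 2 * 2 * 1 combinations
```

The number of variants is the product of the allowed-codon counts at each position, so it grows quickly as the threshold is lowered. If any of the seven amino acids has no allowed codon, the result is empty.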
A second Perl subroutine receives as input a sequence length and the number of sequences to generate. It calls RSATools, which generates the required number of sequences of the given length. These sequences are random but follow an Escherichia coli nucleotide model, with controlled parameters such as GC content and complexity. The subroutine processes the RSAT output and prints the sequences into a second temporary file.
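For readers without RSAT at hand, the sketch below is a simplified stand-in in Python, not the RSAT call itself. It draws each nucleotide independently to match a target GC content, so it does not reproduce the Escherichia coli model or the complexity control described above. The function name and parameters are hypothetical.

```python
import random

def random_sequences(count, length, gc_content=0.5, seed=None):
    """Generate `count` random DNA strings of `length` nucleotides,
    drawing each base independently with the given expected GC content."""
    rng = random.Random(seed)
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    weights = [at, gc, gc, at]  # probabilities for A, C, G, T
    return [
        "".join(rng.choices("ACGT", weights=weights, k=length))
        for _ in range(count)
    ]

spacers = random_sequences(count=5, length=12, gc_content=0.5, seed=2011)
for s in spacers:
    print(s)
```

Fixing the seed makes the run reproducible, which helps when comparing ∆G results across thresholds.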
A third subroutine combines every coding sequence with every random sequence, adding an RBS sequence (provided by the user) at the beginning of each string. It prints the composed sequences into a third file and deletes the first and second temporary files.
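A minimal Python sketch of the combination step follows. The text does not state the order of the random and coding parts, so the layout RBS + random sequence + coding sequence is an assumption here, as are the function name and the example sequences.

```python
from itertools import product

def compose_constructs(rbs, random_seqs, coding_seqs):
    """Pair every random sequence with every coding sequence and
    prepend the RBS (assumed order: RBS + random + coding)."""
    return [
        rbs + spacer + cds
        for spacer, cds in product(random_seqs, coding_seqs)
    ]

constructs = compose_constructs(
    rbs="AGGAGG",                      # example Shine-Dalgarno-like RBS
    random_seqs=["ACGT", "TTGA"],      # hypothetical random sequences
    coding_seqs=["ATGAAA", "ATGAAG"],  # hypothetical coding variants
)
print(len(constructs))  # 2 random x 2 coding sequences
```

The output size is the product of the two input sizes, so both the number of random sequences and the codon threshold control how many constructs must be folded in the next step.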
A fourth subroutine calls Hybrid-ss to calculate the ∆G of each sequence. The subroutine receives a temperature and salt concentrations as parameters for Hybrid-ss. Hybrid-ss takes a DNA or RNA string as input and calculates the folding free energy of the transcribed sequence at the given temperature and salt concentrations. More positive (less negative) ∆G values indicate less stable folding, a property desired for an mRNA to be translated efficiently. Hybrid-ss generates a file with the sequences and their ∆G values.
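To see why a more positive ∆G means less folding, it can help to convert ∆G into a folded fraction. The Python sketch below assumes a simple two-state model (folded versus unfolded), with ∆G in kcal/mol, in which the equilibrium constant is K = exp(-∆G/RT) and the folded fraction is K/(1+K). This is an illustration only and is not part of the Hybrid-ss calculation; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fraction_folded(delta_g, temp_celsius=37.0):
    """Two-state estimate of the folded fraction for a folding
    free energy delta_g (kcal/mol) at the given temperature."""
    temp_kelvin = temp_celsius + 273.15
    x = delta_g / (R_KCAL * temp_kelvin)
    # Both branches equal 1 / (1 + exp(x)); they are split so that
    # exp() is only ever called on a non-positive argument.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

for dg in (-5.0, -1.0, 0.0, 2.0):
    print(dg, round(fraction_folded(dg), 4))
```

Under this model a sequence with ∆G = 0 is folded half of the time, and the folded fraction falls as ∆G becomes more positive.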
A fifth subroutine converts the Hybrid-ss output file into tables ready to be analyzed with R. All five subroutines are embedded in the same Perl script.
Finally, an R script analyzes the tables in search of the sequences with the least folding. These sequences will have been optimized to avoid, to a certain degree, folding after their transcription.
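The selection step can be sketched as a sort on ∆G. The example below is in Python rather than R, and it assumes a tab-separated table with hypothetical column names "sequence" and "dG"; the actual table layout produced by the Perl script is not described here, and the values are made up for illustration.

```python
import csv
import io

def least_folded(table_text, top_n=3):
    """Return the top_n (sequence, dG) pairs with the most positive dG,
    read from a tab-separated table with 'sequence' and 'dG' columns."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = [(row["sequence"], float(row["dG"])) for row in reader]
    rows.sort(key=lambda r: r[1], reverse=True)  # most positive dG first
    return rows[:top_n]

# Hypothetical table contents.
example = (
    "sequence\tdG\n"
    "AGGAGGACGTATGAAA\t-3.2\n"
    "AGGAGGTTTTATGAAA\t-0.4\n"
    "AGGAGGGGCCATGAAG\t-5.1\n"
)
best = least_folded(example, top_n=2)
print(best)
```

Sorting in descending order of ∆G places the least folded candidates first, matching the criterion that more positive values mean less folding.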