Team:Calgary/Notebook/Calendar/Binf
Bioinformatic Survey
Project Participants
Stephen Dixon, Patrick Wu
Author: Stephen
Project Background
Recently, David Lloyd, a TA, came across an interesting study which analyzed the capacity of bacteria to degrade naphthenic acids. According to the study, both Pseudomonas putida and Pseudomonas fluorescens are capable of degrading small amounts of naphthenic acids, but when grown together in co-culture, their capacity increases to 95% elimination. What's more, this degradation was effective across a broad spectrum of naphthenic acids, including those with one, two, and even three rings. The inference we drew from this effect is that each bacterium has some unique genes that, when the two species are allowed to interact, are responsible for the degradation of naphthenic acids. In a sense, one bacterium's garbage is another bacterium's treasure. This project's hypothesis is that it is possible to narrow down the candidates for this pathway by using a bioinformatic survey.
Original Strategy for the Bioinformatic Survey
Author: Stephen
The goal of the bioinformatic survey is to provide leads to the experimental side (or wet lab) of the project. Two assumptions were made at the beginning of the survey. The first is that the two genomes are homologous enough to eliminate a substantial portion of each genome from consideration. The second is that the genes of interest are located within the non-homologous regions of either genome. If both assumptions are correct, then it should be possible to create a short list of candidate genes involved in the degradation pathway, for investigation in the wet lab. Knowing which genes are involved in the degradation means that the wet lab can simply look upstream for a naphthenic acid promoter.
June 27- July 1, 2011
Author: Stephen
My initial thinking is that both genomes consist of two parts: parts that are homologous to the other genome, and parts that are not. Eliminating the homologous parts would, by definition, immediately reveal the non-homologous ones. Since homology tends to be between similar sequences rather than exact matches, a statistical approach could be used that determines homology based on the significance of the match.
Patrick Wu, my colleague, started off by looking for software that we could reuse for our application. DNA manipulation is sufficiently complicated that it makes no sense to reinvent the wheel; after some digging around, he found an open-source DNA alignment tool called MUMmer. MUMmer uses a suffix tree (a type of data structure) that works in O(n) time to rapidly align whole genomes; in the process, MUMmer provides information on single nucleotide changes, translocations, and homologous/similar genes.
Currently, we are looking into how MUMmer can be used to compare the homology of the genomes. If it turns out that the homology is not significant enough, then some other criteria must be used to narrow down the list of unique genes.
July 4-8, 2011
Author: Stephen
This week, Patrick went back to working on the Wiki, and I started to develop software for performing the big computation. My hope was to develop an original application built from several modules, which would process the entire genome. Module 1 would read and transcribe each of the genes in the entire genome. Module 2 would "compare the genes" and eliminate similar genes, and Module 3 would get the identity of each gene and sort them in some relevant order. However, my program for discarding junk DNA in Module 1 was not working as well as I had hoped; it identified over 150 genes in the mumps virus genome, when there are only 7 according to BLAST. In the process of developing this software, I learned more about evolution and homology, horizontal gene transfer, the nitty-gritty details of transcription, and about codon tables.
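The over-prediction described above is typical of naive gene finders. The notebook does not say how Module 1 worked, so the following is only a minimal sketch under the assumption that it scanned for open reading frames (ATG to the next in-frame stop codon) on both strands. The names `find_orfs` and `min_codons` are illustrative and are not taken from the original program.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=1):
    """Return (strand, frame, start, end) for every ATG...stop ORF.

    start and end are 0-based offsets on the strand being scanned;
    end is exclusive and includes the stop codon. min_codons counts
    the codons from ATG up to, but not including, the stop codon.
    """
    forward = seq.upper()
    orfs = []
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


# A toy sequence with one short ORF on the forward strand.
print(find_orfs("ATGAAATAA"))                 # short ORFs are reported
print(find_orfs("ATGAAATAA", min_codons=3))   # a length cutoff removes them
```

With a low `min_codons`, short ORFs that arise by chance are all counted as genes, which is one plausible reason a simple scanner reports far more genes than an annotated genome contains.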
July 11-15, 2011
Author: Stephen
This week, I realized that any organism with a sufficient number of genes that were horizontally transferred would, by definition, have genes that are non-homologous to other species within the same genus. Therefore, one way to find the naphthenic acid promoter is to look for genes which are horizontally transferred; horizontal transfer is a process by which bacteria share genetic information with dissimilar species.
In the process of looking up how to identify such a gene, I encountered a wonderful program called Dark Horse, which uses an algorithm for finding phylogenetically atypical proteins among bacterial strains on a genome-wide basis - that is, proteins which are likely to have been horizontally transferred. So I looked up all strains of Pseudomonas putida (F1, GB1, KT2440, and W619) and fluorescens (Pf5 and Pf01) on the database, and instantly obtained a list of 200 such genes. Unfortunately, we do not know the exact sequences of the strains used in the Del Rio paper, but we are hopeful that they are similar enough to the strains in the database to share the horizontally transferred genes. We are now cataloguing the bacterial species in this list, in order to create a comprehensive list of genes to test in the lab. What is really interesting about this is that, assuming that Del Rio's bacteria are similar to the ones in the database, we may already be sitting on the answer.
July 16-22, 2011
Author: Stephen
David, our TA, went through the catalogue of species and added notes to some of them. Patrick and I spent most of the week preparing for the aGEM Conference.
July 23-29, 2011
Author: Stephen
This week we finally finished cataloguing the interesting results from Dark Horse. We also discovered ProteinWorldDB, an online tool which searches for unique genes between two species. However, we found that no genes appeared in both lists. On Friday, we discovered that Pseudomonas.com also offers a comparative genome search, and a list of annotated genes in each genome. We look forward to comparing the list of annotated genes to the Dark Horse and ProteinWorldDB results.
We also looked at GenomeBlast, which claims to find unique genes between species; however, we were not successful in obtaining the program's output. The authors of GenomeBlast suggested by email that the problem may have arisen due to the size of the genomes we were trying to process.
Also, we emailed Dr Mayi about how our results can be verified in the wet lab.
July 29-August 4, 2011
Author: Stephen
This week was short, because Monday was a civic holiday.
Since Dr Mayi said that we could experimentally verify some of our results, the main focus this week has been on selecting around 5 or 6 genes that are responsive to naphthenic acids. We want genes which turn on in response to NAs and are inactive when no NAs are present. We are keeping in mind that the identities of the strains LD1 (Pseudomonas putida) and LD2 (Pseudomonas fluorescens) remain unknown.
We also discovered that pseudomonas.com has a pre-computed gene comparison search, which allows you to look for genes in one group of species/strains which are not present in another group. We have the results from this search as well, but we are using them as a reference for which genes are actually present.
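Conceptually, a comparison search of this kind is a set difference between groups of genomes. The sketch below is not the pseudomonas.com implementation, which we have not seen; it assumes that "present in a group" means present in every genome of the first group and absent from every genome of the second. The strain and gene names are placeholders, not real results.

```python
def genes_unique_to(group_a, group_b):
    """Genes found in every genome of group_a and in no genome of group_b.

    Each group maps a strain name to an iterable of gene names.
    """
    if not group_a:
        raise ValueError("group_a must contain at least one genome")
    in_all_of_a = set.intersection(*(set(genes) for genes in group_a.values()))
    in_any_of_b = set().union(*(set(genes) for genes in group_b.values()))
    return sorted(in_all_of_a - in_any_of_b)


# Placeholder data for illustration only.
fluorescens = {
    "strain_1": ["geneA", "geneB", "geneC"],
    "strain_2": ["geneA", "geneB", "geneD"],
}
putida = {
    "strain_3": ["geneB", "geneE"],
}

print(genes_unique_to(fluorescens, putida))
```

Real comparisons match genes by sequence similarity rather than by name, so the hard part in practice is deciding when two genes count as "the same".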
On Wednesday and Thursday, we BLASTed each of the interesting genes from our Dark Horse and ProteinWorld results, to learn more about them. We eliminated or "nominated" genes based on their supposed relevance to fatty acid degradation pathways, and how relevant they could be to naphthenic acid degradation. We also noted when a particular gene appeared to have conserved domains, but looked substantially different from other genes with a similar function. Using these standards, we narrowed our list down to about six genes. Our final list includes a handful of transcription factors (such as GI 58003974), Enoyl-CoA hydratase, and LysR family proteins. Next week, we intend to experimentally verify whether the gene products of these sequences are affected by naphthenic acid presence.
July 29-August 4, 2011
Author: Patrick
This week I looked into the Dark Horse and ProteinWorldDB results in an attempt to find proteins of particular interest. A handful of transcription factors appear to be of interest. P. putida GB-1 offers a hypothetical protein that most closely aligns with a transcription factor (GI 58003974). As well, LysR family proteins appear to be promising because of their transcription regulation properties. Enoyl-CoA hydratases seem to be part of fatty acid degradation pathways, so they may be good proteins to pursue as well.
August 5-12, 2011
Author: Stephen
Having selected 5 genes for verification, we began preparing for the verification step. Our plan is to perform a quantitative RT-PCR on two samples of bacteria, one treated with naphthenic acid and one untreated. We plan to find out whether the selected genes are up-regulated in response to naphthenic acids. On Monday, I read about RT-PCR and found out how it works. On Tuesday, I slept the entire day, which helped me recover from oral surgery. On Wednesday,
August 12-18
Author: Patrick
This week I was primarily working on primer design for the RT-PCR, and learning about the process. I also gathered names for the formation of an iGEM club for the upcoming Fall and Winter semesters and submitted the forms to the Students' Union. I have also been writing thank-you letters to our sponsors and donors and filling out the SEEL application form for future funding.
August 12-18
Author: Stephen
This week, I asked Patrick to help me design some primers. Much of the week was spent learning how to design primers, discussing what characteristics we wanted the primers to have, and designing the primers using Primer3 and Primer-BLAST. Our criteria for the primers were:
- Melting Temperature: 70-74°C (optimally 72°C)
- Primer Length: 22-24 (optimally 24)
- Product Length: 150-250 base pairs (optimally 200)
- No runs of 4 or more of the same nucleotide in either the forward or reverse primer
- A GC clamp at the 3' end (i.e. the primer ends with G, C, GC, or CG)
- The last 5 nucleotides contain 3 or 4 G or C bases
- GC Content: 40-60% (optimally 50%)
- All unintentional matches reported by Primer-BLAST are at least 25% mismatched
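The sequence-only criteria in this list are easy to check automatically. The sketch below is a hypothetical helper, not part of Primer3 or Primer-BLAST; it covers primer length, nucleotide runs, the GC clamp, the last five nucleotides, and GC content. Melting temperature, product length, and off-target matches are deliberately left to those tools. The example primers are made up for illustration.

```python
import re


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(primer):
    """Return a dict mapping each sequence-only criterion to True or False."""
    p = primer.upper()
    return {
        "length_22_to_24": 22 <= len(p) <= 24,
        # a run is the same letter repeated 4 or more times in a row
        "no_run_of_4": re.search(r"(.)\1{3}", p) is None,
        # ending in G, C, GC, or CG all reduce to: last base is G or C
        "gc_clamp": p[-1] in "GC",
        "last5_has_3_or_4_gc": sum(base in "GC" for base in p[-5:]) in (3, 4),
        "gc_content_40_to_60": 0.40 <= gc_fraction(p) <= 0.60,
    }


good = "ATGCTAGCTAGGATCCATGACTGC"   # made-up primer that meets every check
bad = "AAAATGCTAGCTAGGATCCATGAT"    # made-up primer with a run and no clamp

print(check_primer(good))
print(check_primer(bad))
```

A primer that passes these checks still has to be confirmed against the remaining criteria before it is ordered.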
I also learned that the melting point in the PCR machine is the maximum melting temperature for the primers. This was important because the primers for the constitutively expressed gene we found in a paper by Martínez-Lavanchy et al. (2010), 16S rRNA, were shorter than the other primers we had designed. If we make them any longer, they may no longer be genus-specific, which makes them less useful as an RT-PCR control. I only noticed this after I had already made all the primers, but I decided it was okay; the only downside is that the product/primer length is not perfectly controlled.
August 19-25
Author: Stephen
This week, I got involved in planning out the RT-PCR experiment, learned how to analyze the data, and mapped out the plate. I also designed BioBrickable Primers for the promoter regions of each of the genes in our short list from Dark Horse.
The first graph I learned how to analyze was the plot of the Optical Density vs. the Cycle Number. If a gene is transcribed in response to naphthenic acids, then we'd expect its initial mRNA concentration to be higher before it is reverse transcribed into DNA. Therefore, when we perform the PCR, the product will cross a given threshold at an earlier cycle, because amplification starts from more template. The difference between the control and treated samples in the number of cycles it takes to pass a specific threshold is indicative of the difference in initial template amount. After normalizing this cycle difference, we can statistically determine whether a particular gene is transcribed as a naphthenic acid response, or whether the difference was just luck. I also learned how to analyze a second graph, which shows the fold increase over the control, calculated as two raised to the normalized cycle difference.
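As a worked example of the threshold-cycle comparison, the sketch below uses the common 2^(-ΔΔCt) calculation. The notebook does not name the exact method used, so treat this as an assumption; it also assumes the product doubles every cycle and that the target gene is normalized against a reference gene. The function name and the cycle numbers are made up for illustration.

```python
def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Fold change of the target gene in treated vs. control samples.

    Each argument is a threshold cycle (Ct). The target gene is
    normalized to a reference gene within each sample, then the
    treated sample is compared with the control.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_treated - delta_control
    # one cycle earlier means twice as much starting template
    return 2.0 ** (-delta_delta_ct)


# Made-up numbers: the target crosses the threshold 3 cycles earlier
# in the treated sample, while the reference gene is unchanged.
print(fold_change(22.0, 15.0, 25.0, 15.0))
```

A fold change above 1 suggests up-regulation in the treated sample; replicates are still needed to judge whether the difference is statistically meaningful.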
The third graph I learned to analyze was the standard curve of the cycle number against the log of the template number. The closer the linear regression correlation coefficient R is to 1 in magnitude, the more ideally the PCR behaved.
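A standard curve of this kind can be fitted with ordinary least squares. The sketch below is illustrative only: it fits the threshold cycle against log10 of the template number and also reports the amplification efficiency, computed as 10^(-1/slope) - 1, a commonly used formula that the notebook itself does not mention. The function name and the data are invented for the example.

```python
import math


def standard_curve(log10_template, ct):
    """Least-squares fit of Ct against log10(template number).

    Returns (slope, intercept, r, efficiency), where r is the Pearson
    correlation coefficient and efficiency is 10 ** (-1 / slope) - 1.
    """
    n = len(ct)
    mean_x = sum(log10_template) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_template)
    syy = sum((y - mean_y) ** 2 for y in ct)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_template, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, r, efficiency


# Idealized data: every tenfold dilution costs log2(10) cycles,
# which corresponds to perfect doubling in each cycle.
xs = [1, 2, 3, 4, 5]
cts = [40 - math.log2(10) * x for x in xs]
slope, intercept, r, efficiency = standard_curve(xs, cts)
print(slope, intercept, r, efficiency)
```

Because Ct falls as the template number rises, R is negative for a well-behaved curve; it is the magnitude of R, or R squared, that should be close to 1.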
With regard to the RT-PCR, it seems to work in approximately five steps: the culturing step, the RNA extraction step, the DNase step, the reverse transcription step, and finally the qPCR step. Lastly, designing BioBrickable Primers is the same as designing normal primers, except for the insertion of the BioBrickable Suffix and Prefix.
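Adding the prefix and suffix can be sketched as simple string concatenation. The notebook does not state which assembly standard was used, so the sequences below are an assumption: they are the prefix and suffix of the BioBrick RFC 10 standard for parts that do not start with ATG, which would fit promoter regions. The function name and the annealing sequences are made up for illustration.

```python
# Assumed RFC 10 prefix and suffix, both written 5' to 3' on the top strand.
BIOBRICK_PREFIX = "GAATTCGCGGCCGCTTCTAGAG"
BIOBRICK_SUFFIX = "TACTAGTAGCGGCCGCTGCAG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def biobrick_primers(forward_anneal, reverse_anneal):
    """Attach the prefix and suffix to a pair of ordinary primers.

    forward_anneal: 5'->3' primer matching the start of the part.
    reverse_anneal: 5'->3' primer annealing to the end of the part,
        i.e. already the reverse complement of the top strand.
    """
    forward = BIOBRICK_PREFIX + forward_anneal.upper()
    # the reverse primer reads the bottom strand, so the suffix
    # must be reverse complemented before it is attached
    reverse = reverse_complement(BIOBRICK_SUFFIX) + reverse_anneal.upper()
    return forward, reverse


# Made-up annealing regions for illustration only.
fwd, rev = biobrick_primers("ATGCTAGCTAGGATCCATGA", "GCAGTCATGGATCCTAGCTA")
print(fwd)
print(rev)
```

In practice a few extra bases are often added 5' of the restriction sites so that the enzymes cut efficiently near the end of the PCR product; that detail is left out of this sketch.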
August 26-September 2nd
Author: Stephen
The RT-PCR experiment was delayed because the cells did not grow properly. Robert told me that the cell media was at fault, so he made a new stock solution. I created a list of transcription factors shared among strains of fluorescens but not found in putida. I also learned what a sigma factor is, and about the genomic dissimilarity of different strains of Pseudomonas fluorescens from Silby et al. (2006): at the amino acid identity level, the similarity between strains can be lower than the similarity between different species within the same genus. Nevertheless, the strains of fluorescens are more similar to each other than to any other species.
September 3-11
Stephen was on vacation, but Maggie performed an RT-PCR experiment using the genes from the short-list.