Team:Calgary/Project/Promoter/Bioinformatics

From 2011.igem.org

Revision as of 02:54, 29 September 2011 by Sj.dixon (Talk | contribs)


NA-Sensitive Promoters using Bioinformatics and qRT-PCR

Introduction

Numerous species of bacteria have been found to survive in the highly toxic environment of the tailings ponds. However, only recently have environmental researchers begun to identify the strains and species of these microorganisms. In a study by Moore et al. (2006), two species of Pseudomonas (putida and fluorescens) were shown to degrade naphthenic acids to a much greater degree in co-culture than when each species was cultured individually. The percentage of naphthenic acids remaining in solution after four weeks of bacterial culture is shown in Figure 1.

Figure 1 Naphthenic Acid Degradation Profile after 4 weeks of degradation by Pseudomonas strains LD1 and LD2. Source: Del Rio, Hadwin, Pinto, MacKinnon, & Moore, 2006

 This result suggests that each strain of Pseudomonas provides a unique set of degradation machinery (likely in the form of enzymes) needed to complete the breakdown of naphthenic acids. Figure 2 shows how the degradation affected the concentrations of specific naphthenic acids of varying carbon number and saturation, as measured by GC-MS. The chart on the left of Figure 2 shows the initial concentration of each naphthenic acid, and the chart on the right shows the concentrations after four weeks of an LD1-LD2 co-culture. Based on the decreased height and scale, the figure demonstrates that naphthenic acids were more or less indiscriminately degraded by the co-culture. For the development of a detection system for naphthenic acids, the indiscriminate nature of this degradation would be advantageous, because it allows a large group of NAs to be detected.

Figure 2 Naphthenic acid degradation profile after 4 weeks of degradation, shown with respect to saturation and carbon numbers. Source: Del Rio, Hadwin, Pinto, MacKinnon, & Moore, 2006

 When Figures 1 and 2 are taken together, the most likely conclusion is that the genes responsible for the degradation are spread across the two species, and that some essential genes in the degradation pathway are unique to one species and absent from the other. The degradation pathway has not been experimentally confirmed, but it is believed to be a metabolic pathway involving beta-oxidation (Moore et al. 2006). In any case, the promoters upstream of these essential genes would be a likely place to find a naphthenic acid-sensitive promoter - and indeed, our later qRT-PCR experiments supported this for one candidate gene. A weakness of the bioinformatics search was that LD1 and LD2 have not yet been sequenced, so we assumed similarity between them and corresponding strains that have been sequenced; this weakness complicated primer design but was not prohibitive to the project.

Abstract

Three online tools (DarkHorse, ProteinWorld DB, and Pseudomonas.com) were used to find target gene candidates in silico, and a fourth program (MUMmer) was used to check our underlying assumptions. Using the data from these sources, we constructed a short list of gene candidates, which were then tested by quantitative real-time polymerase chain reaction (qRT-PCR) to verify that their expression responds to naphthenic acids. The qRT-PCR found that a putative Enoyl CoA Hydratase (GeneID=YP_260124) from Pseudomonas fluorescens was expressed in response to naphthenic acids.

DarkHorse and ProteinWorld DB Results

DarkHorse is an online database for finding phylogenetically atypical proteins in various microbial species. Generally, phylogenetic atypicality can be useful for predicting the horizontal transfer of genes; for our purposes, the algorithm identified relatively unique genes for various strains of Pseudomonas putida and fluorescens. We conjectured that genes also found in tailings pond microorganisms, or involved in the degradation of naphthenic acids and similar acids, were the most suitable for further investigation. DarkHorse uses only genomic DNA and does not include any of the superplasmids (up to 100kb) known to be carried by different strains of Pseudomonas. The exclusion of these superplasmids limits the scope of our screening, but allowed us to focus on investigating the roles of the gene candidates and their source species in relevant degradation pathways.

Figure 3 Diagram showing how to search for horizontally transferred genes using the DarkHorse database. Source: http://darkhorse.ucsd.edu/search.shtml

 Figure 3 shows a sample of the input used for our DarkHorse query. The list returned by querying the database for phylogenetically atypical genes in LD1 and LD2 was 258 records long. Each record contained the GI number, strain, and gene annotation for the query species and its best phylogenetic match. To sort through the data, we started by collecting information about the best-match species and the gene annotations. From the species information, we found that some of them had known degradation pathways and occurred naturally in tailings ponds. We were especially interested in such species when they donated a predicted transcription factor or degradative enzyme related to fatty acid degradation. Fatty acids are structurally similar to naphthenic acids and may therefore have similar promoters, or enzymes that would interact with this class of compounds. One putative short-listed gene, enoyl CoA hydratase, was selected because of its very low LPI score (0.001), which suggests a strong likelihood that it was horizontally transferred, and because it is a member of a gene family involved in degrading compounds similar to naphthenic acids, such as caffeic and cinnamic acid (Fiedler et al. 2004, Journal of Applied Microbiology). A predicted Acyl-CoA-ligase with an LPI score of 0.401 also appeared to be involved in caffeic acid degradation, and was also short-listed. To view our DarkHorse bioinformatics results as a whole please click here.
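The short-listing step described above can be sketched in Python. This is only an illustration: the record fields, the keyword list, and the score cutoff are assumptions made for the example, not the actual DarkHorse output format or the criteria the team applied. The first two records reuse the annotations and scores quoted in the text; the third is made up for contrast.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    annotation: str  # gene annotation of the query protein
    score: float     # DarkHorse score quoted for the hit (lower = more atypical)


# Hypothetical keywords standing in for "related to fatty acid degradation".
KEYWORDS = ("hydratase", "ligase", "transcription")


def shortlist(hits, max_score=0.5, keywords=KEYWORDS):
    """Keep hits at or below max_score whose annotation mentions a keyword,
    sorted from lowest to highest score. The cutoff is an assumed value."""
    kept = [
        h for h in hits
        if h.score <= max_score
        and any(k in h.annotation.lower() for k in keywords)
    ]
    return sorted(kept, key=lambda h: h.score)


hits = [
    Hit("Acyl-CoA-ligase", 0.401),
    Hit("enoyl CoA hydratase", 0.001),
    Hit("hypothetical protein", 0.950),  # made-up record, filtered out
]

for hit in shortlist(hits):
    print(f"{hit.score:.3f}  {hit.annotation}")
```

In practice the 258 returned records were sorted by inspection, so a filter like this would only be a first pass before manual review.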

 Additionally, we used ProteinWorld DB, a database for finding unique genes across a set of compared species. This program was useful for identifying potential metabolic genes unique to one species, which our hypothesis suggested is the case for the gene controlled by the desired promoter. We used the only Pseudomonas strains available in its registry, putida KT2440 and fluorescens Pf-5, in our search.

Figure 4 Picture showing the layout of the ProteinWorld database website. Source: http://darkhorse.ucsd.edu/search.shtml

 ProteinWorld DB returned 55 unique hits for the two species of bacteria. Visual inspection and classification of these hits revealed a number of transcription factors known to be involved in metabolic signalling in a variety of other organisms. In particular, two LysR-type transcription factors and a hypothetical protein showed BLAST similarity to known transcription factors found in various species of bacteria. Interestingly, these transcription factor hits were involved in metabolic pathways, particularly beta-oxidation. For this reason, we added them to our list for further examination. A thioesterase which might also be involved in naphthenic acid degradation was short-listed as well. Please click here for a complete list of our ProteinWorld DB hits.

 After examining both the DarkHorse and ProteinWorld DB hits, our team short-listed six genes which we thought could be involved in naphthenic acid detection, and possibly degradation. These genes are listed below:



Figure 5 List of genes short-listed for qRT-PCR verification, as queried from DarkHorse and ProteinWorld DB.

Quantitative RT-PCR Results

To validate that our short-listed genes were involved in naphthenic acid sensing, we designed and implemented a qRT-PCR approach to analyze changes in mRNA expression of these genes in the presence of naphthenic acids. Cell cultures of LD1 and LD2 were grown in minimal media containing either 80 mg/L of naphthenic acids or Casamino acids. We used several kits from Qiagen, including RNAprotect and RNeasy. 16S rRNA, a constitutively expressed, conserved gene found in both LD1 and LD2, was used as a control for the experiment.

 RNA was extracted from untreated cells and from cells treated with NAs using the Qiagen bacterial RNA extraction protocol (see protocols section). The RNA was then reverse transcribed using random primers and reverse transcriptase to generate cDNA.

 The cDNAs generated were then used for quantitative real-time PCR (qRT-PCR). During the PCR, we used primers which bound specifically to the transcribed region of each gene, generating short DNA products that were then bound by SYBR Green, a DNA-binding fluorophore. The fluorescence of bound SYBR Green is proportional to the amount of DNA generated in the PCR reaction. A fluorescence scanner periodically measured the fluorescence of the reaction as the PCR proceeded. Initial concentrations are determined indirectly from the point at which the fluorescence passes a predefined intensity threshold. The cycle at which this threshold is surpassed is known as the cycle threshold (Ct). Because the amount of DNA in an ideal PCR doubles every cycle, the Ct value can be used to determine the amount of cDNA present at the start of the reaction, and therefore the amount of mRNA produced by gene transcription. As a further control, differences in transcription of a tested gene are normalized to the transcription of a control gene expressed at the same rate in both treated and untreated samples. Control genes are generally transcripts that do not vary significantly with different treatments. If the Ct values of a candidate gene differ between treated and untreated samples by more than those of the control gene, then the evidence suggests that the difference in expression is a result of the treatment given to the cells.
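The relationship between starting template and threshold crossing can be illustrated with an idealized simulation. This is a sketch under stated assumptions, not the instrument's algorithm: the signal is modelled as perfect exponential growth with no baseline noise or plateau, and the function names, starting amounts, and threshold are hypothetical.

```python
import math


def amplification_curve(start_amount, cycles=40, efficiency=1.0):
    """Idealized signal after each cycle: the amount is multiplied by
    (1 + efficiency) per cycle, so efficiency=1.0 means perfect doubling."""
    return [start_amount * (1.0 + efficiency) ** c for c in range(cycles + 1)]


def cycle_threshold(signal, threshold):
    """Fractional cycle at which the signal first reaches the threshold,
    using log-linear interpolation between cycles. Returns None if the
    threshold is never reached."""
    if signal[0] >= threshold:
        return 0.0
    for c in range(1, len(signal)):
        lo, hi = signal[c - 1], signal[c]
        if lo < threshold <= hi:
            frac = (math.log(threshold) - math.log(lo)) / (math.log(hi) - math.log(lo))
            return (c - 1) + frac
    return None


# Two hypothetical samples; the second starts with twice as much template.
ct_low = cycle_threshold(amplification_curve(1.0), threshold=1000.0)
ct_high = cycle_threshold(amplification_curve(2.0), threshold=1000.0)
print(f"Ct (1x template): {ct_low:.2f}")
print(f"Ct (2x template): {ct_high:.2f}")
print(f"Difference in cycles: {ct_low - ct_high:.2f}")
```

Under these idealized conditions, doubling the starting template moves the Ct one cycle earlier, which is why a Ct difference can be read as a power of 2.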

Figure 6 RT-PCR melt curve of ECH primer set with LD2 cDNA

 Before conclusions can be drawn, it is important to determine whether the primers amplify, and therefore report changes in the RNA levels of, one gene exclusively. The quality of the primers can be assessed using a melt curve. If the primers work properly, a melt curve analysis will yield only one melting point. To characterize the primers, RT-PCR was run with some of the previously synthesized cDNA. For the primer set ECH (Enoyl-CoA Hydratase), the Ct values changed proportionately with different cDNA levels of strain LD2, as demonstrated in Figure 6. Primers used for ECH were as follows:

  • Forward: 5'-GGCCTGTTTTGCCGACAT-3'
  • Reverse: 5'-GCAGGTAACGCCGAACGA-3'.
Figure 7 RT-PCR melt curve of ECH primer set with LD2 cDNA

Primers for the Polymerase Chain Reaction (PCR) were designed with the following guidelines in mind:

  • Melting Temperature: 70-74°C (optimally 72°C)
  • Primer Length: 22-24 nucleotides (optimally 24)
  • Product Length: 150-250 base pairs (optimally 200)
  • No runs of 4 or more of the same nucleotide letter in either the forward or reverse primer.
  • The primers have a GC clamp at the 3' end (i.e. end with G, C, GC, or CG)
  • The last 5 nucleotides contain 3 or 4 G or C bases.
  • GC Content between 40-60% (optimally 50%)
  • All unintentional matches calculated using Primer-BLAST are at least 25% mismatched.
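Several of the sequence-based guidelines above can be checked automatically. The sketch below is a minimal illustration with hypothetical function names; it deliberately leaves out the melting temperature, product length, and Primer-BLAST criteria, because those depend on a chosen Tm formula, the template sequence, and an external search. The example primer is made up for demonstration and is not one of the team's primers.

```python
import re


def gc_fraction(seq):
    """Fraction of bases in the primer that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(seq, min_len=22, max_len=24):
    """Return a dict of pass/fail flags for the sequence-only guidelines."""
    seq = seq.upper()
    tail = seq[-5:]  # last five nucleotides at the 3' end
    return {
        "length_ok": min_len <= len(seq) <= max_len,
        "gc_content_ok": 0.40 <= gc_fraction(seq) <= 0.60,
        # No run of 4 or more identical nucleotides anywhere in the primer.
        "no_long_runs": re.search(r"(.)\1{3,}", seq) is None,
        # GC clamp: the final base is G or C.
        "gc_clamp": seq[-1] in "GC",
        # The last 5 nucleotides contain 3 or 4 G/C bases.
        "tail_gc_ok": sum(base in "GC" for base in tail) in (3, 4),
    }


# Hypothetical 24-nt primer used only to exercise the checks.
example = "ATGCCGTACGATTGCAGTCAGGCG"
for rule, passed in check_primer(example).items():
    print(f"{rule}: {passed}")
```

A primer that fails one of these flags is not necessarily unusable; the guidelines were design targets, and the final test of a primer pair remains the melt curve and dilution behaviour described in the text.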

 We attempted to optimize primers for the other five selected genes, but unfortunately we are still in the process of characterizing them. The ECH primers showed a strong dissociation curve with a single peak, suggesting that a single target was being amplified. Further characterization with newly synthesized cDNA established the linear range over which the primers worked. cDNA was synthesized for all treatments (with and without NA treatment), and qRT-PCR was performed to test whether amplification differed between NA-treated cells and cells not treated with NAs.
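One common way to characterize the linear range of a primer set is to run a dilution series and fit Ct against the logarithm of the relative cDNA input. The sketch below assumes that approach; the source does not state how the linear range was determined, and the dilution values and Ct values are made up for illustration. The efficiency estimate uses the widely used relation efficiency = 10^(-1/slope) - 1.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(relative_inputs, cts):
    """Fit Ct against log10(relative input) and convert the slope to an
    efficiency, where 1.0 corresponds to perfect doubling per cycle."""
    xs = [math.log10(v) for v in relative_inputs]
    slope, _ = fit_line(xs, cts)
    return slope, 10 ** (-1.0 / slope) - 1.0


# Made-up tenfold dilution series and Ct values (not the team's data).
dilutions = [1.0, 0.1, 0.01, 0.001]
cts = [20.00, 23.32, 26.64, 29.96]

slope, efficiency = amplification_efficiency(dilutions, cts)
print(f"slope = {slope:.2f} cycles per tenfold dilution")
print(f"efficiency = {efficiency:.2%}")
```

A slope near -3.32 cycles per tenfold dilution corresponds to near-perfect doubling; points that fall off the fitted line mark the edges of the usable linear range.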

 To obtain the values in Figure 8, 16S Ct values were subtracted from ECH Ct values for the same treatment to give ΔCt values. The standard Relative Quantitation formula was then applied to the ΔCt values, in which the ΔCt value appears as an exponent of 2.
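The normalization described above can be written out as a short calculation. This sketch assumes the common sign convention (relative level = 2^-ΔCt) and perfect doubling per cycle; the source only states that the value is used as an exponent of 2. The function names and all Ct values below are hypothetical and are not the data behind Figure 8.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the target gene minus Ct of the reference gene (here 16S)."""
    return ct_target - ct_reference


def relative_level(ct_target, ct_reference):
    """Transcript level relative to the reference, assuming the amount of
    product doubles every cycle (2 ** -delta_ct)."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


def fold_change(treated, control):
    """Ratio of normalized levels; each argument is (ct_target, ct_reference)."""
    return relative_level(*treated) / relative_level(*control)


# Made-up Ct values as (ECH, 16S) pairs.
control_cts = (28.0, 12.0)  # untreated cells
treated_cts = (25.0, 12.0)  # cells treated with naphthenic acids

print(fold_change(treated_cts, control_cts))  # 8.0
```

With these made-up numbers the reference gene is unchanged and the target crosses the threshold three cycles earlier after treatment, which corresponds to 2^3 = 8 times more transcript.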

Figure 8 Relative transcript levels of Enoyl CoA Hydratase normalized to 16S rRNA, showing increased expression in cells treated with naphthenic acids relative to control cells.

 From this quantitation, it would appear that the putative Enoyl CoA Hydratase was upregulated when cells were incubated with naphthenic acids. Further validation is required to ensure that this response is specific to naphthenic acids and is not caused by other conditions, including general stress response, heat stress, and fatty acid degradation by regular beta-oxidation. These control experiments have been planned; however, there was insufficient time before the wiki freeze to complete them.

Appendix A: Pseudomonas.com Results

The genes found in a subsequent bioinformatic search on Pseudomonas.com were not used in the RT-PCR, but have been included as an appendix for the iGEM community. Pseudomonas.com is a database specializing in the annotation and collection of information related to the Pseudomonas genus, but it also has a comparative genome search tool which is intended to locate genes present in one species and absent from the other. The list, which has not been completely processed yet, can be found in this .zip file in both .xls and .xlsx format (the latter is recommended). One of the most interesting genes on the list is annotated as glyoxalase/bleomycin resistance protein/dioxygenase, but is similar to a ring-cleaving dioxygenase also found in P. putida.

Appendix B: MUMmer Findings

MUMmer is an open source program which can align genomes within a matter of seconds. We used MUMmer to verify our assumption that there was a substantial degree of homology between Pseudomonas putida and fluorescens. This assumption is important, because its corollary is that there is a limited number of genes unique to either species. The fewer unique genes there are, the easier it is to find the right one.

Figure 6

Figure 6 shows the alignment between Pseudomonas putida (x-axis) and fluorescens (y-axis). Linear regions are more conserved and scattered regions are less conserved.

 In the graph above, regions conserved between both species form a straight line running from the bottom left corner to the top right corner. The line running from the top left corner to the bottom right corner is also a region conserved between both species, but in the reverse orientation. Areas with a more scattered distribution indicate a more random, and therefore less homologous, alignment between the two species.
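The way forward and reversed conserved regions produce the two diagonals can be shown with a toy dot-plot calculation. This is not how MUMmer works internally; it is a simplified stand-in that indexes short fixed-length words (k-mers) instead of finding maximal unique matches, and the function names and sequence are made up for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def kmer_matches(ref, qry, k=6):
    """Return dot-plot points (ref_pos, qry_pos, strand) for every k-mer of
    qry that occurs in ref on the forward strand ("+") or as a reverse
    complement ("-")."""
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)

    points = []
    for j in range(len(qry) - k + 1):
        word = qry[j:j + k]
        for i in index.get(word, []):
            points.append((i, j, "+"))
        for i in index.get(reverse_complement(word), []):
            points.append((i, j, "-"))
    return points


# Toy sequence: comparing it with itself gives the rising diagonal,
# comparing it with its reverse complement gives the falling diagonal.
ref = "ATGGCGTACCTGAAGTCC"
print(kmer_matches(ref, ref)[:3])
print(kmer_matches(ref, reverse_complement(ref))[:3])
```

In the forward comparison the reference and query positions increase together, while in the reverse-complement comparison the reference position decreases as the query position increases, which is the pattern of the two lines described above.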