Team:Calgary/Project/Promoter/Bioinformatics

From 2011.igem.org

Revision as of 01:42, 29 October 2011 by Yqi2011 (Talk | contribs)


NA-Sensitive Promoters using Bioinformatics and qRT-PCR

Introduction

Numerous species of bacteria have been found to survive in the highly toxic environments of the tailings ponds. However, only recently have environmental researchers begun to identify the strains and species of these microorganisms. In a recent study by Moore et al. (2006), two species of Pseudomonas (putida and fluorescens) were shown to degrade naphthenic acids in co-culture to a much greater degree than either species cultured individually. The percentage of naphthenic acids remaining in solution after four weeks of bacterial culturing is shown in Figure 1.

Figure 1 Naphthenic Acid Degradation Profile after 4 weeks of degradation by Pseudomonas strains LD1 and LD2. Source: Del Rio, Hadwin, Pinto, MacKinnon, & Moore, 2006

This result suggests that each strain of Pseudomonas provides a unique set of degradation machinery (likely in the form of enzymes) needed to complete the breakdown of naphthenic acids. Figure 2 shows how the degradation affected the concentrations of specific naphthenic acids of varying carbon number and saturation, as measured by GC-MS. The chart on the left of Figure 2 shows the initial concentration of each naphthenic acid, and the chart on the right shows the concentrations after four weeks of an LD1-LD2 co-culture. Based on the decreased bar heights and axis scale, the figure indicates that naphthenic acids were degraded more or less indiscriminately by the co-culture. For the development of a naphthenic acid detection system, the indiscriminate nature of this degradation would be advantageous, since it suggests that a large group of NAs could be detected.

Figure 2 Naphthenic acid degradation profile after 4 weeks of degradation, shown with respect to saturation and carbon numbers. Source: Del Rio, Hadwin, Pinto, MacKinnon, & Moore, 2006

When Figures 1 and 2 are taken together, the most likely conclusion is that the genes responsible for the degradation are spread across the two species. Furthermore, some essential genes in this degradation pathway appear to be unique to one species and absent from the other. The degradation pathway has not been experimentally confirmed, but it is believed to be a metabolic pathway involving beta-oxidation (Moore et al. 2006). In any case, the regions upstream of these essential genes would be a likely place to find a naphthenic acid-sensitive promoter - and indeed, our later experiments found that they were. A weakness of the bioinformatics search was that LD1 and LD2 have not yet been sequenced, so we had to assume similarity between them and corresponding strains that have been sequenced; this weakness complicated primer design but was not prohibitive to the project.

Abstract

Three online databases (DarkHorse, ProteinWorld DB, and Pseudomonas.com) were used to find target gene candidates in silico, and a fourth tool (MUMmer) was used to check our working assumptions. Using the data from these sources, we constructed a short list of gene candidates, which were then tested by quantitative Real-Time Polymerase Chain Reaction (qRT-PCR) to verify changes in gene expression in response to naphthenic acids. The qRT-PCR found that expression of a putative Enoyl CoA Hydratase (GeneID=YP_260124) from Pseudomonas fluorescens changed in response to naphthenic acids.

DarkHorse and ProteinWorld DB Results

DarkHorse is an online database for finding phylogenetically atypical proteins in various microbial species. Generally, phylogenetic atypicality can be useful for predicting the horizontal transfer of genes; for our purposes, the algorithm identified relatively unique genes for various strains of Pseudomonas putida and fluorescens. We conjectured that genes also found in tailings pond microorganisms, or involved in the degradation of naphthenic acids and similar acids, were the most suitable for further investigation. DarkHorse uses only genomic DNA and does not include any of the superplasmids (up to 100 kb) known to be carried by different strains of Pseudomonas. The exclusion of these superplasmids limits the scope of our screening, but it allowed us to focus on investigating the roles of the gene candidates and their source species in relevant degradation pathways.

Figure 3 Diagram showing how to search for horizontally transferred genes using the DarkHorse database. Source: http://darkhorse.ucsd.edu/search.shtml

Figure 3 shows a sample of the input used for our DarkHorse query. The list returned by querying the database for phylogenetically atypical genes in LD1 and LD2 was 258 records long. Each record contained the GI number, strain, and gene annotation for the query species and its best phylogenetic match. To sort through the data, we started by collecting information about the best-match species and the gene annotations. From the species information, we found that some of the best-match species had known degradation pathways and occurred naturally in tailings ponds. We were especially interested in such species when they donated a predicted transcription factor or degradative enzyme related to fatty acid degradation. Fatty acids are structurally similar to naphthenic acids and therefore may have similar promoters, or enzymes that would interact with this class of compounds. One putative short-listed gene, enoyl CoA hydratase, was selected for its very low LPI score (0.001), which suggests a strong likelihood that it was horizontally transferred, and also because it was a member of a gene family involved in degrading compounds similar to naphthenic acids, such as caffeic and cinnamic acid (Fiedler et al. 2004, Journal of Applied Microbiology). A predicted Acyl-CoA-ligase with an LPI score of 0.401 also appeared to be involved in caffeic acid degradation, and was also short-listed. To view our DarkHorse bioinformatics results as a whole please click here.

Additionally, we used Protein World DB, a database for finding unique genes across a set of compared species. This program was useful for identifying potential metabolic genes unique to one species, which is where our hypothesis suggested the desired promoter would be found. We used the only Pseudomonas strains available in its registry, putida KT2440 and fluorescens Pf-5, in our search.

Figure 4. Picture showing the layout of the ProteinWorld database website. Source: http://darkhorse.ucsd.edu/search.shtml/

ProteinWorld DB returned 55 unique hits for both species of bacteria. Visual inspection and classification of these hits revealed a number of transcription factors known to be involved in metabolic signalling in a variety of other organisms. In particular, two LysR-based transcription factors and a theoretical protein gave BLAST matches to known transcription factors found in various species of bacteria. Of particular interest was the involvement of these transcription factor hits in metabolic pathways, especially beta-oxidation. For this reason, we added them to a list for further examination. A thioesterase which might also be involved in naphthenic acid degradation was short-listed as well. Please click here for a complete list of our Protein World DB hits.

After examining both the DarkHorse and Protein World DB hits, our team short-listed six genes which we thought could be involved in naphthenic acid detection and possibly degradation. These genes are listed below:

Figure 5. List of genes short-listed for qRT-PCR verification, as queried from DarkHorse and ProteinWorldDB.

Quantitative RT-PCR Results

To validate that our short-listed genes were involved in naphthenic acid sensing, we designed and implemented a qRT-PCR approach to analyze changes in mRNA expression of these genes in the presence of naphthenic acids. Cell cultures of LD1 and LD2 were grown in minimal media containing either 80 mg/L of naphthenic acids or Casamino acids. We used several kits from Qiagen, including RNAprotect and RNeasy. 16S rRNA, a constitutively expressed, conserved gene found in both LD1 and LD2, was used as a control for the experiment.

Briefly, untreated cells and cells treated with NAs were subjected to the Qiagen bacterial RNA extraction protocol (see protocols section). The RNA was then reverse transcribed using random primers and reverse transcriptase to generate cDNA.

The cDNAs generated were then used for quantitative real-time PCR (qRT-PCR). During the PCR, we used primers which bound specifically to the transcribed region of each gene, generating small DNA products that were then bound by SYBR Green, a DNA-binding fluorophore. The fluorescence from bound SYBR Green is directly proportional to the amount of DNA generated in the PCR reaction. A fluorescence scanner periodically measured the fluorescence of the reaction as the PCR proceeded. Initial concentrations are determined indirectly, by regression, from the point at which the fluorescence passes a predefined intensity threshold. The cycle at which this threshold is surpassed is known as the cycle threshold (Ct) value. Because the amount of DNA in a PCR ideally doubles every cycle, the Ct value can be used to determine the amount of cDNA, and therefore mRNA, produced as a result of reverse transcription and gene transcription respectively. As a further control, differences in transcription of a tested gene are normalized to the transcription of a control gene expressed at the same rate in both treated and untreated samples. Control genes are generally transcripts that do not vary significantly with different treatments. If the Ct values of a candidate gene differ between treatments by more than those of the control gene, then the evidence suggests that the differences in expression are a result of the treatment given to the cells.
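The relationship between Ct and starting template can be illustrated with a minimal sketch. It assumes ideal amplification (exact doubling each cycle) and uses made-up copy numbers and a made-up threshold; the function names are hypothetical and are not part of any qPCR software.

```python
def cycles_to_threshold(start_copies, threshold_copies, efficiency=2.0):
    """Count cycles until the product first reaches the threshold.

    Assumes the product is multiplied by `efficiency` every cycle
    (2.0 = perfect doubling), which real reactions only approximate.
    """
    cycle = 0
    amount = float(start_copies)
    while amount < threshold_copies:
        amount *= efficiency
        cycle += 1
    return cycle


def fold_difference(ct_a, ct_b, efficiency=2.0):
    """Starting template of sample A relative to sample B, from their Ct values."""
    return efficiency ** (ct_b - ct_a)


# Hypothetical samples: A starts with 8 times more template than B.
ct_a = cycles_to_threshold(8000, 1e9)
ct_b = cycles_to_threshold(1000, 1e9)
print(ct_a, ct_b, fold_difference(ct_a, ct_b))
```

In this toy case the sample with more starting template crosses the threshold three cycles earlier, and three cycles of doubling correspond to the 8-fold difference in input.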

Figure 6 RT-PCR melt curve of ECH primer set with LD2 cDNA

Before conclusions can be drawn, it is important to determine whether the primers accurately report changes in the RNA levels of one gene exclusively. The quality of a primer set can be assessed using a melt curve: if the primers work properly, a melt curve analysis will yield only one melting point. To characterize the primers, RT-PCR was performed with some of the previously synthesized cDNA. With the primer set ECH (Enoyl-CoA Hydratase), the Ct values changed proportionately with different cDNA levels from strain LD2, as shown in Figure 6. The ECH primers showed a strong dissociation curve with a single peak, suggesting that a single target was being amplified. Primers used for ECH were as follows:

  • Forward: 5'-GGCCTGTTTTGCCGACAT-3'
  • Reverse: 5'-GCAGGTAACGCCGAACGA-3'

Figure 7 RT-PCR melt curve of ECH primer set with LD2 cDNA

Primers for the Polymerase Chain Reaction (PCR) were designed with the following guidelines in mind:

  • Melting Temperature: 70-74°C (optimally 72°C)
  • Primer Length: 22-24 nucleotides (optimally 24)
  • Product Length: 150-250 base pairs (optimally 200)
  • No runs of 4 or more of the same nucleotide in either the forward or reverse primer.
  • The primers have a GC clamp at the 3' end (i.e. end with G, C, GC, or CG)
  • The last 5 nucleotides contain 3 or 4 G's or C's.
  • GC Content between 40-60% (optimally 50%)
  • All unintentional matches calculated using Primer Blast are at least 25% mismatched.
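
Several of the sequence-based guidelines above can be checked automatically. The following is a minimal sketch, not the procedure the team used: the function names and the example primer are hypothetical, and the melting temperature, product length, and Primer Blast criteria are left out because they depend on a Tm formula, a template sequence, and an external search that are not specified here.

```python
import re


def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


def check_primer(primer):
    """Return a dict mapping each sequence-based guideline to True/False."""
    primer = primer.upper()
    tail_gc = sum(base in "GC" for base in primer[-5:])
    return {
        "length_22_to_24": 22 <= len(primer) <= 24,
        "gc_40_to_60_percent": 0.40 <= gc_fraction(primer) <= 0.60,
        # (.)\1{3} matches any base repeated four or more times in a row.
        "no_run_of_4": re.search(r"(.)\1{3}", primer) is None,
        # Ending in G, C, GC, or CG all reduce to "last base is G or C".
        "gc_clamp": primer[-1] in "GC",
        "tail_has_3_or_4_gc": tail_gc in (3, 4),
    }


# Hypothetical 24-mer, invented purely for illustration.
example = "ATGCTAGCTAGGATCCATGACGCG"
for rule, passed in check_primer(example).items():
    print(rule, passed)
```

A primer that fails one of these checks is not necessarily unusable; the guidelines describe a target range rather than hard limits.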

We attempted to optimize primers for the other five genes from the short list, but unfortunately we are still in the process of characterizing them. Further characterization of newly synthesized ECH cDNA demonstrated the linear relationship between the amount of cDNA and the amount of mRNA. cDNA was synthesized for all treatments (with and without NA treatment), and qRT-PCR was performed to test whether amplification increased in NA-treated cells relative to untreated cells.

Figure 8 Relative transcript levels of Enoyl CoA Hydratase normalized to 16S rRNA, showing an increase in expression between control cells and those treated with naphthenic acids.

To obtain the values for Figure 8, 16S rRNA Ct values were subtracted from ECH Ct values for the same treatment to give ΔCt values. The standard Relative Quantitation formula, in which the Ct difference is used as an exponent of 2, was then applied to the ΔCt values.
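
As an illustration of this normalization, here is a minimal sketch of the commonly used 2^(-ΔΔCt) form of relative quantitation. It is an assumption that this is the exact form applied for Figure 8; the Ct values below are invented for illustration, and the function names are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Normalize a target gene's Ct to the reference gene's Ct (same sample)."""
    return ct_target - ct_reference


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target in treated vs. control cells.

    Uses 2 ** (-ddCt), which assumes perfect doubling every cycle.
    A result above 1 means higher expression in treated cells,
    a result below 1 means lower expression.
    """
    ddct = (delta_ct(ct_target_treated, ct_ref_treated)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)


# Invented Ct values: the target crosses the threshold one cycle later
# in treated cells while the reference gene is unchanged.
print(relative_expression(24.0, 12.0, 23.0, 12.0))
```

Note that the sign convention matters: a higher Ct means less starting template, so swapping the order of subtraction inverts the reported fold change.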

Based on this information, it appears that the putative Enoyl CoA Hydratase was upregulated specifically when cells were incubated with naphthenic acids. Further validation is required to confirm that this response is specific to naphthenic acids and is not also triggered by other conditions, including general stress response, heat stress, and fatty acid degradation by regular beta-oxidation. These control experiments have been planned; however, there was insufficient time to complete them before the Americas regional wiki freeze.

Further Characterization of ECH/fad-2 Gene Expression

In the course of pursuing a naphthenic acid-inducible promoter, we decided to continue investigating the putative enoyl-CoA hydratase orthologue, fad-2. In the first experiment, fad-2 appeared to be enriched 6-fold compared to the untreated control when Pseudomonas fluorescens strain LD2 was exposed to naphthenic acids.

However, upon further examination of the RT-PCR experiment, we found an error in our previous calculations: fad-2 transcript levels decreased upon NA exposure instead of increasing as we originally thought. The method currently adopted is based on Pfaffl et al. (2001), using the following website for further guidance (http://pathmicro.med.sc.edu/pcr/realtime-home.htm). Whether transcription is activated or repressed, we believed that the change in activity itself was worth investigating in the hopes of finding a regulatory element.
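
For reference, the efficiency-corrected ratio from the Pfaffl relative quantification model can be sketched as follows. This is an illustration only: the efficiencies and Ct values are invented, the function name is hypothetical, and the sketch does not reproduce the team's actual spreadsheet or data.

```python
def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Expression ratio of the target gene in sample vs. control.

    e_target and e_ref are amplification efficiencies expressed as the
    fold increase per cycle (2.0 = perfect doubling). Each gene's Ct
    difference is taken as control minus sample.
    """
    target_term = e_target ** (ct_target_control - ct_target_sample)
    ref_term = e_ref ** (ct_ref_control - ct_ref_sample)
    return target_term / ref_term


# Invented values: target Ct rises by one cycle after treatment,
# reference Ct is unchanged, both primer sets amplify perfectly.
ratio = pfaffl_ratio(2.0, 23.0, 24.0, 2.0, 12.0, 12.0)
print(ratio)  # a ratio below 1 indicates lower expression in the treated sample
```

When both efficiencies are exactly 2, this expression reduces to the familiar 2^(-ΔΔCt) calculation; the correction only matters when the primer sets amplify with different efficiencies.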

Figure 9

Correcting a calculation error revealed that the enoyl-CoA hydratase orthologue fad-2 is downregulated, not upregulated, in response to ~80 mg/L naphthenic acid treatment (correction for Figure 8).

We examined the chromosomal region surrounding fad-2 and observed that it was likely part of an operon, with another putative operon on the opposite strand. We reasoned that if NA consistently represses fad-2 expression, the cell may be shifting expression to genes on the opposite strand. Our strategy was therefore to determine first whether repression of fad-2 in the presence of naphthenic acids was a consistent result and, if so, whether activation of the opposite-strand operon was thereby causing the repression of the fad-2 operon. Due to time limitations, we were not able to characterize amplification of genes on the opposite strand; instead, we went straight to synthesizing the putative promoter in order to position it before a response element and test whether it was inducible.

Figure 10

Chromosomal map of fad-2 and neighboring genes and intergenic regions. Source: http://www.pseudomonas.com



Is fad-2 consistently repressed by naphthenic acids?

Repeats of these experiments demonstrated that, as in the initial experiment, NA-treated samples had lower levels of fad-2 transcript than the control sample; however, there was only a 2-fold difference in expression instead of the original 6-fold. A possible reason is that the more recent experiments were performed under different conditions from the original. In both repeat experiments, cells were treated with 80 mg/L of NA or other stimuli for between 6 and 10 hours and were harvested while in exponential growth phase. The original experiment used cultures that were treated over a longer time course and may have been in stationary phase, which potentially resulted in a greater amount of fad-2 mRNA transcripts.

Figure 11

The regulation of fad-2 by naphthenic acid treatment appears to be consistent, although the effect is much smaller than what was found in the first experiment. Decanoic acid, a straight-chain fatty acid, appears to increase the amount of fad-2 transcript. Expression levels are relative to 16S rDNA levels, and normalized to the untreated control.

To determine whether the regulation we were seeing is specific to naphthenic acids, we performed the experiment with other chemical treatments: decanoic acid, a straight-chain fatty acid; cinnamic acid, a naphthenic acid with an aromatic ring followed by an unsaturated fatty acid tail, which is a known substrate for enoyl CoA hydratase and is thus used as a positive control treatment; and hydrogen peroxide, an oxidizing agent that produces a general cellular stress response. Chemical structures of the chemicals used are shown in the figure below.

Figure 12

Pseudomonas LD2 cultures were exposed to cinnamic acid, decanoic acid, naphthenic acid (cyclohexane pentanoic acid) or hydrogen peroxide (not shown) prior to total RNA isolation.

In this experiment, the decrease in fad-2 due to naphthenic acids was not substantial. Cinnamic acid has been previously described as being modified by enoyl-CoA hydratase. In cells treated with cinnamic acid, a 1.6-fold increase in fad-2 transcript was observed. Decanoic acid exposure did not appear to increase fad-2 transcript in this experiment. The lack of response to decanoic acid indicates that fad-2 expression probably decreases specifically in response to NA exposure, and that fatty acids, a similar class of hydrophobic compounds with a carboxylic acid group, cannot elicit the same change. Hydrogen peroxide did not alter fad-2 transcript levels either, suggesting that fad-2 (ECH) is likely not a general stress response gene. There has been a continual loss of amplification over the course of these experiments. Coincident with this is the appearance of a small second peak in the melt curve that did not appear during primer optimization studies (see figure here). These experiments need to be repeated when new primers are ordered.

Figure 13

Relative quantitation of fad-2 transcript upon exposure to naphthenic acids (dissolved in water or ethanol), decanoic acid, cinnamic acid, and H2O2 after normalization to control.

Future experiments in characterization of NA response:

Future characterization experiments would explore enrichment of transcripts from the strand opposite the fad-2 operon. Although they are all putative proteins, BLINK analysis (ref) indicated that the first two predicted ORFs encode a lipoprotein and a choloyl glycine hydratase. These proteins may alter the composition of the plasma membrane in response to the surfactant properties of naphthenic acids.

We would also continue to modify the culture conditions for NA-treated cells. In these experiments, we did not optimize the use of minimal media, which is better suited than LB to RNA extraction experiments.

Lastly, our bioinformatics screen identified other candidate genes for which we were unable to develop working PCR primers. We are interested in sequencing the environmental isolate strains LD1 and LD2 to refine our screen and perhaps design better primers for this line of investigation.

Appendix A: Pseudomonas.com Results

The genes found from a subsequent bioinformatic search on Pseudomonas.com were not used in the RT-PCR, but have been included as an appendix for the iGEM community. Pseudomonas.com is a database specializing in the annotation and collection of information related to the Pseudomonas genus, but it also has a comparative genome search tool which claims to allow the location of genes unique to one species and not the other. The list, which has not been completely processed yet, can be found in this .zip file in both .xls and .xlsx format (the latter is recommended). One of the most interesting genes found on the list is annotated as glyoxalase/bleomycin resistance protein/dioxygenase, but is similar to a ring-cleaving dioxygenase also found in P. putida.

Appendix B: MUMmer Findings

MUMmer is an open source program which can align genomes within a matter of seconds. We used MUMmer to verify our assumption that there was a substantial degree of homology between Pseudomonas putida and fluorescens. This assumption is important, because its corollary is that there is a limited number of genes unique to either species. The fewer unique genes there are, the easier it is to find the right one.

Figure 14

Figure 14 shows the alignment between Pseudomonas putida (x-axis) and fluorescens (y-axis). Linear regions are more conserved and scattered regions are less conserved.

The graph above shows the genome alignment between P. putida and P. fluorescens, where the former is on the x-axis and the latter on the y-axis. Regions conserved between both species form a straight line going from the bottom left corner to the top right corner; the line going from the top left corner to the bottom right corner is also a conserved region between both species, except that it is inverted (it matches the reverse strand) in one genome relative to the other. Areas with a more scattered distribution indicate a more random, and therefore less homologous, aligned region between the two species.
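
Why conserved regions appear as diagonals, and inverted regions as anti-diagonals, can be seen in a toy example. The sketch below is not MUMmer's algorithm (MUMmer finds maximal unique matches with a far more efficient index); it simply records the positions of shared k-mers between two short, invented sequences. All names and sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def kmer_matches(ref, qry, k):
    """Return (forward, reverse) lists of (ref_pos, qry_pos) dot-plot points.

    A forward point means the k-mer at qry_pos equals the k-mer at ref_pos;
    a reverse point means its reverse complement does.
    """
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)

    forward, reverse = [], []
    for j in range(len(qry) - k + 1):
        kmer = qry[j:j + k]
        forward.extend((i, j) for i in index.get(kmer, []))
        reverse.extend((i, j) for i in index.get(reverse_complement(kmer), []))
    return forward, reverse


ref = "GATTACAGGC"  # invented sequence
# Identical query: points fall on the main diagonal (ref_pos == qry_pos).
print(kmer_matches(ref, ref, 4)[0])
# Inverted query: points fall on the anti-diagonal (ref_pos + qry_pos constant).
print(kmer_matches(ref, reverse_complement(ref), 4)[1])
```

With real genomes, mutations and rearrangements break these lines into segments, and unrelated regions contribute only scattered points.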