Team:UPO-Sevilla/Foundational Advances/MiniTn7/Bioinformatics/attTn7 Insertion Site
From 2011.igem.org
Is the attTn7 insertion site highly conserved?
Goal
To study how well the Tn7 transposon recognition site (attTn7) is conserved across the phylogeny.
Background
Tn7 is a highly site-specific bacterial transposon that encodes five transposition proteins (TnsA, TnsB, TnsC, TnsD and TnsE) (Peters et al, 2001). TnsD is one of the target selectors that direct Tn7 insertion. TnsD recognizes a specific chromosomal site, known as the attachment site (attTn7), and modifies the DNA structure, allowing the binding of TnsC, TnsA and TnsB (Waddell and Craig, 1988). The latter two proteins insert the transposon. Insertion takes place downstream of the essential glmS gene, so that its expression is preserved (Gay et al, 1986). In E. coli, the TnsD binding site has been reported to lie in the last 36 bp of the glmS ORF (Kuduvalli et al, 2001). This site encodes the active site region of GlmS, whose amino acid sequence is almost completely conserved in all organisms (Milewski, 2002). At the nucleotide level, however, the attTn7 site has not been fully characterized in a wide range of organisms. A schematic representation of the nucleotide attTn7 recognition site has been reported, obtained from the alignment of 25 bacterial glmS genes from organisms that carry a Tn7 family transposon together with the homologous Drosophila, zebrafish and human genes (Rupak et al, 2010).
Figure 1. (from Prasad et al, 2001). Working model for the TnsABC+D pathway. attTn7 is indicated by the rectangle, the relative positions of TnsD, TnsC and the transposase TnsAB are marked by ovals, and the donor DNA is also shown at the insertion site, which is denoted 0. The top portion of the rectangle indicates the major groove side of attTn7 and the bottom side represents the minor groove face of attTn7. Also illustrated in the box is the 5 bp duplication (underlined) that occurs on Tn7 insertion and the arrows show the positions of joining of the transposon ends, executed by the transposase (TnsAB) on attTn7.
Figure 2. The organization of attTn7 (from Rupak et al, 2010). A schematic representation of the attTn7 at the C-terminus of the glmS gene with the TnsD binding site is shown. The sequence of the E. coli GlmS protein, and the consensus attTn7 sequence were derived as described in the text (note that the least conserved nucleotides correspond to the third position of each codon).
Strategy
Our aim is to improve and extend the study of attTn7 across the phylogeny, and to analyze its level of conservation, with the final goal of making our miniTn7 BioBrick toolkit extensible to other organisms. A consensus nucleotide attTn7 sequence would also be of interest for the development of a complete synthetic Tn7 transposon toolkit, since it could be included in our system. To achieve this, we performed several database searches, BLAST searches, multiple sequence alignments, phylogenetic trees and sequence logos.
Procedure
Obtaining a consensus nucleotide sequence from organisms that carry a Tn7 family transposon.
Searching for organisms with a Tn7 family transposon. Several BLAST searches were run on the UniProt BLAST server using E. coli Tn7 transposition proteins (for example TnsB) as queries. Several organisms were chosen from the results, and the literature was checked for reports of a Tn7 transposon in each of them.
GlmS protein sequences were downloaded from the UniProt database and the corresponding coding sequences were downloaded from the GenBank database.
Multiple alignment of all the protein and coding sequences (see figure 3). For the GlmS protein sequences, the alignment was used to verify the high level of conservation reported by Milewski (Milewski, 2002). The alignments were performed with ClustalX2.
SeaView 4.3.0 was used to obtain both a circular and a squared phylogenetic tree (see figure 4). Phylogenetic trees help to visualize the genetic distances among the sequences under study.
The last 38 nucleotides of the glmS gene (the conserved part that corresponds to the attTn7 recognition site) were submitted to the WebLogo 3 server to create a logo (figure 3).
The consensus sequence obtained from the logo was translated with the ExPASy Translate tool to create a logo.
Repeating the same process with species from all kingdoms, preferentially model organisms. Note that in most of these organisms no Tn7 family transposon has been found.
Searching for organisms with a GlmS protein. BLAST searches were run using the E. coli GlmS protein as the query. As most of the hits were fungi and bacteria, another BLAST search was run with the human GlmS protein as the query; in this second case, most of the hits were eukaryotic organisms. Organisms representing the main phylogenetic groups, together with model organisms, were chosen.
GlmS protein sequences were downloaded from the UniProt database and the corresponding coding sequences were downloaded from GenBank.
Multiple alignment of all the protein and coding sequences. The alignments were performed with ClustalX2.
SeaView 4.3.0 was used to obtain both a circular and a squared phylogenetic tree.
The last 50 nucleotides of the glmS gene were submitted to the WebLogo 3 server to create a logo.
The consensus sequence obtained from the logo was translated with the ExPASy Translate tool to create a logo.
Comparison of the sequences obtained in both cases.
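The logo and translation steps of the procedure above can be approximated offline with a short sketch. This is not the WebLogo 3 or ExPASy implementation. It assumes ungapped coding sequences that all end on the same codon boundary, it omits the small-sample correction that logo tools may apply, and it uses three invented toy sequences (toy_cds is hypothetical, not real glmS data). All function names are illustrative. Because a window such as the last 38 nucleotides is not a multiple of three, the translation skips the leading partial codon so that the reading frame is anchored at the 3' end.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def tails(cds_list, n):
    """Last n nucleotides of each coding sequence (assumes no alignment gaps)."""
    return [cds[-n:].upper() for cds in cds_list]


def column_counts(seqs):
    """Base counts for every column of equally long sequences."""
    return [Counter(col) for col in zip(*seqs)]


def consensus(counts):
    """Most frequent base per column; ties go to the base seen first."""
    return "".join(c.most_common(1)[0][0] for c in counts)


def information_bits(counts):
    """Per-column information content, log2(4) minus Shannon entropy, in bits."""
    bits = []
    for c in counts:
        total = sum(c.values())
        entropy = -sum((k / total) * math.log2(k / total) for k in c.values())
        bits.append(2.0 - entropy)
    return bits


def translate_in_frame(tail):
    """Translate a 3' tail, skipping the leading partial codon if there is one."""
    offset = len(tail) % 3
    return "".join(CODON_TABLE[tail[i:i + 3]] for i in range(offset, len(tail), 3))


# Hypothetical toy coding sequences (stop codon omitted), for illustration only.
toy_cds = ["ATGGGTGAAGTTAAA", "ATGGGCGAGGTCAAG", "ATGGGTGAAGTGAAA"]

window = tails(toy_cds, 8)          # the procedure uses 38 or 50 instead of 8
counts = column_counts(window)
cons = consensus(counts)
info = information_bits(counts)
peptide = translate_in_frame(cons)
print(cons, peptide, [round(b, 2) for b in info])
```

Running the same functions on the tails from the Tn7-containing organisms and on the tails from the wider set of species gives two consensus strings and two information profiles that can be compared position by position.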