Team:Harvard/Template:NotebookData

From 2011.igem.org

(Difference between revisions)

Revision as of 14:52, 2 August 2011

June 28th

Sequencing:

the following samples from 6/27 were sent to Genewiz for sequencing:
- PyrF F, R (one sample with PyrF_F, one with PyrF_R)
- rpoz F, R (one sample with rpoz_F, one with rpoz_R)

Lambda red results:

The colonies on the plates did not look promising, and the ones we chose and grew up in LB+kan did not actually grow. Just to be certain, we chose 18 more colonies: 6 from 37.5µL arabinose 100µL plated, 6 from 3.75µL arabinose 100µL plated, and 6 from 37.5µL arabinose 1.5mL plated. Three from each plate were grown in plain LB and three with kan. We will let them grow at 30˚C, overnight if necessary, and hopefully see bacteria for PCR.
Assuming this does not work, we prepared more ∆HisB∆PyrF∆rpoZ+pKD46 in two ways: we put 3 colonies from the 6/16 transformation plate in LB+amp, and we streaked a new amp plate from the glycerol stock.
Another possibility is that something is wrong with our lambda red. We designed primers to verify that the pKD46 plasmid is really in the cells.

Kan-ZFB-wp-his3-ura3 construct:

Our last few PCR purifications have given us very low yields, and consequently we have had to use large amounts of our DNA (and the large amounts of buffer salts may also be why our lambda red recombinations have failed). When we tried to amplify our current DNA using the hisura-kan_F and ZFB-wp-hisura_R primers and the Phusion mastermix, it did not work (see 6/23). We will try to gain more product in two ways:

1) Repeat 6/23 PCR but use KAPA mastermix

the KAPA mix may work better than the Phusion.
Used KAPA protocol with 1µL of kan-ZFB overlap as template, hisura-kan_F and ZFB-wp-hisura_R primers, 65˚C annealing temp, 90 sec elongation time
made 2 reactions

2) Repeat overlap extension PCR (see 6/16) with KAPA mastermix

used KAPA protocol with 1µL kan cassette and 1µL of ZFB-wp-hisura, 65˚C annealing temp, 90 sec elongation
10 cycles without primers; hisura-kan_F and ZFB-wp-hisura_R primers added; 15 more cycles
made 6 reactions

E gel to check reactions worked: all 6 overlap PCRs successful, but not the other two reactions.

File:2011.06.28kanZFBconstruct(labeled).png

kan-ZFB-wp-hisura construct 6/28/11

combined samples 1-3 and 4-6 and ran on 1% agarose gel for extraction

File:2011.06.28kanZFBconstruct for extract1(labeled).png

kan-ZFB-wp-hisura construct for gel extraction 6/28/11

used Qiagen gel extraction kit and instructions with the following modifications:
- gel bands were dissolved in 500µL of buffer QG regardless of the gel volume
- gel heated at 50˚C for 20 min (to make up for reduced amount of buffer QG)
- after melting, 10µL of NaOAc (3M) was added to adjust the pH
- DNA from samples 1-3 was eluted in 20µL of ddH2O; DNA from samples 4-6 was eluted in 20µL of buffer EB
- water sample: 273.4 ng/µL, 260/280=1.92
- EB sample: 136.9 ng/µL, 260/280=2.38

June 28th - Bioinformatics

Attention all Harvard iGEM-ers!!! According to the iGEM Main Page, our preliminary project descriptions and safety proposals are due on July 15. Please see the aforementioned link so we can get this done ASAP- we don't want to miss any deadlines and have all our hard work wasted!

Finalized our Positive Control Sequence Table, using Justin's macro to insert the F1 helices into the appropriate zif268 F2 backbone

Length of chip oligos: 131-140bp (based on Cut Site Design)
- Primers: 20bp (x2= 40bp)
- zif268 F2 backbone + helix= 23aa (x3=69bp; some fingers ~3aa longer)
- Some alternate backbones are longer than zif268 F2 backbone
- Type II binding/cut sites= 11bp on each side (22bp total)
- Standard length: 40 + 69 + 22 = 131bp
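
The length arithmetic above can be written out as a small check. This is only a sketch: the function name and parameter names are ours, and the 26 aa case simply illustrates a finger about 3 aa longer than the standard 23 aa.

```python
# Component lengths taken from the breakdown above.
PRIMER_BP = 20      # one primer tag; each oligo carries two
FINGER_AA = 23      # zif268 F2 backbone + helix, in amino acids
CUT_SITE_BP = 11    # type II binding/cut site on each side


def oligo_length(finger_aa=FINGER_AA, primer_bp=PRIMER_BP, cut_site_bp=CUT_SITE_BP):
    """Total oligo length in bp: two primers + coding region + two cut sites."""
    return 2 * primer_bp + 3 * finger_aa + 2 * cut_site_bp


print(oligo_length())    # standard zif268 F2 finger
print(oligo_length(26))  # a finger about 3 aa longer
```

With these numbers the standard finger gives 131bp and a 26 aa finger gives 140bp, which matches the 131-140bp range quoted above.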

Use WebLogos as a final visual check of our final generated sequences

Plasmid and Oligo Design Schematics

File:Oligo design on board.jpg

Oligo Design

File:Plasmid design on board.jpg

Expression Plasmid Design

Chip-Based Sequence Design Schematic

File:Chip protocol.png

Chip-based process for sequence design, taken from Kosuri, et al. 2010 model of scalable gene synthesis Kosuri2010

References

Kosuri2010 pmid=21113165


Harvard Logo

File:Harvard logo.png

Running the Generator!

File:Fasta total.csv NOTE: LATER GENERATED NEW SEQUENCES. NOT UP TO DATE.

Generated Final Chip Sequences

We ran the generator once earlier this afternoon, but had to re-run it because of a typo in the cut sites and an error in the number of sequences we wanted for each backbone. Luckily, we caught these errors, and after checking the program once more, we ran it a final time this afternoon.
- It took about 45 minutes for the program to generate and reverse translate the 54900 sequences.
- During this time, we created a function that will re-translate the sequences that the generator output. It compares the original helix with the re-translated helix to make sure that our reverse-translate works properly.
  - This step went smoothly, and we verified that the sequences were reverse-translated properly.
- To make sure that the distributions generated were as expected, we made WebLogos of the helices generated (see below).
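
A minimal sketch of the re-translation check described above, assuming the standard genetic code. The function names and the example helix/DNA pair are hypothetical and are not taken from our generator.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) into amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def round_trip_ok(helix, dna):
    """True if the reverse-translated DNA encodes exactly the original helix."""
    return translate(dna) == helix.upper()


# Hypothetical helix and one possible reverse translation of it.
print(round_trip_ok("QSSNLVR", "CAATCATCAAATTTGGTTCGT"))
# Same DNA with the last codon changed from CGT (R) to CAT (H).
print(round_trip_ok("QSSNLVR", "CAATCATCAAATTTGGTTCAT"))
```

Comparing at the amino acid level means any codon choice is accepted as long as it encodes the intended residue, which is what a reverse-translation check needs.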
The output file (in the Dropbox: iGem > chip > final chip.csv) originally had the following headers: 'Target', 'Backbone #', 'Helix Sequence', 'Backbone Sequence', 'Nucleotide Sequence of Zinc Finger'
- We wanted to convert this information into FASTA format.
  - We wrote a function that converted our original file into fasta format (in the Dropbox: iGem > chip > fasta.csv)
  - The file FASTA_total (also linked above) contains the FASTA records for all 55000 sequences (the 54900 generated sequences plus the 100 controls).
  - For those curious, the FASTA format is just a format that looks like this:

>Header (For us the header is: Target, Backbone #, Helix Sequence, Backbone Sequence)(The header for the controls is: Index Number, 'control')
 Sequence (In our case, the nucleotide sequence of the zinc fingers)
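
The conversion to FASTA can be sketched as follows. The column names are the ones listed for the output file; the function name, the separator choice, and the example row are assumptions made for illustration and are not our actual conversion function.

```python
# Columns that make up the FASTA header, in the order given for the output file.
HEADER_FIELDS = ("Target", "Backbone #", "Helix Sequence", "Backbone Sequence")
SEQUENCE_FIELD = "Nucleotide Sequence of Zinc Finger"


def to_fasta(rows, sep="_"):
    """Turn a list of row dictionaries into FASTA text, one record per row.

    Header fields are joined with `sep`, and any spaces inside a field are
    replaced with `sep` as well, so the header contains no spaces.
    """
    records = []
    for row in rows:
        header = sep.join(str(row[field]).replace(" ", sep) for field in HEADER_FIELDS)
        records.append(f">{header}\n{row[SEQUENCE_FIELD]}")
    return "\n".join(records) + "\n"


# Hypothetical row, for illustration only.
example_rows = [{
    "Target": "TGG",
    "Backbone #": 1,
    "Helix Sequence": "QSSNLVR",
    "Backbone Sequence": "FQCRICMRNFSHIRTH",
    "Nucleotide Sequence of Zinc Finger": "TTCCAATGT",
}]
fasta_text = to_fasta(example_rows)
print(fasta_text)
```

In practice the rows could come from `csv.DictReader` over the generator's output file, assuming the headers match the names listed above.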

Generated WebLogos for Final Chip

File:AAA.png AAA	File:ACC.png ACC	File:CTC.png CTC
File:CTG.png CTG	File:GAC.png GAC	File:TGG.png TGG

FASTA-Formatted Chip Data:

>NNN(Target Triplet) BB# Helix Seq.

Nucleotide seq. of ZF

Bioinformatics Candids

File:Justin speaking.jpg

File:Justin writing.jpg

File:Zif268 sequence by memory.jpg

zif268 sequence by memory. You know you've stared at too many zif268 sequences when...

File:Primer Index iGEM 2011

Design of Plate Practice Sequences

While we wait for the chip to come in, there are a number of techniques and protocols that we can practice, so that when the chip arrives we will be ready to use the oligos we receive. We will be practicing the following techniques:

Cutting ZF1 out of our oligos
Inserting ZF1 into the expression plasmid in between the omega subunit and the linker before F2
Verifying that combination of our F1 from the oligo with the plasmid produces a viable, functional ZF
Amplifying subpools of oligos for testing
Inserting the expression plasmids into the E. coli containing our selection genome
Verifying that our ZF-binding site/GFP expression paradigm works

To this end, we will be ordering a 96-well plate from IDT containing oligos that will simulate the entire tube of oligos that we will receive from Agilent in four weeks. These oligos will consist of the following:

6 positive controls (we know which DNA sequences these bind to)
- 3 of them being the F1 fingers of Zif268, OZ052, and OZ123
- 3 of them being ZF F1s derived from CODA.
90 generated sequences, picked from a subset of the chip
- These are picked evenly across the 9,150 sequences generated on the chip for the TGG triplet F1 target from the colorblindness "bottom finger" target, GTG GGA TGG. This particular target was chosen because the F2/F1 is a GNNTNN combo, which might be more likely to get hits from our chip generation sequences.
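
One way to pick 90 sequences evenly across the 9,150 is to take evenly spaced indices. This is a sketch of that idea only; the notebook does not record exactly how the subset was selected, and the function name is ours.

```python
def evenly_spaced_indices(pool_size, n_picks):
    """Return n_picks distinct indices spread evenly over range(pool_size)."""
    if not 0 < n_picks <= pool_size:
        raise ValueError("n_picks must be between 1 and pool_size")
    # Integer arithmetic keeps the spacing as uniform as possible without floats.
    return [(i * pool_size) // n_picks for i in range(n_picks)]


picks = evenly_spaced_indices(9150, 90)
print(picks[:3], picks[-1])
```

With these numbers, consecutive picks are 101 or 102 positions apart, so every backbone group on the chip contributes in proportion to its size as long as the sequences are stored in generation order.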

The primer tag sequences for the 90 generated sequence subset will be the same as they are on the chip (for the sake of explanation, we will refer to them now as P1F and P1R in this paragraph). The positive controls will be flanked immediately by the same primers as the generated subset so that we can amplify everything as one pool altogether should we need to (so this will be P1F and P1R). However, we will also put an additional set of primers outside of the P1F/P1R primers for the positive controls so that we can specifically amplify the positive control subpool, should we want to. These primers will be the same as the primers for the positive control on the chip (which will be called P2F and P2R here).

To recap, on the chip we will have the following oligos :

45750 other oligos for the 5 other target sequences
Oligos (TGG set, 9150 total):   | P1F | type II binding site | generated F1 | type II binding site | P1R |
Oligos (+ control, 100 total):  | P2F | type II binding site |  control F1  | type II binding site | P2R |

In our test pool of 96 sequences, we will have two types of oligos (note the two pairs of primers around the positive controls):

Oligo (TGG set, 90 total):          | P1F | type II binding site | generated F1 | type II binding site | P1R |
Oligo (+ control, 6 total):   | P2F | P1F | type II binding site |  control F1  | type II binding site | P1R | P2R |
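
The two layouts can be sketched as a single assembly function with an optional outer primer pair. The primer tags below are the TGG-set and control tags from the June 30th primer list, already in the orientation in which they appear on the oligo. The two cut-site strings are our reading of the `1_control` example and should be treated as assumptions, and the finger is a placeholder.

```python
# Primer tags (forward, reverse-as-written-on-the-oligo) from the June 30th primer list.
P1 = ("ATATAGATGCCGTCCTAGCG", "TGGGCACAGGAAAGATACTT")  # GCT GGC TGG set
P2 = ("GTACATGAAACGATGGACGG", "CGCTGAGGAGACTATACCAG")  # controls

# Assumed 11 bp type II sites, read off the 1_control example oligo.
SITE_LEFT = "GGTCTCAGCCA"
SITE_RIGHT = "ACGGTGAGACC"


def build_oligo(f1, inner_primers, outer_primers=None):
    """Assemble | fwd | site | F1 | site | rev |, optionally wrapped in outer primers."""
    fwd, rev = inner_primers
    oligo = fwd + SITE_LEFT + f1 + SITE_RIGHT + rev
    if outer_primers is not None:
        outer_fwd, outer_rev = outer_primers
        oligo = outer_fwd + oligo + outer_rev
    return oligo


placeholder_f1 = "N" * 69  # stands in for a 69 bp finger coding sequence

generated = build_oligo(placeholder_f1, P1)                       # TGG set
test_control = build_oligo(placeholder_f1, P1, outer_primers=P2)  # test-pool control
print(len(generated), len(test_control))
```

Under these assumptions the nested-primer controls come out 40 bp longer than the generated oligos (171 bp versus 131 bp).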

Our test sequences will come back from IDT in a 96-well plate with one oligo in each well. We should make a mixture using some of each well in order to create a tube that contains all 96 sequences. This will simulate the tube that we will receive from Agilent, except that instead of 55,000 sequences this tube will contain only 96. From here, we can practice using this as a library.

We can pretend that this tube is just 96 generated sequences on the chip, treating the positive controls as if they were also generated sequences (we only include them in the 96 to ensure that we will indeed get a "hit" from this practice screening). Thus, we can just use the P1F/P1R primer set to amplify all of them in order to use them for the subsequent steps.

These subsequent steps will be those that were outlined above, namely cutting out the F1 sequence from each oligo, ligating this F1 into our expression plasmid, putting the expression plasmid into our selection strain, observing colonies which get infused with ZFs that bind to our target site (the "hits"), and sequencing the colonies that get hits to determine which ZF they are expressing.

We will be repeating these exact same steps once we get the chip, so if we can perfect our protocols with these practice sequences, we should be golden when the chip comes in.

June 30th

Lambda Red, Backbone, and Sequencing PCR

The gel run to check for lambda red in the pKD46 plasmid showed that it is indeed present, so our recombination failures have not been due to an incorrect plasmid.
The gel run on the backbone of the pZE21G plasmid was a success and took us one step closer to obtaining all the parts necessary for the three-part assembly.
The gel run on pyrF and rpoZ was a success.
- Therefore we sent the PCR products and primers to GENEWIZ for sequencing again

File:2011.06.30.lambda spec pyrFrpoZ(labeled).png

pKD46, pZE21G, and PyrF and rpoZ loci 6/30/11

pZE21G backbone:

Since last night's PCR was successful, we will redo it with a few protocol adaptations to get a cleaner product and to increase our yield when we purify
KAPA mastermix and protocol: primers HindIII-F and KpnI-R
- template: pZE21G miniprepped plasmid, 1µL
- 2 min elongation time, 30 cycles
- 2 samples at 55˚C annealing, 2 samples at 60˚C

Lambda red and MAGE:

Yesterday's prep produced tiny colonies on the MAGE plates and (so far) none on the kan-ZFB plates. Just in case it didn't work, we will redo the lambda red using even more DNA and perform a second round of MAGE using culture from yesterday that was not plated.
same procedure, but with the following changes:
- 5µL kan-ZFB (about 1 µg)
- recover 3 hrs
- kan-ZFB: plate 100µL and 2 mL on kan plates
- MAGE: plate 1µL and 10µL on amp plates
To see if the colonies on the MAGE plate knocked out HisB, we chose 24 colonies, resuspended them in water, and put half the cells in LB (complete media) and half in NM media (does not have histidine). 96 well plate, 150µL media, grown overnight at 30˚C.

ZF Expression Plasmid Ultramer and Primer Design

Today, we designed primers ZF_073 through ZF_085 as listed in the iGEM Primer Index spreadsheet. These were basically two sets of primers: the primers to clone out the omega subunit and linker, and the ultramers that would construct the last part of the linker along with the type II binding sites and F2/F3 fingers. One should refer to the primer list for the sequences.

Note: the annealing sequence for the ultramer overlap contained a 72 degree melting temp hairpin. To get around this, I changed one of the codons in the F2 backbone. The F2 backbone begins with "FQCRIC", and so I changed the codon for the arginine (R) from CGC to CGT, which resolved the hairpin problem.
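
The codon change can be checked as a synonymous substitution. This is a sketch: the six-codon sequence for "FQCRIC" is an assumed "before" sequence built from the codons in the `1_control` example with CGC in place of CGT, and the function names are ours. It does not model the hairpin itself, only that the protein is unchanged.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) into amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def swap_codon(dna, codon_index, new_codon):
    """Replace one codon (0-based index) and refuse the edit if the protein changes."""
    start = 3 * codon_index
    edited = dna[:start] + new_codon + dna[start + 3:]
    if translate(edited) != translate(dna):
        raise ValueError("substitution is not synonymous")
    return edited


before = "TTCCAATGTCGCATCTGT"          # assumed coding sequence for FQCRIC
after = swap_codon(before, 3, "CGT")   # arginine is the 4th residue (index 3)
print(translate(before), translate(after), after)
```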

June 30th - Bioinformatics

Updated Primer list and FASTA formatting

We ran into a small hiccup when we were informed that we had forgotten to reverse complement the reverse primer sequences that were being appended to the generated sequences. The primer sequences we were given were the sequences of the actual primers, rather than the sequences to which the primers would bind. Luckily, we caught this error! We did have to re-run the generator, because we had to make sure that our generated sequences did not contain the new primer sequences.

Here is the updated primer list:

This is the set of final target sequences with assigned forward and reverse primers (tags for PCR):

Disease	Target Sequence	Forward Primer (5'-3' NOT REVERSE COMPLEMENT)	Reverse Primer (5'-3' REVERSE COMPLEMENT)
Colorblindness	GCT GGC TGG	ATATAGATGCCGTCCTAGCG	TGGGCACAGGAAAGATACTT
Colorblindness	GCG GTA ACC	CCCTTTAATCAGATGCGTCG	GGTCGCCCTTATTACTACCA
Familial Hypercholesterolemia	GGC TGA GAC	TTGGTCATGTGCTTTTCGTT	TCTGAGTATCCGATACCCCT
Familial Hypercholesterolemia	GGA GTC CTG	GGGTGGGTAAATGGTAATGC	GCTATATCCGGGGAATCGAT
Myc-gene Cancer	GGC TGA CTC	TCCGACGGGGAGTATATACT	TTGGCCTGAAGCAGTTAGTA
Myc-gene Cancer	GGC TGG AAA	CATGTTTAGGAACGCTACCG	GGGAGGGAACGGAGATTATT
Controls	n/a	GTACATGAAACGATGGACGG	CGCTGAGGAGACTATACCAG
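
The reverse-complement step can be sketched in a few lines. The check uses the control reverse primer as originally assigned on June 25th-26th (CTGGTATAGTCTCCTCAGCG) and the reverse-complemented form listed in the table above; the function name is ours.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


control_reverse_primer = "CTGGTATAGTCTCCTCAGCG"  # the primer itself, 5'-3'
as_written_on_oligo = reverse_complement(control_reverse_primer)
print(as_written_on_oligo)
```

The printed sequence is what gets appended to the 3' end of each control oligo, so that the actual primer can anneal to it.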

There was also a small error in the FASTA formatting. There are not supposed to be any spaces in the header, so the spaces were replaced with underscores.

Example:

>1_control

GTACATGAAACGATGGACGGGGTCTCAGCCATTCCAATGTCGTATCTGTATGCGTAATTTTTCACGCAAACACCATTTGGGTCGTCATATCCGTACGCACACGGTGAGACCCGCTGAGGAGACTATACCAG
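
As a sanity check, the example record can be split back into its parts using the standard lengths (20 bp primers, 11 bp sites, 69 bp finger). The part names and the function are ours, and the split assumes this control uses the standard 131 bp layout.

```python
OLIGO = (
    "GTACATGAAACGATGGACGGGGTCTCAGCCATTCCAATGTCGTATCTGTATGCGTAATTTTTCACGCAAACACC"
    "ATTTGGGTCGTCATATCCGTACGCACACGGTGAGACCCGCTGAGGAGACTATACCAG"
)

# (part name, length in bp), in 5'-3' order along the oligo.
PART_LENGTHS = (
    ("forward primer", 20),
    ("left cut site", 11),
    ("finger", 69),
    ("right cut site", 11),
    ("reverse primer", 20),
)


def split_oligo(oligo, part_lengths=PART_LENGTHS):
    """Cut an oligo into named parts; fail if its length does not fit the layout."""
    expected = sum(length for _, length in part_lengths)
    if len(oligo) != expected:
        raise ValueError(f"expected {expected} bp, got {len(oligo)} bp")
    parts, position = {}, 0
    for name, length in part_lengths:
        parts[name] = oligo[position:position + length]
        position += length
    return parts


parts = split_oligo(OLIGO)
for name, seq in parts.items():
    print(f"{name:15s} {seq}")
```

The two outer parts should match the control primer tags in the table above, with the reverse primer in its reverse-complemented form.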

@@ Line 1: / Line 1: @@
 <div id="606" style="display:none">
 ==June 6th==
-First day if iGEM.
+First day if iGEM.</div>
-</div>
 <div id="607" style="display:none">
 == June 7th ==
@@ Line 12: / Line 10: @@
 Procedure: followed Qiagen Kit instructions, each student (8) using 1 mL cell suspension
-Results: DNA reasonably pure (260/280 between 1.8 and 2) and between 25 and 50 ng/µL
+Results: DNA reasonably pure (260/280 between 1.8 and 2) and between 25 and 50 ng/µL</div>
-</div>
 <div id="608" style="display:none">
 == June 8th ==
 '''PCR to connect ultramers into OZ052 (Zif268 F2 triplicate, GCCGATGTC)and OZ123 (Zif268 F2 triplicate, GAGTGGTTA):'''
@@ Line 45: / Line 40: @@
 *5) Repeat 2-4 for 25 cycles
 *6) 68⁰C for 5 min
-*7) 4⁰C forever
+*7) 4⁰C forever</div>
-</div>
 <div id="609" style="display:none">
 == June 9th ==
@@ Line 215: / Line 208: @@
 |}
-Follow up work on this will be to convert this table to frequencies instead of values: values are less meaningful.
+Follow up work on this will be to convert this table to frequencies instead of values: values are less meaningful. </div>
-</div>
 <div id="610" style="display:none">
 == June 10th ==
@@ Line 303: / Line 294: @@
 |}
-Follow up work here is to check more properties, and maybe try individual pairings (ex. phobic-philic, polar-phillic).
+Follow up work here is to check more properties, and maybe try individual pairings (ex. phobic-philic, polar-phillic).</div>
-</div>
 <div id="613" style="display:none">
 == June 13th ==
@@ Line 378: / Line 367: @@
 :: add pseudo counts (call add_pseudo) and generate a dependent random call for a position (using generate_indep on the adjusted matrix)
-We finished generate_indep, generate_dep, and add_pseudo today, along with creating a 140x140 matrix of needed values.
+We finished generate_indep, generate_dep, and add_pseudo today, along with creating a 140x140 matrix of needed values.</div>
-</div>
 <div id="614" style="display:none">
 == June 14 ==
@@ Line 448: / Line 435: @@
-Brandon learned basic Python today. Justin created a JavaScript program that recognizes potential binding sites from a given sequence.
+Brandon learned basic Python today. Justin created a JavaScript program that recognizes potential binding sites from a given sequence.</div>
-</div>
 <div id="615" style="display:none">
 ==June 15th==
@@ Line 623: / Line 608: @@
   |GAGGCGTGGC
   |[http://www.pdb.org/pdb/explore/explore.do?structureId=2WBU]
-  |}
+  |}</div>
-</div>
 <div id="616" style="display:none">
 ==June 16th==
@@ Line 771: / Line 754: @@
 *Many of our current programs currently look at overall data or data based on specific DNA triplets (for example: 'GAT' or 'AAA'). However, in order to more easily understand some of the patterns that occur in the datasets, we want to examine broader subsets of data. For example, do different patterns appear when looking at fingers that bind to 'GNN' triplets versus 'NGN' triplets (where 'N' represents any of the 4 bases)?
 **We added the capability for our programs to accept inputs with the variable 'N' by using regular expressions.
-***We can now create lists of the zinc fingers that bind to any triplet, and create interaction matrices and frequency tables for any triplet input.
+***We can now create lists of the zinc fingers that bind to any triplet, and create interaction matrices and frequency tables for any triplet input.</div>
-</div>
 <div id="617" style="display:none">
 ==June 17th==
@@ Line 832: / Line 813: @@
 #OpenWetWareCodonUsage http://openwetware.org/wiki/Escherichia_coli/Codon_usage
 #NIHRareCodonCalculator http://nihserver.mbi.ucla.edu/RACC/
-</biblio>
+</biblio></div>
-</div>
 <div id="620" style="display:none">
 ==June 20th==
@@ Line 880: / Line 858: @@
 | [[File:nnc_probs.png|thumb|left|Probability data for the 262 fingers that bind to '''NNC''' triplets. The position 4 leucine motif remains. There is also a very high (> 0.75) probability of an arginine at position 6.]]
 | [[File:nna_probs.png|thumb|left|Probability data for the 218 fingers that bind to '''NNA''' triplets. The position 4 leucine motif remains. There is also a very high (> 0.75) probability of a glutamine at position -1 and an arginine at position 6.]]
-|}
+|}</div>
-</div>
 <div id="621" style="display:none">
 ==June 21st==
 '''His3 sequencing results:'''
@@ Line 1,021: / Line 995: @@
 ****2 and 1
 ****6 and 5
-**Because there is not much data for 'CNN' and 'ANN' sequences (with 16 and 29 known fingers that bind to each triplet, respectively), we should use pseudocounts for these sequences, so that our frequency generator is not too biased toward probabilities that may not be significant.
+**Because there is not much data for 'CNN' and 'ANN' sequences (with 16 and 29 known fingers that bind to each triplet, respectively), we should use pseudocounts for these sequences, so that our frequency generator is not too biased toward probabilities that may not be significant.</div>
-</div>
 <div id="622" style="display:none">
 ==June 22nd==
@@ Line 1,228: / Line 1,200: @@
   | [[File:IGem_logo_ANN_based_on_open.png|thumb|left|WebLogo for 10000 sequences generated for an ANN triplet with our program, when it incorporates only OPEN data.]]
   | [[File:IGem_logo_ANN_based_on_open_and_persikov.png|thumb|left|WebLogo for 10000 sequences generated for an ANN triplet with our program, when it incorporates both OPEN and Persikov data.]]
-|}
+|}</div>
-</div>
 <div id="623" style="display:none">
 ==June 23rd==
@@ Line 1,319: / Line 1,287: @@
 | Myc-gene Cancer||chr8:128,938,529-128,941,440||981||GGA GAG GGT||style="background:#92D050" | GGC TGG AAA||QANHLSR.RQDNLGR.TRQKLET||EKSHLTR.RREHLTI.#######
 |}
-*Green cells are our target sequences.
+*Green cells are our target sequences.</div>
-</div>
 <div id="624" style="display:none">
 ==June 24==
@@ Line 1,418: / Line 1,384: @@
 *Reverse translate fingers avoiding type II restriction enzymes and primers
 *Append type II restriction enzyme and primer sequences to each finger
-*Yay
+*Yay</div>
-</div>
 <div id="625" style="display:none">
 ==June 25th-26th - Bioinformatics==
@@ Line 1,445: / Line 1,409: @@
-Additionally, primer tags '''(forward: GTACATGAAACGATGGACGG, reverse:CTGGTATAGTCTCCTCAGCG)''' will be assigned to the 100 control sequences.
+Additionally, primer tags '''(forward: GTACATGAAACGATGGACGG, reverse:CTGGTATAGTCTCCTCAGCG)''' will be assigned to the 100 control sequences.</div>
-</div>
 <div id="627" style="display:none">
 ==June 27, Wet lab==
@@ Line 1,517: / Line 1,479: @@
 The program appears to run extremely slowly because of the computationally intensive step of checking the reverse translated sequences
 *In addition to checking for the primers and cutsites, we also have to check for 'GGGGGG' because it can lead to undesirable structures forming. In addition, we have to check for the reverse complements for all these undesirable sequences.
-*We decided on a similarity of 0.8 as the maximum acceptable similarity between the sequence the primer bind to and any other part of the generated sequence. If the sequences are too similar, the primer might mishybridize. We originally had a similarity threshold of 0.6 but that made the program run too slowly, so we decided on a '''threshold of 0.8'''.
+*We decided on a similarity of 0.8 as the maximum acceptable similarity between the sequence the primer bind to and any other part of the generated sequence. If the sequences are too similar, the primer might mishybridize. We originally had a similarity threshold of 0.6 but that made the program run too slowly, so we decided on a '''threshold of 0.8'''.</div>
-</div>
 <div id="628" style="display:none">
 ==June 28th==
@@ Line 1,681: / Line 1,640: @@
 These subsequent steps will be those that were outlined above, namely cutting out the F1 sequence from each oligo, ligating this F1 into our expression plasmid, putting the expression plasmid into our selection strain, observing colonies which get infused with ZFs that bind to our target site (the "hits"), and sequencing the colonies that get hits to determine which ZF they are expressing.
-We will be repeating these exact same steps once we get the chip, so if we can perfect our protocols with these practice sequences, we should be golden when the chip comes in.
+We will be repeating these exact same steps once we get the chip, so if we can perfect our protocols with these practice sequences, we should be golden when the chip comes in.</div>
-</div>
 <div id="629" style="display:none">
 ==June 29th==
@@ Line 1,763: / Line 1,719: @@
 All of this is on our spec-resistance-containing plasmid.  The above construct replaced the GFP which was present previously on this plasmid.
-Tomorrow we will begin design of our primers from these SeqBuilder files.
+Tomorrow we will begin design of our primers from these SeqBuilder files.</div>
-</div>
 <div id="630" style="display:none">
 ==June 30th==
@@ Line 1,827: / Line 1,780: @@
 *Example:
   >1_control
-  GTACATGAAACGATGGACGGGGTCTCAGCCATTCCAATGTCGTATCTGTATGCGTAATTTTTCACGCAAACACCATTTGGGTCGTCATATCCGTACGCACACGGTGAGACCCGCTGAGGAGACTATACCAG
+  GTACATGAAACGATGGACGGGGTCTCAGCCATTCCAATGTCGTATCTGTATGCGTAATTTTTCACGCAAACACCATTTGGGTCGTCATATCCGTACGCACACGGTGAGACCCGCTGAGGAGACTATACCAG</div>
-</div>

Acid	-1	1	2	3	4	5	6
A	77	140	210	197	0	312	85
C	12	24	1	6	14	0	0
D	413	16	694	258	0	142	14
E	125	74	152	107	0	58	132
F	0	0	22	0	10	0	0
G	12	201	328	125	0	177	62
H	93	144	232	652	0	51	17
I	70	21	3	26	0	94	73
K	108	372	46	169	6	321	52
L	176	37	20	22	3325	75	55
M	36	54	5	28	0	31	10
N	23	150	129	940	0	182	61
P	3	298	77	7	0	36	8
Q	813	158	180	13	0	136	30
R	870	539	137	55	3	428	2517
S	99	970	859	278	0	140	12
T	243	134	223	350	0	834	83
V	166	26	27	115	0	341	146
W	19	0	13	0	0	0	0
Y	0	0	0	10	0	0	1
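
Converting a column of these counts into frequencies, with optional pseudocounts, can be sketched as below. The counts are the non-zero entries of the fifth data column of the table above (the one dominated by leucine); the function name and the pseudocount value are assumptions and are not our `add_pseudo` implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def to_frequencies(counts, pseudocount=0.0):
    """Convert per-residue counts at one helix position into frequencies.

    `pseudocount` is added to every residue before normalising, so residues
    that were never observed still get a small non-zero probability.
    """
    adjusted = {aa: counts.get(aa, 0) + pseudocount for aa in AMINO_ACIDS}
    total = sum(adjusted.values())
    return {aa: value / total for aa, value in adjusted.items()}


# Non-zero counts from the leucine-dominated column of the table above.
column = {"C": 14, "F": 10, "K": 6, "L": 3325, "R": 3}

plain = to_frequencies(column)
smoothed = to_frequencies(column, pseudocount=1.0)
print(round(plain["L"], 3), plain["A"], round(smoothed["A"], 5))
```

Frequencies make columns comparable across triplets with very different numbers of known fingers, which raw counts do not.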

'	A	C	D	E	F	G	H	I	K	L	M	N	P	Q	R	S	T	V	W	Y
A	10	0	99	55	0	29	122	20	32	332	2	59	55	63	255	87	24	43	0	0
C	0	0	15	0	0	3	0	0	0	5	0	0	6	0	31	6	14	0	0	0
D	99	15	94	92	0	39	62	6	84	342	15	120	55	42	277	290	87	21	0	8
E	55	0	92	42	0	34	77	1	38	141	2	39	4	29	134	28	90	26	0	1
F	0	0	0	0	0	0	0	10	0	0	0	22	4	0	2	4	6	0	0	0
G	29	3	39	34	0	38	56	0	14	126	1	95	28	47	119	125	38	7	0	0
H	122	0	62	77	0	56	118	9	103	498	4	88	24	26	87	159	70	2	0	0
I	20	0	6	1	10	0	9	6	8	95	3	5	17	3	62	16	17	4	0	0
K	32	0	84	38	0	14	103	8	84	386	24	44	19	102	269	163	113	22	1	0
L	332	5	342	141	0	126	498	95	386	174	32	686	16	112	362	276	875	360	0	8
M	2	0	15	2	0	1	4	3	24	32	0	7	2	11	39	14	3	1	0	0
N	59	0	120	39	22	95	88	5	44	686	7	8	36	28	120	254	84	34	1	0
P	55	6	55	4	4	28	24	17	19	16	2	36	0	3	29	150	21	13	11	0
Q	63	0	42	29	0	47	26	3	102	112	11	28	3	100	261	314	125	19	0	0
R	255	31	277	134	2	119	87	62	269	362	39	120	29	261	618	343	504	281	0	0
S	87	6	290	28	4	125	159	16	163	276	14	254	150	314	343	592	173	91	0	0
T	24	14	87	90	6	38	70	17	113	875	3	84	21	125	504	173	154	28	0	0
V	43	0	21	26	0	7	2	4	22	360	1	34	13	19	281	91	28	12	0	0
W	0	0	0	0	0	0	0	0	1	0	0	1	11	0	0	0	0	0	0	0
Y	0	0	8	1	0	0	0	0	0	8	0	0	0	0	0	0	0	0	0	0

Position	Very Phobic	Hydrophobic	Neutral	Hydrophilic
6	285	85	204	2782
5	542	312	1334	1169
4	3334	14	0	9
3	191	203	1417	1536
2	91	211	1819	1236
1	138	164	1604	1451
-1	468	90	1257	1542

Position	Polar	Nonpolar
6	2917	440
5	2290	1067
4	9	3348
3	2830	527
2	2652	705
1	2555	802
-1	2784	573

Disease	Target Range	Binding Site Location	Bottom Finger	Top Finger	Bottom AA (F3 to F1)	Top AA (F3 to F1)
Colorblindness	chrX:153,403,001-153,407,000	370	GTATTTGTT	GGGCCTGCT	N/A	N/A
Colorblindness	chrX:153,403,001-153,407,000	3627	GCTGGCTGG	GCGGTAATG	EGSGLKR.EAHHLSR.#######	RRDDLTR.QRSSLVR.#######
Cystic Fibrosis	chr7:117,074,084-117,089,556	14767	GCAGGTGAT	AAAGAGCCC	QNGTLGR.EAHHLSR.#######	N/A
Familial Hypercholesterolemia	chr19:11,175,000-11,195,000	14001	GGCTGAGAC	GGAGTCCTG	ESGHLKR.QREHLTT.#######	QTTHLSR.DHSSLKR.#######
Tay-Sachs	chr15:72,674,944-72,688,031	5888	GTCTGGTCA	TCAAACTCC	DRSSLRR.RREHLTI.#######	N/A
Pancreatic Cancer	chr7:117,074,084-117,089,556	1739	GATCAAGCT	GTTTCAGTG	N/A	N/A

PDB ID	Binding Sequence	Link
1F2I	ATGGGCGCGCCCAT	[http://www.pdb.org/pdb/explore/explore.do?structureId=1F2I]
1G2D	GACGCTATAAAAGGAG	[http://www.pdb.org/pdb/explore/explore.do?structureId=1G2D]
1G2F	TCCTTTTATAGCGTCC	[http://www.pdb.org/pdb/explore/explore.do?structureId=1G2F]
1MEY	ATGAGGCAGAACT	[http://www.pdb.org/pdb/explore/explore.do?structureId=1MEY]
1TF6	ACGGGCCTGGTTAGTACCTGGATGGGAGACC	[http://www.pdb.org/pdb/explore/explore.do?structureId=1TF6]
1UBD	AGGGTCTCCATTTTGAAGCG	[http://www.pdb.org/pdb/explore/explore.do?structureId=1UBD]
1YUI	GCCGAGAGTAC	[http://www.pdb.org/pdb/explore/explore.do?structureId=1YUI]
2DRP	CTAATAAGGATAACGTCCG	[http://www.pdb.org/pdb/explore/explore.do?structureId=2DRP]
2GLI	TTTCGTCTTGGGTGGTCCACG	[http://www.pdb.org/pdb/explore/explore.do?structureId=2GLI]
2I13	CAGATGTAGGGAAAAGCCCGGG	[http://www.pdb.org/pdb/explore/explore.do?structureId=2I13]
2KMK	CATAAATCACTGCCTA	[http://www.pdb.org/pdb/explore/explore.do?structureId=2KMK]
2PRT	CGCGGGGGCGTCTG	[http://www.pdb.org/pdb/explore/explore.do?structureId=2PRT]
2WBS	GAGGCGC	[http://www.pdb.org/pdb/explore/explore.do?structureId=2WBS]
2WBU	GAGGCGTGGC	[http://www.pdb.org/pdb/explore/explore.do?structureId=2WBU]

File:Gnn freqs.png Probability data for the 783 fingers that bind to GNN triplets. Note the high probability of leucine at position 4 and arginine at position 6.	File:Tnn probs.png Probability data for the 128 fingers that bind to TNN triplets. Note the high probability of leucine at position 4.	File:Cnn probs.png Probability data for the 16 fingers that bind to CNN triplets. There may not be enough data to consider this information statistically significant	File:Ann probs.png Probability data for the 29 fingers that bind to ANN triplets. There may not be enough data to consider this information statistically significant
File:Ngn probs.png Probability data for the 298 fingers that bind to NGN triplets. The position 4 leucine motif remains. There is also a high probability (> 0.5) of a histidine at position 3 and an arginine at position 6.	File:Ntn probs.png Probability data for the 177 fingers that bind to NTN triplets. The position 4 leucine motif remains.	File:Ncn probs.png Probability data for the 244 fingers that bind to NCN triplets. The position 4 leucine motif remains. There is also a very high probability of an arginine at position 6.	File:Nan probs.png Probability data for the 248 fingers that bind to NAN triplets. The position 4 leucine motif remains. There is also a very high probability (> 0.75) of an asparagine at position 3 and an arginine at position 6.
File:Nng probs.png Probability data for the 234 fingers that bind to NNG triplets. The position 4 leucine motif remains. There is also a very high probability (> 0.75) of an asparagine at position 1 and a high probability (> 0.5) of an aspartic acid at position 2 and an arginine at position 6.	File:Nnt probs.png Probability data for the 247 fingers that bind to NNT triplets. The position 4 leucine motif remains. There is also a high (> 0.5) probability of an arginine at position 6.	File:Nnc probs.png Probability data for the 262 fingers that bind to NNC triplets. The position 4 leucine motif remains. There is also a very high (> 0.75) probability of an arginine at position 6.	File:Nna probs.png Probability data for the 218 fingers that bind to NNA triplets. The position 4 leucine motif remains. There is also a very high (> 0.75) probability of a glutamine at position -1 and an arginine at position 6.

More Closely Related Backbones		More Distantly Related Backbones
Name	Sequence (with helix)	Name	Sequence (with helix)
44GLAS_DROME	FRCPI---CDRRFSQSSSVTTH-MRTH--	56EGR1_HUMAN	FAC---DICGRKFARSDERKRHTKIH---
38KRUP_DROME	FTCKI---CSRSFGYKHVLQNH-ERTH--	47MZF1_HUMAN	FVC---GDCGQGFVRSARLEEHRRVH---
124EVI1_HUMAN	YRC---KYCDRSFSISSNLQRHVRNIH--	23CF2_DROME	YTC---SYCGKSFTQSNTLKQHTRIH---
6HUNB_DROME	YECK---YCDIFFKDAVLYTIHMGY--H-	19ZEP2_RAT	YICE---ECGIRCKKPSMLKKHIRTH---
16SUHW_DROME	FPCEQ---CDEKFKTEKQLERH-VKTH--	49SDC1_CAEEL	VVC---FHCG-TRCHYTLLHDHLDYCH--
125CF2_DROME	YTC---PYCDKRFTQRSALTVHTTKLH--	27SDC1_CAEEL	LTC---AHCDWSFDNVMKLVRH-RGVH--
43EVI1_HUMAN	FKCHL---CDRCFGQQTNLDRH-LKKH--	130TTKB_DROME	YRC---KVCSRVYTHISNFCRHYVTSH--
118ADR1_YEAST	YPC---GLCNRCFTRRDLLIRHAQKIH--	80ESCA_DROME	YQC---PDCQKSYSTFSGLTKH-QQFH--
24EVI1_HUMAN	QECK---ECDQVFPDLQSLEKHMLS--H-	20IKZF1_MOUSE	HKCG---YCGRSYKQRSSLEEHKERCH--
25SUHW_DROME	MSCKV---CDRVFYRLDNLRSH-LKQH--	127SRYD_DROME	QECTT---CGKVYNSWYQLQKHISEEH--

Disease	Target DNA Finger 2	Target DNA Finger 1	Helices in Zif268 Backbone	Helices in Zif268 Closely-Related Backbones	Helices in Zif268 Distantly-Related Backbones
Colorblindness	TNN	GNN	5150	3000	1000
Colorblindness	GNN	CNN	3050	3050	3050
Familial Hypercholesterolemia	TNN	ANN	3050	3050	3050
Familial Hypercholesterolemia	TNN	CNN	3050	3050	3050
Pancreatic Cancer	GNN	TNN	5150	3000	1000
Pancreatic Cancer	GNN	ANN	3050	3050	3050
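As a quick arithmetic check on the table above, the three helix counts in each row can be summed. This sketch assumes the three numeric columns are the number of helices allotted to each backbone group; the variable names are illustrative only.

```python
# Rows copied from the table: disease, finger 2 target, finger 1 target,
# then helix counts for Zif268, closely related, and distantly related backbones.
rows = [
    ("Colorblindness", "TNN", "GNN", 5150, 3000, 1000),
    ("Colorblindness", "GNN", "CNN", 3050, 3050, 3050),
    ("Familial Hypercholesterolemia", "TNN", "ANN", 3050, 3050, 3050),
    ("Familial Hypercholesterolemia", "TNN", "CNN", 3050, 3050, 3050),
    ("Pancreatic Cancer", "GNN", "TNN", 5150, 3000, 1000),
    ("Pancreatic Cancer", "GNN", "ANN", 3050, 3050, 3050),
]

row_totals = [sum(row[3:]) for row in rows]
grand_total = sum(row_totals)

for row, total in zip(rows, row_totals):
    print(row[0], row[1], row[2], total)
print("all targets:", grand_total)
```

Every row comes to 9150 helices, so the six targets together account for 54900.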

File:Open logo NNN.png WebLogo for the OPEN data.	File:Persikov logo NNN.png WebLogo for the Persikov data.
File:IGem logo NNN based on open.png WebLogo for 10000 sequences generated with our program, when it incorporates only OPEN data.	File:IGem logo NNN based on open and persikov.png WebLogo for 10000 sequences generated with our program, when it incorporates both OPEN and Persikov data.
File:Open logo ANN.png WebLogo for fingers that bind to ANN according to OPEN data.	File:IGem logo ANN based on open.png WebLogo for 10000 sequences generated for an ANN triplet with our program, when it incorporates only OPEN data.	File:IGem logo ANN based on open and persikov.png WebLogo for 10000 sequences generated for an ANN triplet with our program, when it incorporates both OPEN and Persikov data.
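One simple way to produce a batch of sequences like the 10000 used for these logos is to draw each helix position independently from a per-position distribution. This is a hedged sketch only: the notebook does not show how the program samples, the independence between positions is an assumption, and `sample_helix` and `toy_probs` are hypothetical names with made-up probabilities.

```python
import random


def sample_helix(position_probs, rng):
    """Draw one residue per position, independently, from its distribution."""
    residues = []
    for probs in position_probs:
        amino_acids = list(probs)
        weights = [probs[aa] for aa in amino_acids]
        residues.append(rng.choices(amino_acids, weights=weights, k=1)[0])
    return "".join(residues)


# Hypothetical distributions for positions -1, 1, 2, 3, 4, 5, 6.
toy_probs = [
    {"Q": 0.6, "R": 0.4},
    {"S": 0.5, "D": 0.5},
    {"G": 0.7, "T": 0.3},
    {"N": 0.8, "H": 0.2},
    {"L": 1.0},
    {"T": 0.5, "V": 0.5},
    {"R": 0.9, "A": 0.1},
]

rng = random.Random(0)  # fixed seed so the run is repeatable
generated = [sample_helix(toy_probs, rng) for _ in range(10000)]
print(generated[:5])
```

A list like `generated` could then be written out one sequence per line and passed to a logo tool for comparison with the source data.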

Target DNA	Cystic Fibrosis	Familial Hypercholesterolemia	Retinoblastoma	p53	Myc	Pancreatic Cancer
GNN A	Flank 1					?
GNN T	Flank 1
GNN C	?	Flank 2
TNN G		Flank 2			X
TNN C		Flank 3			?
TNN A		Flank 3			?

File:GAA generated round 1.png Round 1 of generating sequences for GAA with the program.	File:GAA generated round 2.png Round 2 of generating sequences for GAA with the program.
File:GAA open and persikov.png GAA sequences from Persikov and OPEN datasets.	File:GAA open only.png GAA sequences from the OPEN dataset.

File:CTC 0.png psu = 0	File:CTC .005 psuedo.png psu = 0.005	File:CTC .008 psuedo.png psu = 0.008
File:CTC .01.png psu = 0.01	File:CTC .015 psuedo.png psu = 0.015	File:CTC .02 psuedo.png psu = 0.02
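The effect of the pseudocount (psu) values in these figures can be illustrated with additive smoothing. The notebook does not state how the program applies psu, so this sketch assumes the value is added to every amino acid's probability at a position and the result is renormalized; `add_pseudocount` is a hypothetical helper, not the program's own function.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def add_pseudocount(probabilities, psu):
    """Add psu to each of the 20 amino-acid probabilities, then renormalize."""
    smoothed = {aa: probabilities.get(aa, 0.0) + psu for aa in AMINO_ACIDS}
    total = sum(smoothed.values())
    return {aa: value / total for aa, value in smoothed.items()}


# Toy position where only leucine was ever observed.
observed = {"L": 1.0}

for psu in [0, 0.005, 0.008, 0.01, 0.015, 0.02]:
    smoothed = add_pseudocount(observed, psu)
    print(psu, round(smoothed["L"], 4), round(smoothed["W"], 4))
```

Under this assumption a larger psu moves probability away from the observed residues toward unobserved ones, so the generated sequences become more varied as psu grows.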

Index	Nucleotide Sequence (5'-3')	Helices (F3 to F1)	Notes
16	GAA GGG AAC	QDGNLGR RREHLVR HRTNLIA	Very similar to one of our target sequences (CB top), which is GAA GGG ACC
55	GGA GTG GTG	QTTHLSR DHSSLKR RNFILQR	Very similar to a target sequence (FH top), which is GGA GTG CTG
77	TGT GAA TAG	RRRNLQI QQTNLTR QPHGLTA	Out of ze air
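The "very similar" notes in this table can be checked by counting mismatched bases, and each triplet can be lined up with its helix. This is a small sketch with hypothetical helper names; the pairing assumes the usual convention that finger 3 reads the 5' triplet and finger 1 the 3' triplet, which matches the column order given here (5'-3' and F3 to F1).

```python
def mismatch_positions(site_a, site_b):
    """Return 0-based indices where two equal-length sites differ."""
    a = site_a.replace(" ", "")
    b = site_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("sites must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def pair_triplets_with_helices(site, helices):
    """Pair each triplet (5' to 3') with its helix (F3 down to F1)."""
    triplets = site.split()
    fingers = helices.split()
    if len(triplets) != len(fingers):
        raise ValueError("need one helix per triplet")
    names = [f"F{n}" for n in range(len(fingers), 0, -1)]
    return list(zip(names, triplets, fingers))


# Rows 16 and 55 against the target sequences quoted in the Notes column.
print(mismatch_positions("GAA GGG AAC", "GAA GGG ACC"))
print(mismatch_positions("GGA GTG GTG", "GGA GTG CTG"))
print(pair_triplets_with_helices("GAA GGG AAC", "QDGNLGR RREHLVR HRTNLIA"))
```

Both rows differ from their quoted target at a single base: the middle base of the 3' triplet for index 16, and the first base of the 3' triplet for index 55.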

'	A	C	D	E	F	G	H	I	K	L	M	N	P	Q	R	S	T	V	W	Y
A	10	0	99	55	0	29	122	20	32	332	2	59	55	63	255	87	24	43	0	0
C	0	0	15	0	0	3	0	0	0	5	0	0	6	0	31	6	14	0	0	0
D	99	15	94	92	0	39	62	6	84	342	15	120	55	42	277	290	87	21	0	8
E	55	0	92	42	0	34	77	1	38	141	2	39	4	29	134	28	90	26	0	1
F	0	0	0	0	0	0	0	10	0	0	0	22	4	0	2	4	6	0	0	0
G	29	3	39	34	0	38	56	0	14	126	1	95	28	47	119	125	38	7	0	0
H	122	0	62	77	0	56	118	9	103	498	4	88	24	26	87	159	70	2	0	0
I	20	0	6	1	10	0	9	6	8	95	3	5	17	3	62	16	17	4	0	0
K	32	0	84	38	0	14	103	8	84	386	24	44	19	102	269	163	113	22	1	0
L	332	5	342	141	0	126	498	95	386	174	32	686	16	112	362	276	875	360	0	8
M	2	0	15	2	0	1	4	3	24	32	0	7	2	11	39	14	3	1	0	0
N	59	0	120	39	22	95	88	5	44	686	7	8	36	28	120	254	84	34	1	0
P	55	6	55	4	4	28	24	17	19	16	2	36	0	3	29	150	21	13	11	0
Q	63	0	42	29	0	47	26	3	102	112	11	28	3	100	261	314	125	19	0	0
R	255	31	277	134	2	119	87	62	269	362	39	120	29	261	618	343	504	281	0	0
S	87	6	290	28	4	125	159	16	163	276	14	254	150	314	343	592	173	91	0	0
T	24	14	87	90	6	38	70	17	113	875	3	84	21	125	504	173	154	28	0	0
V	43	0	21	26	0	7	2	4	22	360	1	34	13	19	281	91	28	12	0	0
W	0	0	0	0	0	0	0	0	1	0	0	1	11	0	0	0	0	0	0	0
Y	0	0	8	1	0	0	0	0	0	8	0	0	0	0	0	0	0	0	0	0

Contents

June 6th

June 7th

June 8th

June 9th

June 9th - Bioinformatics

June 10th

June 10th - Bioinformatics

Visualizations

Properties of amino acids

June 13th

Gel images

June 13th - Bioinformatics

June 14

Today's Gel Images

June 14 - Bioinformatics

June 15th

June 15th - Bioinformatics

June 16th

June 16 - Bioinformatics

June 17th

June 17 - Bioinformatics

Goals

Options for Target DNA Sequences / ZF Helices

References

June 20th

June 20th - Bioinformatics

Goals for the week

Today

Probability data

June 21st

June 21st - Bioinformatics

Persikov Statistics - Graphs

Phone Call with Dan

Chip Design

Identifying dependencies

June 22nd

June 22nd - Bioinformatics

Final target sequences

Finalizing the non-Zif268 backbones

Updated Chip Design

Finishing the generator

June 23rd

June 23rd - Bioinformatics

Revising Target Sequences

June 24

Updated Closest Zif268 Fingers

June 24th - Bioinformatics

Sequence Generation

June 24th

June 24th - Bioinformatics

Playing with Pseudocounts

June 25th-26th - Bioinformatics

June 27, Wet lab

June 27th - Bioinformatics

To Do for Today

100 Control Sequences

Updated Target Sequences

Cut Site Design

Updates on the program

June 28th

June 28th - Bioinformatics

Plasmid and Oligo Design Schematics

Chip-Based Sequence Design Schematic

References

Harvard Logo

Running the Generator!

Generated Final Chip Sequences

Generated WebLogos for Final Chip

Bioinformatics Candids

Design of Plate Practice Sequences

June 29th

PCR Preparation

PCRs

Expression Plasmid Design in silico

June 30th

ZF Expression Plasmid Ultramer and Primer Design
