Team:Harvard/Template:NotebookData

Latest revision as of 14:52, 12 August 2011

June 9th - Wet Lab

Created cell culture with selection construct (contains ZFB, His3, pyrF on plasmid) and reporter RFP (this will be used to test positive control ZFs, cells fluoresce green when ZF binds)
- Picked colonies, grew in LB/amp liquid media until mid-log
  - 3 mL of LB, 1.5 µL of 2000x amp
- Once mid-log was reached, created glycerol stock and stored the stock at -80°C.
  - ~~300 µL bacteria, 1200 µL 80% glycerol~~ This should have been 1200 µL bacteria media, 300 µL 80% glycerol (Corrected 6/14/2011) (80% pure glycerol, 20% molecular grade water)
- Spiked new tubes of media with 25 µL bacteria from the mid-log tube to leave overnight

NOTE: reporter RFP did not grow to mid-log by end of day, will let grow overnight to saturation and continue creating glycerol stock tomorrow.

Plated selection strain from gel stab onto tet plate.

Began primer design for creating the kan/selection construct fusion.

June 9th - Bioinformatics

Today we focused on reacquainting and familiarizing ourselves with Python. We completed the parsing (reading in) of the sequence and amino acid data so that it is easy to work with: by substituting each amino acid abbreviation (ex. A, N) with its numeric equivalent (ex. 1, 14), we can use a lot of nice math comparisons instead of messy letter/"string" comparisons.
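As a minimal sketch of the encoding described above, assuming the numeric equivalent is simply the letter's position in the alphabet (which matches A → 1 and N → 14). The function names `encode` and `decode` are ours for illustration and are not taken from the project code:

```python
def encode(sequence):
    """Map each one-letter amino acid code to its alphabet position (A=1, N=14)."""
    return [ord(residue) - ord("A") + 1 for residue in sequence.upper()]


def decode(numbers):
    """Invert encode(): turn alphabet positions back into letters."""
    return "".join(chr(n + ord("A") - 1) for n in numbers)


helix = "ACTQRNF"
codes = encode(helix)
print(codes)
print(decode(codes))
```

With numbers in hand, checks such as "same residue at two positions" become plain integer comparisons.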

After that, we worked on counting the number of times each amino acid appears in each of the 7 positions (unfortunately given by -1,1,2,3,5,6,7), and counting the number of times amino acids are next to each other (ex. ACTQRNF has AC, CT, TQ, etc pairings). Taken overall, we found that L is overwhelmingly in position 5.

Acid	-1	1	2	3	5	6	7
A	77	140	210	197	0	312	85
C	12	24	1	6	14	0	0
D	413	16	694	258	0	142	14
E	125	74	152	107	0	58	132
F	0	0	22	0	10	0	0
G	12	201	328	125	0	177	62
H	93	144	232	652	0	51	17
I	70	21	3	26	0	94	73
K	108	372	46	169	6	321	52
L	176	37	20	22	3325	75	55
M	36	54	5	28	0	31	10
N	23	150	129	940	0	182	61
P	3	298	77	7	0	36	8
Q	813	158	180	13	0	136	30
R	870	539	137	55	3	428	2517
S	99	970	859	278	0	140	12
T	243	134	223	350	0	834	83
V	166	26	27	115	0	341	146
W	19	0	13	0	0	0	0
Y	0	0	0	10	0	0	1
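The per-position tally behind a table like this can be sketched roughly as follows. This assumes each helix is a 7-letter string whose characters map in order onto the position labels -1, 1, 2, 3, 5, 6, 7. The function name and the example helices are hypothetical, not project data:

```python
from collections import Counter

POSITIONS = [-1, 1, 2, 3, 5, 6, 7]  # position labels used in the table above


def count_by_position(helices):
    """Return {position label: Counter of amino acids seen at that position}."""
    counts = {pos: Counter() for pos in POSITIONS}
    for helix in helices:
        if len(helix) != len(POSITIONS):
            raise ValueError(f"expected 7 residues, got {helix!r}")
        for pos, residue in zip(POSITIONS, helix):
            counts[pos][residue] += 1
    return counts


# Made-up helices purely for illustration; not project data.
example = ["RSDHLTR", "QSSNLVR", "RSDNLTR"]
table = count_by_position(example)
print(table[5].most_common(1))
```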

For pairings, we found patterns, but none as obvious as the L-in-position-5. Read this like a multiplication table: the intersection of L row and M column is how often that pairing was observed.

Acid	A	C	D	E	F	G	H	I	K	L	M	N	P	Q	R	S	T	V	W	Y
A	10	0	99	55	0	29	122	20	32	332	2	59	55	63	255	87	24	43	0	0
C	0	0	15	0	0	3	0	0	0	5	0	0	6	0	31	6	14	0	0	0
D	99	15	94	92	0	39	62	6	84	342	15	120	55	42	277	290	87	21	0	8
E	55	0	92	42	0	34	77	1	38	141	2	39	4	29	134	28	90	26	0	1
F	0	0	0	0	0	0	0	10	0	0	0	22	4	0	2	4	6	0	0	0
G	29	3	39	34	0	38	56	0	14	126	1	95	28	47	119	125	38	7	0	0
H	122	0	62	77	0	56	118	9	103	498	4	88	24	26	87	159	70	2	0	0
I	20	0	6	1	10	0	9	6	8	95	3	5	17	3	62	16	17	4	0	0
K	32	0	84	38	0	14	103	8	84	386	24	44	19	102	269	163	113	22	1	0
L	332	5	342	141	0	126	498	95	386	174	32	686	16	112	362	276	875	360	0	8
M	2	0	15	2	0	1	4	3	24	32	0	7	2	11	39	14	3	1	0	0
N	59	0	120	39	22	95	88	5	44	686	7	8	36	28	120	254	84	34	1	0
P	55	6	55	4	4	28	24	17	19	16	2	36	0	3	29	150	21	13	11	0
Q	63	0	42	29	0	47	26	3	102	112	11	28	3	100	261	314	125	19	0	0
R	255	31	277	134	2	119	87	62	269	362	39	120	29	261	618	343	504	281	0	0
S	87	6	290	28	4	125	159	16	163	276	14	254	150	314	343	592	173	91	0	0
T	24	14	87	90	6	38	70	17	113	875	3	84	21	125	504	173	154	28	0	0
V	43	0	21	26	0	7	2	4	22	360	1	34	13	19	281	91	28	12	0	0
W	0	0	0	0	0	0	0	0	1	0	0	1	11	0	0	0	0	0	0	0
Y	0	0	8	1	0	0	0	0	0	8	0	0	0	0	0	0	0	0	0	0
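A rough sketch of the pairing count, assuming pairs are unordered (the symmetry of the table suggests AC and CA were counted together). The helper names are hypothetical:

```python
from collections import Counter


def count_adjacent_pairs(helices):
    """Count neighbouring residue pairs, ignoring order (AC and CA are one pair)."""
    pairs = Counter()
    for helix in helices:
        for left, right in zip(helix, helix[1:]):
            pairs[tuple(sorted((left, right)))] += 1
    return pairs


def lookup(pairs, a, b):
    """Read the table like a multiplication table: row a, column b."""
    return pairs[tuple(sorted((a, b)))]


pairs = count_adjacent_pairs(["ACTQRNF"])
print(lookup(pairs, "C", "A"), lookup(pairs, "A", "T"))
```

Storing each pair under a sorted key means the row/column order of a lookup does not matter, which mirrors how the table reads the same across its diagonal.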

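Follow up work on this will be to convert this table from raw counts to frequencies: raw counts are less meaningful because they cannot be compared directly between amino acids that occur at very different overall rates.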
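The conversion to frequencies could look something like the sketch below. The example uses the non-zero counts from the position-5 column of the per-position table, and the same normalisation would apply to a row of the pairing table. The function name is hypothetical:

```python
def to_frequencies(counts):
    """Divide each count by the total so the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalise an empty set of counts")
    return {key: value / total for key, value in counts.items()}


# Non-zero counts from the position-5 column of the per-position table.
position_5 = {"C": 14, "F": 10, "K": 6, "L": 3325, "R": 3}
freqs = to_frequencies(position_5)
print(f"L: {freqs['L']:.3f}")
```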

June 10th - Wet Lab

What we learned today: don't put E. coli plates in the -20°C freezer!

Observed a well-populated selection strain plate and placed it in the 4°C refrigerator.

Took the selection construct culture and extracted the plasmid through miniprep
- Observed 260/280 ratio of 1.90 and 1.88 through Nanodrop
- Observed concentrations of 87.7 and 100.6 ng/µL through Nanodrop

Made 10 new agar plates with LB and amp

June 10th - Bioinformatics

Visualizations

We spent the first few hours today making visualizations and graphs of the data we found on the 9th. Heatmaps proved awkward to produce in Python alone, so we used a Python/R hybrid for those, while bar charts were made exclusively in Python. See the dropbox for our pretty (and hopefully more informative than spreadsheets) charts/graphs.

A heatmap of the pairing data. Darker blues indicate that the pairing occurs more often.

We then started work on TNN and GNN properties specifically (essentially repeating the June 9th work, but confined to smaller data sets). There are some differences between TNN and GNN: see graphs in dropbox. We decided that there was not enough data for fingers that bind to ANN and CNN triplets to perform meaningful analysis on them.

A heatmap of the GNN pairing data.

A heatmap of the TNN pairing data.

Overall, similar color clusters are found in the heatmaps. In all cases, L and N are often placed consecutively on the helix. There are fewer clusters of high frequency when looking at TNN binders.

We then, using the theorized framework from a paper (2011 Persikov [http://iopscience.iop.org.ezp-prod1.hul.harvard.edu/1478-3975/8/3/035010/]), tried to match amino acid binding to each base pair to see if there was a pattern. See dropbox document .../bioinformatics/Binding Frequency for that data. There's a lot of it.

Properties of amino acids

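We then worked on finding properties of each position (hydrophobic/hydrophilic, polar/nonpolar). Positions in these tables are numbered -1 to 6, so position 4 here appears to correspond to the column labelled 5 in the June 9th table: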

Hydrophilic vs Hydrophobic

Position	Very Phobic	Hydrophobic	Neutral	Hydrophilic
6	285	85	204	2782
5	542	312	1334	1169
4	3334	14	0	9
3	191	203	1417	1536
2	91	211	1819	1236
1	138	164	1604	1451
-1	468	90	1257	1542

Polar vs Nonpolar

Position	Polar	Nonpolar
6	2917	440
5	2290	1067
4	9	3348
3	2830	527
2	2652	705
1	2555	802
-1	2784	573

Follow up work here is to check more properties, and maybe try individual pairings (ex. phobic-philic, polar-philic).
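Grouping counts by property can be sketched as below. The notebook does not record which classification was used, so the nonpolar set here is an assumption (one common grouping, with cysteine counted as nonpolar), and the function name is hypothetical. The input is the non-zero counts from the column labelled 5 in the June 9th table:

```python
# ASSUMPTION: this nonpolar set is one common grouping, chosen for illustration;
# the notebook does not say which classification was used.
NONPOLAR = set("ACFGILMPVW")


def polar_split(counts):
    """Sum amino acid counts into polar and nonpolar totals."""
    totals = {"polar": 0, "nonpolar": 0}
    for residue, n in counts.items():
        key = "nonpolar" if residue in NONPOLAR else "polar"
        totals[key] += n
    return totals


# Non-zero counts from the column labelled 5 in the June 9th table.
column = {"C": 14, "F": 10, "K": 6, "L": 3325, "R": 3}
totals = polar_split(column)
print(totals)
```

Swapping in a different property (or a four-way hydrophobicity scale) only requires changing the lookup set into a dictionary from residue to category.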

June 13th - Wet Lab

The control zinc fingers OZ052 and OZ123 were amplified with overhanging primers to allow their insertion into the Wolfe plasmid:

Overhang PCR for ultramers: the template was the product of the ultramer PCR (see 6/8/11), and several concentrations were used

In all the tubes:

5 µL Pfx amplification buffer
1.5 µL dNTPs
1 µL MgSO4
0.4 µL polymerase
38.1 µL ddH2O
1.5 µL OZ052_up and 1.5 µL OZ052_down OR 1.5 µL OZ123_up and 1.5 µL OZ123_down

In OZ052 (1) and OZ123 (1):

1 µL of ultramer PCR product

In OZ052 (1:10) and OZ123 (1:10):

1 µL of a 1 in 10 dilution of ultramer PCR product

In OZ052 (1:100) and OZ123 (1:100):

1 µL of a 1 in 100 dilution of ultramer PCR product
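As a quick sanity check on the recipe above, the per-tube volumes (shared reagents, one primer pair, and 1 µL of template) sum to 50 µL. A small sketch of that arithmetic, with dictionary keys chosen by us for readability:

```python
# Volumes in microlitres, copied from the reagent list above
# (one primer pair and one template aliquot per tube).
reaction_ul = {
    "Pfx amplification buffer": 5.0,
    "dNTPs": 1.5,
    "MgSO4": 1.0,
    "polymerase": 0.4,
    "ddH2O": 38.1,
    "primer (_up)": 1.5,
    "primer (_down)": 1.5,
    "template": 1.0,
}

total_ul = sum(reaction_ul.values())
print(f"total reaction volume: {total_ul:.1f} uL")
```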

Parameters:

1. 94°C for 5 min
2. 94°C for 15 sec
3. 55°C for 30 sec
4. 68°C for 30 sec
5. Repeat steps 2-4 for 25 cycles
6. 68°C for 5 min
7. 4°C forever
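For planning bench time, the programmed hold time of this protocol can be estimated with a short sketch. It assumes 25 total passes through the denature/anneal/extend steps and ignores ramp times and the final 4°C hold, so real run times will be longer. The function name is hypothetical. The second call uses the 5 minute elongation described for the vector PCR later in this entry:

```python
def program_seconds(denature_s=15, anneal_s=30, extend_s=30, cycles=25,
                    initial_s=5 * 60, final_s=5 * 60):
    """Total programmed hold time in seconds, ignoring ramps and the final hold."""
    return initial_s + cycles * (denature_s + anneal_s + extend_s) + final_s


insert_pcr = program_seconds()
vector_pcr = program_seconds(extend_s=5 * 60)
print(f"overhang PCR: {insert_pcr / 60:.1f} min")
print(f"vector PCR:   {vector_pcr / 60:.1f} min")
```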

Gel to verify proper amplification (1% agarose gel, 10 µL 1 kb ladder, 120 V):

The OZ052 lanes (1-3) had bands at the proper length (328 bp) at all three concentrations, although there were several fainter bands likely from side products. Only the undiluted OZ123 lane showed any bands. Given the faint band at 328 bp and the stronger band around 250 bp, it appears that the PCR did not work well and that the majority of the product was the ultramer from the first PCR.

PCR around vector: the template used was the Wolfe selection construct plasmid miniprepped 6/10/11 (100.6 ng/µL stock). Reagents were the same as above except:

1.5 µL of Wolfe_F and 1.5 µL of Wolfe_R primers to each tube
plasmid tube (1 ng) given 1 ng of template (1 µL of a 1 in 100 dilution)
plasmid tube (10 ng) given 10 ng of template (1 µL of a 1 in 10 dilution)
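The template masses follow from the 100.6 ng/µL stock concentration, so the "1 ng" and "10 ng" labels are rounded values. A minimal sketch of the dilution arithmetic, with a hypothetical function name:

```python
STOCK_NG_PER_UL = 100.6  # Nanodrop reading from the 6/10/11 miniprep


def template_mass_ng(dilution_factor, volume_ul=1.0):
    """Mass of plasmid delivered by volume_ul of a 1-in-dilution_factor dilution."""
    return STOCK_NG_PER_UL / dilution_factor * volume_ul


for factor in (100, 10):
    print(f"1 in {factor}: {template_mass_ng(factor):.2f} ng")
```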

Parameters same as above except:

elongation (step 4) 5 minutes (vector approximately 5 kb)

Gel to verify proper amplification (1% agarose, 10 µL 1 kb ladder, 170 V):

There were no bands of the correct size in the lanes. The only band that appeared was a faint, short band in one lane that likely was a primer. Since the DNA ladder worked, the problem likely was not with the electrophoresis but with the PCR reaction, perhaps due to issues with the primers.

Gel images

Ultramer Overhang 6/13/11

Backbone plasmid 6/13/11

June 13th - Bioinformatics

Today we started work on a program to statistically generate possible sequences.

The four functions needed to do this are:

generate(matrix, pseudocounts (lambda), dependency tuples)

takes a matrix of zinc-finger AA position counts, a list of dependent amino acid pairs, and a pseudocount multiplier and generates a list of potential amino acid sequences weighted by independent and dependent probabilities

add_pseudo(dependent matrix row,independent matrix row)

given a matrix row of dependent counts (i.e. how many times 'a' occurs at position n when 'b' is set to some AA at position m) and a row of independent matrix counts (how many times 'a' occurs at n regardless of b's AA), returns an adjusted copy of the dependent row in which pseudocounts are added to the entries that are empty in the dependent row but filled in the independent row.

generate_indep(matrix)

randomly pick an amino acid, given a matrix row, from a weighted random distribution based on the values in the row

generate_dep(indep_row, dep_row, lambda)

add pseudo counts (call add_pseudo) and generate a dependent random call for a position (using generate_indep on the adjusted matrix)

We finished generate_indep, generate_dep, and add_pseudo today, along with creating a 140x140 matrix of needed values.
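The three helper functions described above might look roughly like the sketch below. This is not the project code. The notebook does not give the pseudocount formula, so using `lam` itself as the pseudocount is an assumption, as are the extra `lam` argument on `add_pseudo` and the `rng` parameter (added so the sampling can be seeded). Rows are plain lists of counts and the functions return an index into the row:

```python
import random


def generate_indep(row, rng=random):
    """Pick an index (amino acid) from a weighted distribution over the row's counts."""
    if sum(row) <= 0:
        raise ValueError("row has no counts to sample from")
    return rng.choices(range(len(row)), weights=row, k=1)[0]


def add_pseudo(dep_row, indep_row, lam=1.0):
    """Fill zeros in dep_row with a pseudocount wherever indep_row is non-zero.

    ASSUMPTION: the pseudocount is simply lam; the notebook does not give
    the formula, and its add_pseudo takes no lambda argument.
    """
    return [
        lam if dep == 0 and indep > 0 else dep
        for dep, indep in zip(dep_row, indep_row)
    ]


def generate_dep(indep_row, dep_row, lam, rng=random):
    """Add pseudocounts, then sample from the adjusted dependent row."""
    return generate_indep(add_pseudo(dep_row, indep_row, lam), rng)


indep = [10, 5, 0, 2]
dep = [4, 0, 0, 0]
print(add_pseudo(dep, indep, lam=0.5))
print(generate_dep(indep, dep, lam=0.5, rng=random.Random(0)))
```

Entries that are zero in both rows stay at zero, so an amino acid never seen at a position cannot be generated there, while one seen only in the independent data keeps a small chance of being picked.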