Team:Harvard/Template:NotebookData3

From 2011.igem.org

Revision as of 23:50, 2 August 2011

June 21st

His3 sequencing results:

The sequencing results showed that the His3 (HisB) gene is still present in the strain, with no early stop codons. There is a 2-aa deletion in the middle of the protein; its effect is unknown, but the gene is likely still fully functional.

Restreaked the selection strain on a plate from glycerol stock; tomorrow we will PCR the His3 locus and sequence again just to be sure.
Made oligos for MAGE to insert stop codons and introduce a frameshift in the endogenous His3 gene, so that if necessary we can knock out His3 ourselves.

Selection strain with lambda red:

Reinoculated and made glycerol stock
prepared for MAGE tomorrow

June 21st - Bioinformatics

Persikov Statistics - Graphs

File:Scatterplot of top bottom 20 with SVM polynomial.png

Scatterplot of top/bottom 20 with SVM polynomial

File:Sequence by sequence (lin SVM).png

Sequence by sequence (lin SVM)

File:Top Bottom 20 ZFs (SVM linear).png

Top/Bottom 20 ZFs (SVM linear)

File:Comparison of polynomial vs linear distribution (polynomial generally higher values).png

Comparison of polynomial vs. linear distribution (polynomial generally higher values)

FQCRICMRNFS_{zif268 F2 Backbone}/Helix F1/TGEKP_linker

The Persikov data shows weak predictive power for OPEN amino acid sequences, so we concluded that Persikov's program is not well suited for incorporation into our helix generator. Testing Persikov's own helices in his program yielded mostly accurate results (approximately 24/25 matched known binding information). This test matters because it indicates that we are using the program correctly and that the program works as intended. However, testing the OPEN sequences in Persikov's program produced numerous false negatives, which informed our decision not to use it to check our own helix-generating program.

Phone Call with Dan

How conservative/risky should we be in terms of using other backbones?
- Conservative
  - Possible Pros:
    - More likely to get something that will work
    - Depending on how "smart" our probabilities are (from our ZF generation algorithm), we could cover a lot of novel space without straying too far from zif268
    - Worst case: something we can show for iGEM (we covered the same ground OPEN did and found many of the same ZFs, but with a targeted approach, a "smarter" method, not throwing random things at it; the chip is not ours, but the program is "smarter")
  - Possible Cons:
    - Might end up covering the same ground as OPEN, but doing a "worse" job than they did
    - Less likely to discover new/groundbreaking things (i.e., TNN triplets)
- Less Conservative
  - Have 3-6 target sequences (we're currently going for 8)
  - More backbones from non-zif268 than zif268
  - Pros:
    - We could get lucky and find something no one has ever seen before (TNN, ANN). If we throw enough things at it, we're more likely to get lucky.
  - Cons:
    - Risk: Many of these backbones (from the entire ZF world) may NOT bind DNA (i.e., may bind proteins)
    - Risk: May not find anything that binds, then the whole project is a dud
What is the more important variable, helices or backbones?
- Helices seem to be more important, backbones of secondary importance
- Backbones: ZF's unravel DNA, open the major groove-- backbone is important here, changes the bond angle, etc. (Brandon's paper-??)
Balance needed between low and high risk
- If we find backbones that we know bind DNA, greatly lowers our risk
- Limited spaces on chip: zero-sum game
- With a middle of the road approach, we diminish both benefits and risk (diminishes the benefits of the high risk approach much more than it diminishes the benefits of the conservative approach; i.e., if you're playing the lottery, you're more likely to win if you buy many more tickets)
We need to compare probabilities of randomly-generated OPEN sequences vs. probabilities of sequences randomly generated by our program
- OPEN tries to cover all space: smaller probability
- If we have a "smarter" algorithm, we can produce fewer
- However, the idea is not to repeat OPEN, but to go somewhere else, non-GNN sequences
- Remember: OPEN is a Cell paper; the point of the project is not to compare ourselves to them
If we find binders for 1-2 of our sequences, that would be awesome
- Probably we'll have some that find none, some have 10, our last one might have 1,000 hits (then, we do bioinformatics to figure out why/what those hits were)
- Point: to learn and do high-level bioinformatics, and high-tech cloning techniques in the lab
- If you do find binders, you can write a paper about it!
We have all the resources we need right now to build our chip
- We need to pick out targets
- Need to decide exactly what we want for:
  - No. of target sequences/which ones
  - No. of helices/ which ones
  - Ratio of zif268 backbones: non-zif268 backbones
- Avoid switching Leucine out of position 4, then change other positions based on our frequencies

Chip Design

No. of sequences will be more than we can put on the chip
- Helices: essentially unlimited
  - Put more-likely-to-bind helices into the risky backbones
  - Put less-likely-to-bind helices into a zif268 backbone
Backbones
- Maybe revert to a more targeted approach: pick backbones that we know are transcription factors (TFs), that we know bind to DNA
- OR research the ZFs from the phylogenetic tree
  - Pick clades to research, see if one looks better than the other
- Why did OPEN cover so many helices, without changing the backbone, but still yield predominantly GNNs?
- If we have an idea of how the backbone might affect binding, maybe we could look into some sort of low-level modeling, etc. so that we wouldn't be grasping? Could Vatsan help with this?
  - See 2000 Wolfe paper [http://www.ncbi.nlm.nih.gov/pubmed/10940247]
- Backbones could affect interactions between fingers
- Theory: energy penalty to ZF binding-- unravels DNA when binds to it
We have 12 target sequences
- 2 per 4 diseases, 4 for the 5th disease
- If we want to be more conservative, we could throw out Type III, but it could be something cool
- We should have mostly Type I (CoDA argument, if this is an F2)
- Proposed: 3 diseases, 6 sequences
  - 4 Type I (F3 and F2 known, F1 novel)
  - 1 Type II (GNN, ANN, GNN)
  - 1 Type III (All unknown, e.g., TNN, ANN, TNN; max 1)

Or, for 3 diseases:

Type I's
Type I, Type II
Type I, Type III

Clinical Targets

Colorblindness (Type I's)
Familial Hypercholesterolemia (FH) (1 in 500)
~~Cystic Fibrosis (CF)~~
~~Tay Sachs~~
KRAS- (oncogene/cancer)

Main goal of project: to build outside of what is already known
- If we wanted to cure a disease only, we could just use existing ZFs (i.e., find GNN binding locations)
- Also, we lend a level of specificity for insertion/deletion
- There is the possibility that there might be some area where specificity might demand ANN codons

Current decision on chip design:

We will have 6 target sequences, 2 each from colorblindness, FH, and KRAS. All are "Type I" targets (only F1 is novel) with the middle finger chosen from the CODA paper (either GNN or TNN)
- N.B.: the CB and FH sequences make up full ZF nuclease cut sites. The KRAS sites, due to the small number of GNNTNN F3F2 combos available in CODA, are separate, with the flanking ZF nuclease site added afterwards in parentheses

GGTGGTAAG (CB)
GGAGTCCTG (FH)
GGCTGATGC (KRAS) (CTGAAAATT)
GGCTGACAC (FH)
GGCTGGAAT (KRAS) (GACAAGAGC)
GTCGCCTCC (CB)

Targets 3, 4, and 6 are similar to sequences Zif268 variants successfully bind to, so the backbones will be weighted accordingly:
- Zif268_F2 backbone: 6000 helices (per target)
- 10 backbones more closely related to Zif268: 300 helices each
Targets 1, 2, and 5 will have equal distributions of backbones:
- Zif268_F2: 3000 helices
- 10 backbones closely related to Zif268: 300 each
- 10 backbones more distantly related to Zif268: 300 each

Identifying dependencies

We looked at the probability graphs to determine which amino acid positions on the finger's helix interact with which bases.
- Some interactions are fairly well established, while others have been more recently proposed (See interaction map (Persikov 2011))
- To identify these interactions in our own data we looked at which helix positions varied most when you changed the bases. A more rigorous way to do this is to calculate the entropy change as you change the amino acids in each position.
  - xNN(Vary base 1): Amino acid 6 changes
  - NxN(Vary base 2): Amino acid 3 changes
  - NNx(Vary base 3): Amino acids -1 and 2(?) change
- Our program looks at dependencies between amino acids when generating sequences.
  - We decided on these amino acid dependencies, using both established data and patterns we saw in the OPEN data:
    - -1 and 2
    - 2 and 1
    - 6 and 5
- Because there is not much data for 'CNN' and 'ANN' sequences (with 16 and 29 known fingers that bind to each triplet, respectively), we should use pseudocounts for these sequences, so that our frequency generator is not too biased toward probabilities that may not be significant.
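
The entropy-based check mentioned above could look roughly like the sketch below. It is not the team's code: the function names and the toy helices are made up for illustration, and it assumes each helix is a 7-residue string covering positions -1 through 6. For each position it compares the entropy of the pooled helices with the average entropy inside each triplet group; a large difference suggests that the residue at that position tracks the base that was varied.

```python
import math
from collections import Counter

POSITIONS = [-1, 1, 2, 3, 4, 5, 6]  # labels for the 7 helix residues, in order

def entropy(residues):
    """Shannon entropy in bits of a list of amino-acid letters."""
    total = len(residues)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(residues).values())

def entropy_gain(groups):
    """Per position: entropy of the pooled helices minus the size-weighted
    mean entropy within each triplet group."""
    pooled = [h for helices in groups.values() for h in helices]
    gains = {}
    for i, pos in enumerate(POSITIONS):
        within = sum(
            len(helices) / len(pooled) * entropy([h[i] for h in helices])
            for helices in groups.values()
        )
        gains[pos] = entropy([h[i] for h in pooled]) - within
    return gains

# Made-up helices (NOT real OPEN data); the two triplets differ only in base 1.
toy_groups = {
    "GAA": ["QSGNLAR", "QSSNLVR"],
    "TAA": ["QSGNLAT", "QSSNLVT"],
}
gains = entropy_gain(toy_groups)
print(gains)
```

In this toy input only position 6 differs between the two groups, which mirrors the xNN observation in the list above.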

June 22nd

Preparing media/reagents for selection system:

Made 0.1M zinc chloride solution, M9 salt, and 1M magnesium sulphate solutions for the amino acid mixture
- M9 Salt solution (20x)
  - 67.8 g of disodium phosphate
  - 30 g of monopotassium phosphate
  - 5 g of sodium chloride
  - 10 g of ammonium chloride
  - All in 500 mL of distilled water
  - Sterile filtered when done dissolving

Overhang PCR for 3-part assembly of ZFs, omega subunit, and backbone vector (pZE21G, spec resistance)

Clone out omega+Zif268:
- template: original selection construct plasmid (ZFB, his3, etc.)
- primers: omega_F+homolog, Zif268_R+homolog
- Protocol
  - 98 C for 30 sec
  - 98 C for 10 sec
  - 68 C for 30 sec
  - 72 C for 30 sec
  - Repeat steps 2-4, 30 times
  - 4 C forever
Clone out omega only:
- template: original selection construct
- primers: omega_F+homolog, omega_R
- Protocol
  - Same as omega + Zif268
Clone OZ052 with overhang:
- template: OZ052 overhang (overhang currently matches selection construct), 1:10b, 15ng/µL
- primers: OZ052_F+omega homolog, OZ052_R+homolog
- Protocol
  - Same as omega + Zif268
Clone OZ123 with overhang:
- template: OZ123 overhang (overhang currently matches selection construct), 1, 6.3ng/µL
- primers: OZ123_F+omega homolog, OZ123_R+homolog
- Protocol
  - Same as omega + Zif268
Clone out pZE21G backbone:
- template: pZE21G containing cells from plate, diluted 1:10
- primers: back_F, back_R
- Protocol
  - 98 C for 30 sec
  - 98 C for 10 sec
  - 68 C for 30 sec
  - 72 C for 1:30
  - Repeat steps 2-4, 30 times
  - 72 C for 5 min
  - 4 C forever
25µL reaction:
- 12.5µL Phusion mastermix
- 1.25µL each primer
- 1µL of 1ng/µL dilution of template
- 9µL of ddH2O
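
As a quick check of the recipe above, and for scaling it to other volumes (the June 23rd entry doubles a protocol to 50 µL), a small helper could be used. This is only an illustrative sketch; the names `BASE_MIX_UL` and `scale_mix` are made up here.

```python
# Per-reaction volumes in µL for the 25 µL Phusion reaction listed above.
BASE_MIX_UL = {
    "Phusion mastermix": 12.5,
    "forward primer": 1.25,
    "reverse primer": 1.25,
    "template (1 ng/µL)": 1.0,
    "ddH2O": 9.0,
}

def scale_mix(final_volume_ul, base=BASE_MIX_UL):
    """Scale every component proportionally to the requested reaction volume."""
    factor = final_volume_ul / sum(base.values())
    return {name: vol * factor for name, vol in base.items()}

mix_50 = scale_mix(50)
for name, vol in mix_50.items():
    print(f"{name}: {vol:g} µL")
```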
Ran gel of the above PCR products and imaged below: only the omega and omega+Zif268 reactions seemed to work

File:2011.06.22.ZFBsBackbone&omega(labeled)

PCR for 3-part ZF assembly

pZE21G backbone:

1 colony of pZE21G grown in 3mL LB, 3µL spectinomycin (1000x) until mid-log. Glycerol stock made.
Miniprep of pZE21G in order to PCR the backbone: used Qiagen kit
- 5ng/µL and didn't seem pure: miniprep (or nanodrop) not working
- another colony used to start an overnight culture for PCR/glycerol stocks
Ran gradient PCR on the miniprep product in order to obtain backbone
- same protocol as before, with 1µL of template
- 6 tubes spaced so that annealing temp=60,62,64,66,68,70
Parameters: (program on PCR5, IGEM-> DOGGED)
- 98°C for 30s (initial denaturation)
- 98°C for 10s (denature)
- 60°C to 71°C for 15s (anneal)
- 72°C for 90s (extend)
- Repeat steps 2-4 for 30 cycles total (denature, anneal, extend)
- 72°C for 5 min
- 4°C forever

HisB locus PCR: We repeated the PCR of the selection strain at this locus just to be sure HisB is still present

grew 1 colony (from a new selection strain plate that Vatsan brought) in 0.5mL LB with 0.5µL tet
used 1 µL of bacterial suspension for PCR following same procedure as 6/20

Lambda red to make selection system:

grew ∆His3∆PyrF∆rpoZ+pKD46 to mid-log (OD 0.4)
induced lambda red by shaking culture in 42C water bath for 15 min
spin down 1mL for 1 min, 18000 rcf at 4C
wash 2x with cold water, removing as much supernatant as possible
resuspend with 200 ng kan-ZFB-wp-his-ura template (20µL) and water up to 50µL (30µL)
electroporate using 1mm gap cuvettes and 1.80 kV. Immediately afterward add 1mL LB to cuvette, mix, and transfer to culture tube containing 2 mL more of LB
recover for 2hrs, 30C
spread on kanamycin plates: 100µL, 10µL, or 1µL (the last two dilute with 100µL LB to help spread more easily)
grow overnight at 30C

June 22nd - Bioinformatics

Final target sequences

Our "tentatively" Final DNA Target Sequences (i.e. barring any major objections, we're going with this):

Disease	Target Range	Binding Site Location	Bottom Finger	Top Finger	Bottom AA (F3 to F1)	Top AA (F3 to F1)
Colorblindness	chrX:153,402,679-153,408,753	256	GGC TGA GGC	GTA GCT GGG	ESGHLKR.QREHLTT.#######	QSGTLTR.QRSDLTR.KKDHLHR
Colorblindness	chrX:153,402,679-153,408,753	2067	GAA GGG GAC	GGG GCT CAC	QDGNLGR.RREHLVR.EEANLRR	RTEHLAR.QRSDLTR.#######
Familial Hypercholesterolemia	chr19:11,175,000-11,195,000	2707	GGC TGG ATG	GGC TGG CTC*	ESKHLTR.RREHLTI.#######	ESKHLTR.RREHLTI.#######
Pancreatic Cancer	chr7:117,074,084-117,089,556	4423	GCA GAC TGT	GCA GGA AAA	QGNTLTR.DRGNLTR.#######	QDVSLVR.QSAHLKR.#######

* Dreier was unable to find a ZF that bound specifically to CTC. Instead, the zinc fingers found bound to CTC and to other sequences with equal binding affinity.
Note: The green cells are the target sequences that we are aiming for on our chip.

Finalizing the non-Zif268 backbones

In addition, we locked down the non-Zif268 backbones that we will be using for the chip. We have 10 backbones that are more closely related to Zif268, and 10 that are more distantly related:

More Closely Related Backbones		More Distantly Related Backbones
Name	Sequence (with helix)	Name	Sequence (with helix)
44GLAS_DROME	FRCPI---CDRRFSQSSSVTTH-MRTH--	56EGR1_HUMAN	FAC---DICGRKFARSDERKRHTKIH---
38KRUP_DROME	FTCKI---CSRSFGYKHVLQNH-ERTH--	47MZF1_HUMAN	FVC---GDCGQGFVRSARLEEHRRVH---
124EVI1_HUMAN	YRC---KYCDRSFSISSNLQRHVRNIH--	23CF2_DROME	YTC---SYCGKSFTQSNTLKQHTRIH---
6HUNB_DROME	YECK---YCDIFFKDAVLYTIHMGY--H-	19ZEP2_RAT	YICE---ECGIRCKKPSMLKKHIRTH---
16SUHW_DROME	FPCEQ---CDEKFKTEKQLERH-VKTH--	49SDC1_CAEEL	VVC---FHCG-TRCHYTLLHDHLDYCH--
125CF2_DROME	YTC---PYCDKRFTQRSALTVHTTKLH--	27SDC1_CAEEL	LTC---AHCDWSFDNVMKLVRH-RGVH--
43EVI1_HUMAN	FKCHL---CDRCFGQQTNLDRH-LKKH--	130TTKB_DROME	YRC---KVCSRVYTHISNFCRHYVTSH--
118ADR1_YEAST	YPC---GLCNRCFTRRDLLIRHAQKIH--	80ESCA_DROME	YQC---PDCQKSYSTFSGLTKH-QQFH--
24EVI1_HUMAN	QECK---ECDQVFPDLQSLEKHMLS--H-	20IKZF1_MOUSE	HKCG---YCGRSYKQRSSLEEHKERCH--
25SUHW_DROME	MSCKV---CDRVFYRLDNLRSH-LKQH--	127SRYD_DROME	QECTT---CGKVYNSWYQLQKHISEEH--

Updated Chip Design

The CODA article produced zinc fingers that bound a GNN or TNN F2 with either an ANN, GNN, or TNN F3. These results led us to the following distribution of three types of zinc finger backbones (Zif268, similar but not equal to Zif268, and dissimilar to Zif268) across our 6 target DNA sequences. With 55,000 spaces on our chip, each of the 6 target DNA sequences is allotted 9,150 spaces, with 100 spaces set aside for control zinc fingers from CODA and OPEN (6 × 9,150 + 100 = 55,000). Note that the values in the table below represent the number of helices inserted into each type of backbone.

Disease	Target DNA Finger 2	Target DNA Finger 1	Helices in Zif268 Backbone	Helices in Zif268 Closely-Related Backbones	Helices in Zif268 Distantly-Related Backbones
Colorblindness	TNN	GNN	5150	3000	1000
Colorblindness	GNN	CNN	3050	3050	3050
Familial Hypercholesterolemia	TNN	ANN	3050	3050	3050
Familial Hypercholesterolemia	TNN	CNN	3050	3050	3050
Pancreatic Cancer	GNN	TNN	5150	3000	1000
Pancreatic Cancer	GNN	ANN	3050	3050	3050

N.B.: The chip will only be holding our F1 zinc fingers- the F2 and F3 will be on a separate plasmid that we must make ourselves

To Do: The distribution of helices to each backbone set/target sequence needs to be finalized. For example, the program can generate a set of helices for the Zif268 backbone to be applied to the colorblindness target sequence, but should the same set or a completely different helix set be applied to the Zif268 backbones for the familial hypercholesterolemia target sequences?

If we want to test the effect of the backbone, would need to keep the helices constant-- but we could do this within a single target, to keep all other variables constant

Finishing the generator

We finished and finalized the program that generates zinc finger sequences. The following changes were made today:

We incorporated the data from Persikov's database into our generator.
We included the ability to remove duplicate sequences in the output file (with a dictionary).
We added pseudocounts for fingers that bind to 'ANN' or 'CNN' targets. Because there is not much information for these targets, the data we do have may be biased. Thus, we want to make sure that amino acids that currently have no probability of occurring are bumped up to a minimum (currently 0.01).
We placed additional weight on non zif-268 backbones. Formerly, the amino acids for positions 1, 4, and 5 were fixed based on zif-268 data, regardless of the original helix sequences on these backbones. Now information from both zif-268 and the original helix sequence is considered when assigning weights to the amino acids.
We started working with Noah's reverse translate program.
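
The duplicate removal described above can be sketched as follows. This is a simplified illustration rather than the actual generator: it samples each helix position independently from a made-up frequency table, so it ignores the position dependencies (-1 and 2, 2 and 1, 6 and 5) and the backbone weighting that the real program uses. The names `sample_helix` and `generate_unique` are hypothetical.

```python
import random

def sample_helix(freqs, rng):
    """Draw one 7-residue helix, one position at a time, from per-position
    amino-acid frequencies (a list of 7 dicts: letter -> weight)."""
    return "".join(
        rng.choices(list(table), weights=list(table.values()), k=1)[0]
        for table in freqs
    )

def generate_unique(freqs, n, seed=0, max_tries=100_000):
    """Sample until n distinct helices are collected. The dict acts as an
    insertion-ordered set, so a repeated helix simply overwrites its key."""
    rng = random.Random(seed)
    seen = {}
    for _ in range(max_tries):
        if len(seen) == n:
            break
        seen[sample_helix(freqs, rng)] = True
    return list(seen)

# Made-up frequency table (NOT OPEN/Persikov data); leucine is fixed at position 4.
toy_freqs = [
    {"Q": 0.6, "R": 0.4},   # position -1
    {"S": 1.0},             # position 1
    {"G": 0.5, "S": 0.5},   # position 2
    {"N": 0.7, "H": 0.3},   # position 3
    {"L": 1.0},             # position 4
    {"A": 0.5, "T": 0.5},   # position 5
    {"R": 0.8, "K": 0.2},   # position 6
]
helices = generate_unique(toy_freqs, n=10)
print(helices)
```

The `max_tries` cap is there because a narrow frequency table may not contain `n` distinct helices at all; in that case the function returns fewer than `n`.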

We tested our generator to ensure that the sequences it was producing appeared to be legitimate.

Jamie looked at the multiple sequence alignments of the fingers generated, to see if the frequencies correlated with what we expected them to be. (?)
We input a known DNA triplet to see if the program generated sequences known to bind, according to OPEN data. When generating 10000 sequences, about half the known binders for the input triplet were found.
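
The recovery test above amounts to a set comparison between the generated helices and the known binders. A minimal sketch, with made-up helices and a hypothetical function name:

```python
def recovered_fraction(generated, known_binders):
    """Fraction of known binding helices that also appear in the generated set."""
    known = set(known_binders)
    if not known:
        raise ValueError("known_binders must not be empty")
    return len(known & set(generated)) / len(known)

# Made-up helices for illustration only (not OPEN data).
known = ["QSGNLAR", "QSSNLVR", "RSDNLAR", "QRANLRA"]
generated = ["QSGNLAR", "QSGNLTR", "RSDNLAR", "QSGNLAR"]
print(recovered_fraction(generated, known))
```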

We created [http://weblogo.berkeley.edu/ WebLogos] to more easily visualize how adding the Persikov data affects the sequences we generate. The size of each letter corresponds to the frequency of that amino acid at that position. We decided to incorporate the Persikov data so that our generator incorporates more information when generating sequences. Doing so does not drastically change the sequences generated.

File:Open logo NNN.png WebLogo for the OPEN data.	File:Persikov logo NNN.png WebLogo for the Persikov data.
File:IGem logo NNN based on open.png WebLogo for 10000 sequences generated with our program, when it incorporates only OPEN data.	File:IGem logo NNN based on open and persikov.png WebLogo for 10000 sequences generated with our program, when it incorporates both OPEN and Persikov data.
File:Open logo ANN.png WebLogo for fingers that bind to ANN according to OPEN data.	File:IGem logo ANN based on open.png WebLogo for 10000 sequences generated for an ANN triplet with our program, when it incorporates only OPEN data.	File:IGem logo ANN based on open and persikov.png WebLogo for 10000 sequences generated for an ANN triplet with our program, when it incorporates both OPEN and Persikov data.

June 23rd

Ran gel to determine the results of the PCR products

Determined the HisB presence in selection strain
- Confirmed the presence of hisB from the gel image below
Determined the success of the pZE21G backbone primers through gel on gradient PCR
- PCR for the backbone failed again, even with gradient PCR

File:2011.06.23.hisB&pZE21Gbackbone(labeled)

PCR of HisB locus

PCR more Kan-ZFB-His3-Ura3

Two 50 µL reactions
- Doubled the protocol used on the 16th and used HisUraKan_F and ZFBwpHisUra_R primers
- PCR produced a very low concentration of Kan-ZFB-His3-Ura3 because the melting temperature of the primers was too high, so the primers stuck to the annealing DNA and did not dissociate
  - In image below the low concentration of desired product can be seen, along with the high concentration of unused primers

File:2011.06.23.KanZFBHis3Ura3(labeled)

PCR of kan-ZFB-wp-his3-ura3

Ran gel of initial overlap PCR product (undiluted, starting without primers) from June 16th, using whole sample
Performed gel extraction in order to have more Kan-ZFB-His3-Ura3 product for the transformation tomorrow: 8.5ng/µL, 260/280=2.03

Determined success of the selection construct transformation

Checked the plates all day, and finally came to the conclusion that the transformation did not work
Discovered that lambda red has a promoter induced by arabinose, not temperature (though the strain is still temperature sensitive). That is why it didn't work--we'll get arabinose and hopefully have a successful recombination.
Preparing all parts for transformation today and will finish it tomorrow

Oligo Design for MAGE

Designed 90bp long oligos for OZ052 and OZ123 insertion in the ZFB sites in place of Zif268. Reverse complement taken.

Miniprep pZE21G plasmid for backbone PCR

Ran miniprep again for pZE21G plasmid: 6ng/µL, 260/280=2
Worried that the miniprep didn't work: ran gel also on this miniprep and concluded that DNA was present in the sample
- Gel image seen below

File:2011.06.23.pZE21Gminiprep(labeled)

pZE21G miniprep

Ran PCR for pZE21G backbone

Used same protocol from June 22 in PCR today for pZE21G backbone

June 23rd - Bioinformatics

Revising Target Sequences

Target DNA	Cystic Fibrosis	Familial Hypercholesterolemia	Retinal Blastoma	p53	Myc	Pancreatic Cancer
GNN A	Flank 1					?
GNN T	Flank 1
GNN C	?	Flank 2
TNN G		Flank 2			X
TNN C		Flank 3			?
TNN A		Flank 3			?

This is the set of final, final target sequences based on the table above:

Disease	Target Range	Binding Site Location	Bottom Finger	Top Finger	Bottom AA (F3 to F1)	Top AA (F3 to F1)
Colorblindness	chrX:153,403,001-153,407,000	3627	GCT GGC TGG	GCG GTA ATG	EGSGLKR.EAHHLSR.#######	RRDDLTR.QRSSLVR.#######
Familial Hypercholesterolemia	chr19:11,175,000-11,195,000	14001	GGC TGA GAC	GGA GTC CTG	ESGHLKR.QREHLTT.#######	QTTHLSR.DHSSLKR.#######
Myc-gene Cancer	chr8:128,938,529-128,941,440	198	GGT GCA GGG	GGC TGA CTC	VDHHLRR.QSTTLKR.RRAHLQN	ESGHLKR.QREHLTT.#######
Myc-gene Cancer	chr8:128,938,529-128,941,440	981	GGA GAG GGT	GGC TGG AAA	QANHLSR.RQDNLGR.TRQKLET	EKSHLTR.RREHLTI.#######

Green cells are our target sequences.

June 24

Designed primer for testing HisB deletion, reuse His_Internal_R to test the band

Updated Closest Zif268 Fingers

We realized that some of our "close non-zif268 fingers" were actually not all that close to Zif268, and so we went into the 88,000 zinc finger database and pulled out zinc fingers surrounding zif268. In fact, there were many, many, many zinc fingers that had identical sequences to the Zif268 F2 finger, and so we looked at sequences around it. The tree below shows the new non-zif268 backbones that are actually close to zif268 compared to our old set. The new set is in gray, the old set is in black. This gives us a potential seven more backbones to work with.

File:ComparisonTree.png

June 24th - Bioinformatics

Sequence Generation

We made some small updates to the sequence generator, based on the frequencies we noticed in the outputs of the tests we ran.

We decided to only include pseudocounts for position 6 for 'CNN' and 'ANN.' Originally, 'CNN' and 'ANN' were using pseudocounts for all seven positions. However, this introduced a noticeable increase in amino acids, such as tyrosine (Y), that have been shown to occur rarely in zinc fingers (according to our data from OPEN and Persikov). Additionally, because tyrosines occurred so rarely in the data (11 times total in the OPEN data set), we decided not to give tyrosine a pseudocount.
We added the capability to prevent repeat backbone-helix combinations on the chip. That is, we wanted to make sure that the same exact zinc finger was not generated for different triplet inputs.

To test the sequence generator, we made two sets of 2000 sequences for GAA, then visualized the results. Comparing these with the images for OPEN and OPEN+Persikov shows that our generation follows the major themes of those datasets, but also introduces variation. The two generated sets also vary slightly from each other, which shows the influence of randomness on the generation.

File:GAA generated round 1.png Round 1 of generating sequences for GAA with the program.	File:GAA generated round 2.png Round 2 of generating sequences for GAA with the program.
File:GAA open and persikov.png GAA sequences from the OPEN dataset.	File:GAA open only.png GAA sequences from Persikov and OPEN datasets.

Disease	Target DNA Finger 1	Helices in Zif268 Backbone	Helices in Zif268 Closely-Related Backbones	Helices in Zif268 Distantly-Related Backbones
Colorblindness (Bottom)	TGG	5150	3000	1000
Colorblindness (Top)	ATG	3050	3050	3050
Familial Hypercholesterolemia (Bottom)	GAC	5150	3000	1000
Familial Hypercholesterolemia (Top)	CTG	3050	3050	3050
Myc (Top₁₉₈)	CTC	3050	3050	3050
Myc (Top₉₈₁)	AAA	3050	3050	3050

Table of target sequences and helix distribution across backbones

Distribution: Zif268 : Zif268 similar : Zif 268 dissimilar
- Conservative distribution 56.3 : 32.8 : 10.9
- Riskier distribution 33.3 : 33.3 : 33.3
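
The numbers in the table and the two distributions above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check; the 55,000-space chip size and the 100 control spaces come from the Updated Chip Design entry, and the variable names are made up.

```python
# Helices per backbone class (Zif268, closely related, distantly related),
# copied from the helix-distribution table above.
ALLOCATION = {
    "Colorblindness (Bottom) TGG": (5150, 3000, 1000),
    "Colorblindness (Top) ATG": (3050, 3050, 3050),
    "FH (Bottom) GAC": (5150, 3000, 1000),
    "FH (Top) CTG": (3050, 3050, 3050),
    "Myc (Top 198) CTC": (3050, 3050, 3050),
    "Myc (Top 981) AAA": (3050, 3050, 3050),
}
CONTROLS = 100        # spaces reserved for CODA/OPEN control fingers
CHIP_SPACES = 55_000  # total spaces on the chip

def percentages(row):
    """Share of a target's spaces given to each backbone class, in percent."""
    total = sum(row)
    return tuple(round(100 * n / total, 1) for n in row)

used = sum(sum(row) for row in ALLOCATION.values()) + CONTROLS
print("spaces used:", used, "of", CHIP_SPACES)
for target, row in ALLOCATION.items():
    print(target, sum(row), percentages(row))
```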

June 24th

pZE21G:

reinoculated culture with 100µL of saturated solution, grew to mid-log, and made glycerol stock
backbone PCR: ran E gel but no bands--PCR unsuccessful. We may need to use a different backbone for the zinc fingers.

Omega and Omega+Zif268:

these were the only two PCR reactions from 6/22/11 to work
PCR purified using Qiagen kit:
- omega: 6.1ng/µL, 260/280=1.83
- omega+Zif268: 11.3 ng/µL, 260/280=1.67

Lambda red recombination of selection system:

reinoculated selection strain+pKD46 with 100µL of saturated solution
just before mid-log (about 4 hours after inoculation) divided culture in half (1.5mL) and added either 37.5µL or 3.75µL of 20% arabinose solution (to try two different induction levels). Cultures grew for another hour.
The rest of the procedure was the same as the 6/22/11 attempt but without the 42C water bath.
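
For reference, the two induction levels can be converted into approximate final arabinose concentrations. This sketch assumes that the 1.5 mL is the culture volume before the arabinose was added; the function name is made up.

```python
def final_percent(stock_percent, added_ul, culture_ul):
    """Final concentration (in percent) after adding a stock solution to a culture."""
    return stock_percent * added_ul / (culture_ul + added_ul)

for added in (37.5, 3.75):
    print(f"{added} µL of 20% stock -> {final_percent(20, added, 1500):.3f}% arabinose")
```

Under that assumption the two cultures end up at roughly 0.5% and 0.05% arabinose, a tenfold difference in induction level.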

June 24th - Bioinformatics

Playing with Pseudocounts

Using CTC because of position 6's reliance on the CNN frequencies, we looked at what difference various pseudocount values make. (If the frequency of an amino acid in the frequency table is 0, it is bumped up to the pseudocount: e.g., A = 0 becomes A = .015 with a pseudocount of .015.) Pseudocounts are necessary for data with a small sample size: we could be missing out on working helices because a letter's frequency is 0 when it shouldn't be.

Various pseudocount (psu = ) values. Look at the 7th column, which is position 6 in the helix:

File:CTC 0.png psu = 0	File:CTC .005 psuedo.png psu = .005	File:CTC .008 psuedo.png psu = .008
File:CTC .01.png psu = .01	File:CTC .015 psuedo.png psu = .015.	File:CTC .02 psuedo.png psu = .020.

The variation from E being the top letter to A being top and back to E comes from a slight adjustment in how we add pseudocounts: the 'new' way is a more proportional approach.

Notice how psu = 0 gives only the four letters found in our dataset, while psu > 0 adds in other letters, each with a small probability ranging from 0.5% to 2%.

The question is how much psu to add: less means we weight our (possibly flawed) data of proven zinc fingers more. Higher psu adds more randomness (variation) to our sequences, but some fraction of those sequences will not work.
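
The effect of different psu values can be explored with a sketch like the one below. It raises zero frequencies to the pseudocount and then renormalizes so the frequencies sum to 1; the renormalization step is an assumption, since the notebook does not spell out how the "more proportional" method works. The frequencies are made up, and tyrosine is left at zero to match the earlier decision not to give it a pseudocount.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def apply_pseudocount(freqs, psu, exclude=("Y",)):
    """Raise zero-frequency amino acids to `psu`, then renormalize to sum to 1.
    Letters in `exclude` stay at zero."""
    floored = {}
    for aa in AMINO_ACIDS:
        f = freqs.get(aa, 0.0)
        if f == 0.0 and aa not in exclude:
            f = psu
        floored[aa] = f
    total = sum(floored.values())
    return {aa: f / total for aa, f in floored.items()}

# Made-up position-6 frequencies with four observed letters (NOT real CNN data).
observed = {"E": 0.4, "A": 0.3, "D": 0.2, "Q": 0.1}
for psu in (0, 0.005, 0.01, 0.02):
    adjusted = apply_pseudocount(observed, psu)
    nonzero = sum(1 for f in adjusted.values() if f > 0)
    print(f"psu={psu}: {nonzero} letters possible, E={adjusted['E']:.3f}")
```

A larger psu shifts more probability mass away from the observed letters, which is the trade-off described above.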

List of Remaining Goals:

Sort fingers by target
Pick and assign primer sets
Reverse translate fingers avoiding type II restriction enzymes and primers
Append type II restriction enzyme and primer sequences to each finger
Yay

Additionally, primer tags (forward: GTACATGAAACGATGGACGG, reverse: CTGGTATAGTCTCCTCAGCG) will be assigned to the 100 control sequences.

June 27, Wet lab

Sequencing PyrF, rpoZ loci:

We will sequence these genes in the selection strain just to make sure they are knocked out, especially since it appears HisB is not.
Picked a colony off the ∆HisB∆PyrF∆rpoZ plate (6/21) and grew it in 150µL LB plus tet in a 96-well plate for about 2 hrs at 37˚C
Diluted 1 in 20 and used 1µL as template in PCR with KAPA mastermix (see protocols for reagent amounts and parameters)
- annealing temp 65˚C, elongation time 1:15
- primer pairs:
  - PyrF_F, PyrF_R
  - PyrF_F, PyrF_internalR
  - rpoZ_F, rpoZ_R
  - rpoZ_F, rpoZ_internalR
  - rpoZ_R, zeocin_R
Ran on E Gel to check PCR worked: bands are at the same sizes as the original genotyping gel.

File:2011.06.27.pyrFrpoZloci2(labeled)

PCR of kan-ZFB-wp-his3-ura3

Tomorrow we will send samples to Genewiz for sequencing

Lambda Red recombination:

The plates made from the recombination (6/24) did have colonies, but they were very small and took a long time to grow, and so they may not actually have the kan-ZFB insert. We will have to PCR the locus to see.
Chose 8 colonies from each plate and grew at 30˚C in 150µL LB plus kan in a 96-well plate
When our primers arrive, we will PCR the locus to check for the insert.

Selection system media:

June 27th - Bioinformatics

To Do for Today

100 sequences (and control), 2 each with the same F3 and F2, but different F1, from our test sequences [zif268, OZ123, OZ052, CoDA] ✓
Type II nuclease cut site sequences: put the binding sites into our oligos ✓
Final backbones with helices ✓
Programming stuff: check to make sure there are no cut sites or primers in any of our backbone/helix combinations; check translation order (translates F1→F3) ✓

100 Control Sequences

See our Positive Control Sequence Table (Media:Positive Control Sequences PostMacro.xlsx), updated June 28th
Selected known binding zinc fingers from the CODA table that bind sequences similar to our target sequences
All control helices from CODA were inserted into Zif268 F2 backbones and have been assigned a seventh primer tag separate from the tags given to the 6 target sequences.

Updated Target Sequences

One of our sequences from before was bad because the F3/F2 combo did not appear in the CODA table... faulty checking, my bad :(
Here is the newest table of target sequences:

Disease	Target Range	Binding Site Location	Bottom Finger	Top Finger	Bottom AA (F3 to F1)	Top AA (F3 to F1)
Colorblindness	chrX:153,403,001-153,407,000	3666	GTG GGA TGG	GAA GGG ACC	RNTALQH.QSAHLKR.#######	QDGNLGR.RREHLVR.#######
Familial Hypercholesterolemia	chr19:11,175,000-11,195,000	14001	GGC TGA GAC	GGA GTC CTG	ESGHLKR.QREHLTT.#######	QTTHLSR.DHSSLKR.#######
Myc-gene Cancer	chr8:128,938,529-128,941,440	198	GGT GCA GGG	GGC TGA CTC	VDHHLRR.QSTTLKR.RRAHLQN	ESGHLKR.QREHLTT.#######
Myc-gene Cancer	chr8:128,938,529-128,941,440	981	GGA GAG GGT	GGC TGG AAA	QANHLSR.RQDNLGR.TRQKLET	EKSHLTR.RREHLTI.#######

Green cells are our target sequences: both fingers for Colorblindness and Familial Hypercholesterolemia, and the Top Finger for each Myc-gene Cancer site.

Cut Site Design

See our Cut Site Design page
We left in one proline (P) between the linker and the starting FCQ... of finger 2; this proline is the last AA of the OPEN linker (TGEKP) and occurs before the beta sheet in every zinc finger in Zif268 (see zif268's sequence on its [http://www.pdb.org/pdb/explore/remediatedSequence.do?structureId=1AAY PDB page])
This configuration also allows the library to be used at any finger position because proline ends the OPEN linker.

Updates on the program

The program appears to run extremely slowly because of the computationally intensive step of checking the reverse translated sequences
In addition to checking for the primers and cut sites, we also have to check for 'GGGGGG' because it can lead to undesirable structures forming. In addition, we have to check for the reverse complements of all these undesirable sequences.
We decided on a similarity of 0.8 as the maximum acceptable similarity between the sequence the primer binds to and any other part of the generated sequence. If the sequences are too similar, the primer might mishybridize. We originally had a similarity threshold of 0.6 but that made the program run too slowly, so we decided on a threshold of 0.8.
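
The sequence checks described above could be sketched as follows. This is an illustration under stated assumptions, not the team's program: the notebook does not say how similarity was computed, so the sketch uses the fraction of matching bases in an ungapped, same-length window on either strand, and all function names are hypothetical. Only the 'GGGGGG' motif and the control forward primer tag are taken from the notebook; the candidate sequence is made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def has_forbidden_motif(seq, motifs):
    """True if seq contains any motif or the reverse complement of one."""
    return any(m in seq or reverse_complement(m) in seq for m in motifs)

def max_window_identity(seq, primer):
    """Highest fraction of matching bases between the primer (or its reverse
    complement) and any same-length window of seq, without gaps."""
    best = 0.0
    for probe in (primer, reverse_complement(primer)):
        for i in range(len(seq) - len(probe) + 1):
            window = seq[i:i + len(probe)]
            matches = sum(a == b for a, b in zip(window, probe))
            best = max(best, matches / len(probe))
    return best

def passes_checks(seq, motifs, primers, threshold=0.8):
    """Accept seq only if it has no forbidden motif and no window that is
    more similar to a primer than the threshold."""
    if has_forbidden_motif(seq, motifs):
        return False
    return all(max_window_identity(seq, p) <= threshold for p in primers)

control_fwd = "GTACATGAAACGATGGACGG"            # control forward primer tag from the notebook
candidate = "ATGCCCCCCTACGTTAGCAAGTCTGAAACC"    # made-up sequence for illustration
print(has_forbidden_motif(candidate, ["GGGGGG"]))
print(max_window_identity(candidate, control_fwd))
print(passes_checks(candidate, ["GGGGGG"], [control_fwd]))
```

The window scan costs on the order of sequence length times primer length for every primer and every candidate, which is consistent with the slowdown noted above. A lower threshold also rejects more candidates, so more sequences have to be generated and checked.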

Disease	Target Sequence	Forward Primer (5'-3' NOT REVERSE COMPLEMENT)	Reverse Primer (5'-3' NOT REVERSE COMPLEMENT)
Colorblindness	GCT GGC TGG	ATATAGATGCCGTCCTAGCG	AAGTATCTTTCCTGTGCCCA
Colorblindness	GCG GTA ATG	CCCTTTAATCAGATGCGTCG	TGGTAGTAATAAGGGCGACC
Familial Hypercholesterolemia	GGC TGA GAC	TTGGTCATGTGCTTTTCGTT	AGGGGTATCGGATACTCAGA
Familial Hypercholesterolemia	GGA GTC CTG	GGGTGGGTAAATGGTAATGC	ATCGATTCCCCGGATATAGC
Myc-gene Cancer	GGC TGA CTC	TCCGACGGGGAGTATATACT	TACTAACTGCTTCAGGCCAA
Myc-gene Cancer	GGC TGG AAA	CATGTTTAGGAACGCTACCG	AATAATCTCCGTTCCCTCCC
