Team:Harvard/Template:NotebookData2

From 2011.igem.org

| [[File:nna_probs.png|thumb|left|Probability data for the 218 fingers that bind to '''NNA''' triplets. The position 4 leucine motif remains. There is also a very high (> 0.75) probability of a glutamine at position -1 and an arginine at position 6.]]
|}</div>
<div id="621" style="display:none">
==June 21st==
'''His3 sequencing results:'''

Sequencing showed that the His3 (HisB) gene is still present in the strain, with no premature stop codons. There is a 2 aa deletion in the middle of the protein; its significance is unknown, but the gene is likely still fully functional.
*Restreaked the selection strain on a plate from the glycerol stock; tomorrow we will PCR the His3 locus and sequence it again to be sure.
*Made oligos for MAGE to insert stop codons and introduce a frameshift in the endogenous His3 gene, so that if necessary we can knock out His3 ourselves.

'''Selection strain with lambda red:'''
*Reinoculated and made a glycerol stock
*Prepared for MAGE tomorrow
==June 21st - Bioinformatics==
===Persikov Statistics - Graphs===
{|
| [[File:Scatterplot of top bottom 20 with SVM polynomial.png|thumb|left|Scatterplot of top/bottom 20 with SVM polynomial]]
| [[File:Sequence by sequence (lin SVM).png|thumb|left|Sequence by sequence (lin SVM)]]
| [[File:Top_Bottom_20_ZFs_(SVM_linear).png|thumb|left|Top/Bottom 20 ZFs (SVM linear)]]
| [[File:Comparison of polynomial vs linear distribution (polynomial generally higher values).png|thumb|left|Comparison of polynomial vs. linear distribution (polynomial generally higher values)]]
|}

*FQCRICMRNFS<sub>zif268 F2 Backbone</sub>/'''''Helix F1'''''/TGEKP<sub>linker</sub>

*The Persikov data shows weak predictive power for OPEN amino acid sequences. Our conclusion is that Persikov's program is not well suited for incorporation into our helix generator. Testing Persikov's helices in his program yielded mostly accurate results (approximately 24/25 matched known binding information). This is an important test because it showed that we are using the program correctly and that the program is in fact working properly. However, testing the OPEN sequences in Persikov's program produced numerous false negatives, which informed our decision not to use Persikov's program to check our own helix-generating program.
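The backbone/helix/linker layout noted above can be sketched as a simple string assembly. This is only an illustration: the backbone and linker strings are copied from the entry above, while the function name and the example helix are hypothetical and are not taken from our generator or our data.

```python
# Backbone and linker exactly as written in the notebook entry above.
ZIF268_F2_BACKBONE = "FQCRICMRNFS"
LINKER = "TGEKP"

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def assemble_finger(helix, backbone=ZIF268_F2_BACKBONE, linker=LINKER):
    """Concatenate backbone + helix + linker (hypothetical helper).

    The helix is whatever sequence fills the 'Helix F1' slot in the
    layout above; no particular length is assumed here.
    """
    unknown = set(helix) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"non-standard residues in helix: {sorted(unknown)}")
    return backbone + helix + linker


# "RSDHLTT" is an illustrative placeholder helix, not a sequence from our data.
print(assemble_finger("RSDHLTT"))
```

Swapping the `backbone` argument is how a non-zif268 backbone would be slotted in under this sketch.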
 
===Phone Call with Dan===
*How conservative/risky should we be in terms of using other backbones?
**<u>'''Conservative'''</u>
***'''Possible Pros:'''
****More likely to get something that will work
****Depending on how "smart" our probabilities are (from our ZF generation algorithm), we could cover a lot of novel space without straying too far from zif268
****''Worst Case:'' Something we can show for iGEM (we covered the same ground OPEN did and found many of the same ZFs, but with a targeted, "smarter" method rather than throwing random things at it; the chip is not ours, but the program is "smarter")
***'''Possible Cons:'''
****Might end up covering the same ground as OPEN, but doing a "worse" job than they did
****Less likely to discover new/groundbreaking things (e.g., TNN triplets)
**<u>'''Less Conservative'''</u>
***Have 3-6 target sequences (we're currently going for 8)
***More backbones from non-zif268 than zif268
***'''Pros:'''
****We could get lucky and find something no one has ever seen before (TNN, ANN). If we throw enough things at it, we're more likely to get lucky.
***'''Cons:'''
****''Risk:'' Many of these backbones (from the entire ZF world) may NOT bind DNA (i.e., may bind proteins)
****''Risk:'' May not find anything that binds, then the whole project is a dud
*'''What is the more important variable, helices or backbones?'''
**Helices seem to be more important; backbones are of secondary importance
**Backbones: ZFs unravel DNA and open the major groove. The backbone is important here: it changes the bond angle, etc. (Brandon's paper-??)
*'''''Balance''''' needed between low and high risk
**If we find backbones that we know bind DNA, that greatly lowers our risk
**Limited spaces on chip: zero-sum game
**With a middle-of-the-road approach, we diminish both benefits and risk (it diminishes the benefits of the high-risk approach much more than it diminishes the benefits of the conservative approach; i.e., if you're playing the lottery, you're more likely to win if you buy many more tickets)
*We need to compare probabilities of randomly generated OPEN sequences vs. probabilities of sequences randomly generated by our program
**OPEN tries to cover all space: smaller probability
**If we have a "smarter" algorithm, we can produce fewer sequences
**However, the idea is not to repeat OPEN, but to go somewhere else: non-GNN sequences
**'''''Remember:''''' OPEN is a ''Cell'' paper; the point of the project is not to compare ourselves to them
*If we find binders for 1-2 of our sequences, that would be awesome
**Probably some targets will have no hits, some will have 10, and our last one might have 1,000 hits (then we do bioinformatics to figure out why/what those hits were)
**Point: to learn and do high-level bioinformatics and high-tech cloning techniques in the lab
**If you do find binders, you can write a paper about it!
*We have all the resources we need right now to build our chip
**We need to pick out targets
**'''Need to decide exactly what we want for:'''
***No. of target sequences/which ones
***No. of helices/which ones
***Ratio of zif268 backbones to non-zif268 backbones
**Avoid switching leucine out of position 4, then change other positions based on our frequencies
===Chip Design===
*No. of sequences will be more than we can put on the chip
**Helices: essentially unlimited
***Put more-likely-to-bind helices into the risky backbones
***Put less-likely-to-bind helices into a zif268 backbone
*Backbones
**Maybe revert to a more targeted approach: pick backbones that we know are transcription factors (TFs), that we know bind to DNA
**''OR'' research the ZFs from the phylogenetic tree
***Pick clades to research, see if one looks better than the other
**Why did OPEN cover so many helices, without changing the backbone, but still yield predominantly GNNs?
**If we have an idea of how the backbone might affect binding, maybe we could look into some sort of low-level modeling, etc., so that we wouldn't be grasping at straws? Could Vatsan help with this?
***See 2000 Wolfe paper [http://www.ncbi.nlm.nih.gov/pubmed/10940247]
**Backbones ''could'' affect interactions between fingers
**Theory: there is an energy penalty to ZF binding, since the ZF unravels DNA when it binds to it
*We have 12 target sequences
**2 for each of 4 diseases, 4 for the 5th disease
**If we want to be more conservative, we could throw out Type III, but it could be something cool
**'''We should have mostly Type I (CoDA argument, if this is an F2)'''
**Proposed: 3 diseases, 6 sequences
***4 Type I (F3 and F2 known, F1 novel)
***1 Type II (GNN, ANN, GNN)
***1 Type III (All unknown, e.g., TNN, ANN, TNN; '''''max 1''''')
Or, for 3 diseases:
# Type I's
# Type I, Type II
# Type I, Type III
*'''<u>Clinical Targets</u>'''
# Colorblindness ('''Type I's''')
# Familial Hypercholesterolemia (FH) (1 in 500)
# <del>Cystic Fibrosis (CF)</del>
# <del>Tay-Sachs</del>
# KRAS (oncogene/cancer)

*'''Main goal of project''': to build outside of what is already known
**If we only wanted to cure a disease, we could just use existing ZFs (i.e., find GNN binding locations)
**Also, we lend a level of specificity for insertion/deletion
**There is the possibility that there might be some area where specificity might demand ANN triplets
<u>'''Current decision on chip design:'''</u>
*We will have 6 target sequences, 2 each from colorblindness (CB), FH, and KRAS. All are "Type I" targets (only F1 is novel), with the middle finger chosen from the CoDA paper (either GNN or TNN)
**N.B.: the CB and FH sequences make up full ZF nuclease cut sites. The KRAS sites, due to the small number of GNN/TNN F3/F2 combos available in CoDA, are separate, with the flanking ZF nuclease site added afterwards in parentheses
# GGT'''G'''GT'''A'''AG (CB)
# GGA'''G'''TC'''C'''TG (FH)
# GGC'''T'''GA'''T'''GC (KRAS) (CTGAAAATT)
# GGC'''T'''GA'''C'''AC (FH)
# GGC'''T'''GG'''A'''AT (KRAS) (GACAAGAGC)
# GTC'''G'''CC'''T'''CC (CB)
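As a quick check on the six target sites listed above, the sketch below splits each 9 bp site into its three triplets and labels each triplet by its first base (GNN, TNN, and so on). It assumes the sites are written 5' to 3' with F3 on the first triplet and F1 on the last, which matches the "F3 and F2 known, F1 novel" description; the function names are hypothetical.

```python
# The six 9 bp target sites from the list above, keyed by list number.
TARGETS = {
    1: ("GGTGGTAAG", "CB"),
    2: ("GGAGTCCTG", "FH"),
    3: ("GGCTGATGC", "KRAS"),
    4: ("GGCTGACAC", "FH"),
    5: ("GGCTGGAAT", "KRAS"),
    6: ("GTCGCCTCC", "CB"),
}


def split_triplets(site):
    """Split a 9 bp site into three consecutive 3 bp triplets."""
    if len(site) != 9:
        raise ValueError("expected a 9 bp site")
    return [site[i:i + 3] for i in range(0, 9, 3)]


def triplet_class(triplet):
    """Label a triplet by its first base, e.g. 'GGT' -> 'GNN'."""
    return triplet[0] + "NN"


def classify(site):
    """Return the classes for (F3, F2, F1), assuming F3 binds the first triplet."""
    return [triplet_class(t) for t in split_triplets(site)]


for number, (site, disease) in TARGETS.items():
    f3, f2, f1 = classify(site)
    print(f"{number} ({disease}): F3={f3} F2={f2} F1={f1}")
```

Under this reading, every F3 triplet is GNN, every F2 triplet is GNN or TNN, and the F1 triplets are the non-GNN ones (ANN, CNN, TNN).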
 
*Targets 3, 4, and 6 are similar to sequences that Zif268 variants successfully bind, so the backbones will be weighted toward Zif268:
**Zif268_F2 backbone: 6000 helices (per target)
**10 backbones more closely related to Zif268: 300 helices each
*Targets 1, 2, and 5 will have equal distributions of backbones:
**Zif268_F2: 3000 helices
**10 backbones closely related to Zif268: 300 helices each
**10 backbones more distantly related to Zif268: 300 helices each
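The backbone weighting above can be tallied with a few lines of arithmetic. This sketch assumes that all of the listed counts are per target (the notes state this explicitly only for the 6000-helix case); the names are hypothetical, and the totals are derived from the numbers above rather than from any stated chip capacity.

```python
# Each entry maps a backbone group to (number of backbones, helices per backbone).
ZIF268_WEIGHTED = {
    "Zif268_F2": (1, 6000),
    "closely related": (10, 300),
}
EQUAL_SPLIT = {
    "Zif268_F2": (1, 3000),
    "closely related": (10, 300),
    "distantly related": (10, 300),
}

# Assumption: both allocations apply per target.
PLAN = {
    1: EQUAL_SPLIT,
    2: EQUAL_SPLIT,
    3: ZIF268_WEIGHTED,
    4: ZIF268_WEIGHTED,
    5: EQUAL_SPLIT,
    6: ZIF268_WEIGHTED,
}


def helices_per_target(allocation):
    """Total helices for one target: sum of backbones * helices per backbone."""
    return sum(n_backbones * per_backbone
               for n_backbones, per_backbone in allocation.values())


def total_helices(plan):
    """Total helices across all targets in the plan."""
    return sum(helices_per_target(allocation) for allocation in plan.values())


for target, allocation in sorted(PLAN.items()):
    print(target, helices_per_target(allocation))
print("total:", total_helices(PLAN))
```

Under these assumptions both schemes come to 9000 helices per target (6000 + 10 × 300, and 3000 + 10 × 300 + 10 × 300), or 54,000 across the six targets.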
 
===Identifying dependencies===
*We looked at the [[#Probability data|probability graphs]] to determine which amino acid positions on the finger's helix interact with which bases.
**Some interactions are fairly well established, while others have been proposed more recently (see [[#June 17 - Bioinformatics|interaction map (Persikov 2011)]])
**To identify these interactions in our own data, we looked at which helix positions varied most when the bases changed. A more rigorous way to do this is to calculate the entropy change at each amino acid position.
***'''xNN''' (vary base 1): amino acid 6 changes
***'''NxN''' (vary base 2): amino acid 3 changes
***'''NNx''' (vary base 3): amino acids -1 and 2(?) change
**Our program looks at dependencies between amino acids when generating sequences.
***We decided on these amino acid dependencies, using both established data and patterns we saw in the OPEN data:
****-1 and 2
****2 and 1
****6 and 5
**Because there is not much data for 'CNN' and 'ANN' sequences (16 and 29 known fingers bind to each triplet, respectively), we should use pseudocounts for these sequences, so that our frequency generator is not too biased toward probabilities that may not be significant.</div>
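The pseudocount and entropy ideas from the notes above can be sketched as follows. This is a minimal illustration and not our actual frequency generator: it assumes 7-residue helices indexed in the order -1, 1, 2, 3, 4, 5, 6, uses a uniform pseudocount added to every amino acid, and runs on a made-up three-helix example. All names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed mapping from string index to helix position.
POSITIONS = [-1, 1, 2, 3, 4, 5, 6]


def position_frequencies(helices, position, pseudocount=1.0):
    """Smoothed amino acid frequencies at one helix position.

    Adds `pseudocount` to every amino acid's count before normalizing,
    so residues unseen in a small data set keep a nonzero probability.
    """
    index = POSITIONS.index(position)
    counts = Counter(helix[index] for helix in helices)
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        raise ValueError("no data and no pseudocounts")
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def shannon_entropy(freqs):
    """Shannon entropy in bits; 0 means the position never varies."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Made-up helices for illustration only, not taken from the OPEN data.
toy_helices = ["RSDHLTT", "QSGHLTR", "RSDNLTR"]
for pos in POSITIONS:
    freqs = position_frequencies(toy_helices, pos, pseudocount=1.0)
    print(pos, round(shannon_entropy(freqs), 3))
```

Comparing the per-position entropy between sets of fingers grouped by base (for example, all GNN binders vs. all TNN binders) is one way to quantify which positions change when a base changes. A larger pseudocount pulls sparse sets such as CNN and ANN closer to a uniform distribution.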
 

Revision as of 23:48, 2 August 2011