Team:CBNU-Korea/Methods/Stat

From 2011.igem.org


Latest revision as of 22:12, 5 October 2011


We computed basic statistics and drew graphs to characterize various parameters across 15 species. After classifying the genes by essentiality (EG), strand, and direction, we calculated the mean and standard deviation for each of the 15 species. The group means were similar, with no significant difference between them. The standard deviations also showed no significant difference, so the groups can be regarded as satisfying homoscedasticity. Because the means did not differ, a test for differences in mean DistToOri was uninformative, and we instead analyzed the frequency distribution for each species.
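As a rough illustration of this kind of comparison, the sketch below computes a one-way ANOVA F statistic (do the group means differ?) and Hartley's F_max ratio (are the variances similar?) for DistToOri values grouped by strand. The page does not say which tests were used, so both are stand-ins, and the group names and values are hypothetical, not our data.

```python
from statistics import mean, variance

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_vals = [x for vals in groups.values() for x in vals]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    ssb = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    ssw = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return (ssb / (k - 1)) / (ssw / (n - k))

def hartley_fmax(groups):
    """Hartley's F_max: largest sample variance / smallest; values near 1 suggest equal variances."""
    variances = [variance(v) for v in groups.values()]
    return max(variances) / min(variances)

# Hypothetical DistToOri values (fraction of genome length); NOT real data.
groups = {
    "leading": [0.10, 0.22, 0.31, 0.45],
    "lagging": [0.12, 0.20, 0.35, 0.41],
}

print("means:", {g: round(mean(v), 3) for g, v in groups.items()})
print("ANOVA F:", one_way_anova_f(groups))
print("F_max:", round(hartley_fmax(groups), 3))
```

The two hypothetical groups share the same mean, so F is essentially zero, and F_max stays close to 1; that is the pattern described above, where a test of mean differences has nothing to detect.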

To characterize each species, we drew the following histograms:
- Frequency of genes by scale
- Frequency of genes by scale, for each strand and direction
- Frequency of genes in two groups (leading and lagging)
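The histogram counts can be sketched as below. This assumes, as our own illustration, that DistToOri is expressed as a fraction of genome length and that each scale is a 10% slice of the genome; the (distance, group) pairs are hypothetical.

```python
from collections import Counter

def scale_histogram(dist_fractions, n_scales=10):
    """Count genes per scale; scale i covers [i/n_scales, (i+1)/n_scales) of the genome."""
    # min() keeps a value of exactly 1.0 in the last scale.
    counts = Counter(min(int(d * n_scales), n_scales - 1) for d in dist_fractions)
    return [counts.get(i, 0) for i in range(n_scales)]

# Hypothetical (DistToOri fraction, strand group) pairs; NOT real data.
genes = [(0.05, "leading"), (0.12, "leading"), (0.48, "leading"),
         (0.15, "lagging"), (0.95, "lagging")]

# Split the genes into the two groups, then histogram each group separately.
by_group = {}
for dist, group in genes:
    by_group.setdefault(group, []).append(dist)

histograms = {group: scale_histogram(d) for group, d in by_group.items()}
for group, counts in histograms.items():
    print(group, counts)
```

Multiplying by the number of scales (rather than dividing by a width of 0.1) avoids floating-point cases such as 0.3 / 0.1 evaluating to slightly less than 3.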


Before estimating the distribution, we transformed the datasets of the 8 species in Gamma (Gammaproteobacteria) and computed basic statistics and graphs. We merged all Gamma data (NC_000907 (Haemophilus influenzae Rd KW20), NC_000913 (Escherichia coli MG1655), NC_002505 (Vibrio cholerae N16961, chromosome I), NC_002506 (Vibrio cholerae N16961, chromosome II), NC_003197 (Salmonella typhimurium LT2), NC_004631 (Salmonella enterica serovar Typhi), NC_005966 (Acinetobacter baylyi ADP1), NC_008463 (Pseudomonas aeruginosa UCBPP-PA14)) into a single dataset, gamma_all.
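A minimal sketch of the merge step, assuming each accession's data is a list of per-gene records; the column names and rows below are hypothetical. Each row is tagged with its accession so that per-species statistics can still be computed after pooling.

```python
def merge_datasets(datasets):
    """Pool per-replicon gene tables into one list, tagging each row with its accession."""
    return [{"accession": acc, **row} for acc, rows in datasets.items() for row in rows]

# Hypothetical rows; the column names are illustrative only.
datasets = {
    "NC_000913": [{"gene": "g1", "essential": True}, {"gene": "g2", "essential": False}],
    "NC_002505": [{"gene": "g3", "essential": True}],
}
gamma_all = merge_datasets(datasets)
print(len(gamma_all), gamma_all[0])
```

Keeping the accession column also keeps the two Vibrio cholerae N16961 replicons (NC_002505 and NC_002506) distinguishable inside gamma_all.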



To test whether the proportion of essential genes differs within Gamma, we performed a hypothesis test. The hypotheses are as follows.




To test these hypotheses, we performed a chi-square (χ²) test. The resulting p-value was lower than 0.0001, so we rejected the null hypothesis and conclude that the proportions of essential genes differ.
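Assuming the hypotheses take the usual form for comparing proportions (H0: every group has the same proportion of essential genes; H1: at least one proportion differs), the test is a chi-square test of homogeneity on a table of essential and non-essential gene counts. The sketch below uses only the Python standard library; the counts are hypothetical, and the p-value comes from the exact recurrence for the chi-square upper tail.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    # Start from df = 2 or df = 1, which have closed forms.
    q, nu = (math.exp(-x / 2), 2) if df % 2 == 0 else (math.erfc(math.sqrt(x / 2)), 1)
    while nu < df:
        # Recurrence: Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * e^(-x/2) / Gamma(nu/2 + 1)
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q

def chi2_homogeneity(table):
    """Chi-square test of homogeneity; rows = groups, columns = (essential, non-essential)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts for two groups; NOT real data.
stat, df, p = chi2_homogeneity([[10, 90], [30, 70]])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.2g}")
```

For real analyses, a library routine such as SciPy's chi2_contingency does the same job; the hand-written version only makes each step visible.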

To study the distribution of essential genes in Gamma, we analyzed their distribution in the 8 species whose essential genes are already known, and estimated the proportion of essential genes in each scale. To do so, we calculated this ratio for each known species: the i-th ratio Pi is the number of essential genes divided by the total number of genes within the i-th scale.
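The ratio Pi for one known species can be sketched as follows; the per-scale counts are hypothetical, and returning NaN for a scale with no genes is our own choice rather than something the page specifies.

```python
def scale_proportions(essential, total):
    """P_i = essential genes in scale i / all genes in scale i (NaN if scale i is empty)."""
    return [e / t if t else float("nan") for e, t in zip(essential, total)]

# Hypothetical per-scale counts for one known species; NOT real data.
essential_per_scale = [12, 9, 7]
genes_per_scale = [120, 100, 140]
print(scale_proportions(essential_per_scale, genes_per_scale))
```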

Thus, Pi for Gamma (unknown species) is estimated from Pi for the 8 known species in Gamma.
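The page does not state how the Pi values of the 8 known species are combined. One plausible choice, assumed here, is a pooled ratio per scale: the total essential genes across species divided by the total genes across species. An unweighted mean of the per-species Pi would instead give every species equal weight regardless of genome size. The counts below are hypothetical.

```python
def pooled_proportions(species_counts):
    """Pool known species: P_i = sum of essential genes in scale i / sum of all genes in scale i."""
    per_species = list(species_counts.values())
    n_scales = len(per_species[0][0])
    estimates = []
    for i in range(n_scales):
        essential = sum(ess[i] for ess, _ in per_species)
        total = sum(tot[i] for _, tot in per_species)
        estimates.append(essential / total if total else float("nan"))
    return estimates

# Hypothetical (essential_per_scale, genes_per_scale) for two known replicons; NOT real data.
known = {
    "NC_000913": ([2, 5], [10, 20]),
    "NC_000907": ([3, 1], [10, 5]),
}
print(pooled_proportions(known))
```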

We then selected a sample by simple random sampling from a sampling list that includes all species in Gamma.
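Simple random sampling without replacement can be sketched with Python's random module; the sampling list below contains placeholder names, not the actual list of Gamma species, and the seed is only there to make the draw reproducible.

```python
import random

def simple_random_sample(sampling_list, k, seed=None):
    """Draw k distinct species from the sampling list; every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(sampling_list, k)

# Hypothetical sampling list; placeholder names, NOT the real sampling frame.
sampling_list = [f"gamma_species_{i}" for i in range(1, 21)]
sample = simple_random_sample(sampling_list, 5, seed=2011)
print(sample)
```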



Using these proportion estimates, we estimated the distribution of essential genes for the sampled species. The proportions estimated from the 8 known species are as follows.

Based on the estimated proportions, we estimated the frequency of essential genes (Ni) in each unknown species.
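The exact formula for Ni is not given in the text. A natural reading, which we assume here, is that Ni is the estimated proportion Pi multiplied by the number of genes the unknown species has in scale i. The inputs below are hypothetical.

```python
def expected_essential_counts(proportions, genes_per_scale):
    """Assumed estimator N_i = P_i * (number of genes in scale i of the unknown species)."""
    return [p * n for p, n in zip(proportions, genes_per_scale)]

# Hypothetical inputs: estimated P_i and gene counts per scale; NOT real data.
p_hat = [0.25, 0.24]
unknown_genes_per_scale = [40, 50]
n_hat = expected_essential_counts(p_hat, unknown_genes_per_scale)
print(n_hat, "total:", sum(n_hat))
```

Summing the Ni over all scales gives an estimate of the total number of essential genes in the unknown species.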