Team:CBNU-Korea/Methods/Stat
From 2011.igem.org
![](https://static.igem.org/mediawiki/2011/9/90/MethodInformation_Statistics.png)
We computed basic statistics and drew graphs to characterize various parameters for 15 species. After classifying the genes by EG, strand, and direction, we computed the mean and standard deviation for the 15 species. The group means were similar, with no significant difference. The standard deviations also showed no significant difference, so homoscedasticity can be assumed. A test for a difference in the mean of DistToOri was therefore uninformative, so we analyzed frequencies by species instead.
![](https://static.igem.org/mediawiki/2011/4/4a/Synb_Stat_001.png)
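The grouping step described above can be sketched in Python. This is only an illustration: the record layout (species, essential flag, DistToOri value) and all numbers are made-up assumptions, not the team's data or code.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (species, is_essential, dist_to_ori). Values are made up.
records = [
    ("A", True, 0.10), ("A", True, 0.30),
    ("A", False, 0.20), ("A", False, 0.40),
    ("B", True, 0.15), ("B", True, 0.25),
    ("B", False, 0.35), ("B", False, 0.45),
]

def group_stats(rows):
    """Return {(species, is_essential): (n, mean, sample standard deviation)}."""
    groups = defaultdict(list)
    for species, essential, dist in rows:
        groups[(species, essential)].append(dist)
    # stdev needs at least two values per group
    return {key: (len(v), mean(v), stdev(v)) for key, v in groups.items()}

stats = group_stats(records)
for key, (n, m, s) in sorted(stats.items()):
    print(key, n, round(m, 3), round(s, 3))
```

Comparing the printed means and standard deviations across groups gives the kind of overview used to decide whether a test on the mean is worth running.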
To characterize each species, we drew histograms like the following.
-Frequency of genes by scale
![](https://static.igem.org/mediawiki/2011/9/94/Synb_Stat_002.png)
-Frequency by scale for each strand and direction
![](https://static.igem.org/mediawiki/2011/2/25/Synb_Stat_003.png)
-Frequency of genes in the two groups (leading, lagging)
![](https://static.igem.org/mediawiki/2011/9/9c/Synb_Stat_004.png)
Before estimating the distribution, we examined the transformed dataset of the 8 Gamma species and computed basic statistics and graphs. We combined all of the Gamma data (NC_000907 (Haemophilus influenzae Rd KW20), NC_000913 (Escherichia coli MG1655), NC_002505 (Vibrio cholerae N16961), NC_002506 (Vibrio cholerae N16961), NC_003197 (Salmonella typhimurium LT2), NC_004631 (Salmonella enterica serovar Typhi), NC_005966 (Acinetobacter baylyi ADP1), NC_008463 (Pseudomonas aeruginosa UCBPP-PA14)) into one dataset (dataset gamma_all).
![](https://static.igem.org/mediawiki/2011/6/6f/Synb_Stat_005.png)
![](https://static.igem.org/mediawiki/2011/d/d5/Synb_Stat_006.png)
To see whether the proportion of essential genes differs within Gamma, we performed a hypothesis test. The hypotheses are as follows.
![](https://static.igem.org/mediawiki/2011/7/70/Synb_ss1.png)
![](https://static.igem.org/mediawiki/2011/6/64/Synb_Stat_007.png)
![](https://static.igem.org/mediawiki/2011/5/5f/Synb_Stat_007-1.png)
To test the above hypotheses, we performed a chi-square (χ²) test. The resulting chi-square statistic is
![](https://static.igem.org/mediawiki/2011/1/11/Synb_ss2.png)
![](https://static.igem.org/mediawiki/2011/d/d0/Synb_ss3.png)
![](https://static.igem.org/mediawiki/2011/a/a6/Synb_Stat_008.png)
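As a rough illustration of how a Pearson chi-square statistic is computed from a table of counts, here is a minimal Python sketch. It assumes the test compares essential and non-essential gene counts across groups; the exact hypotheses and the team's result are in the figures above, and the counts below are made up.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications are independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Made-up counts: rows = groups, columns = (essential, non-essential)
counts = [[10, 90], [20, 80]]
stat, df = chi_square_statistic(counts)
print(round(stat, 4), df)
```

The statistic is then compared with the chi-square distribution with `df` degrees of freedom. A library routine such as SciPy's `scipy.stats.chi2_contingency` would also return a p-value.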
To study the distribution of essential genes in Gamma, we analyzed the distribution in the 8 species for which it is already known. We then estimated the proportion of essential genes in each scale. To estimate the distribution for Gamma, we calculated this ratio for each known species. The i-th ratio Pi is the number of essential genes in the i-th scale divided by the total number of genes in the i-th scale.
![](https://static.igem.org/mediawiki/2011/2/2c/Synb_ss4.png)
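The ratio Pi defined above can be sketched in Python as follows. The function name and the counts are illustrative assumptions; only the definition (essential genes divided by all genes in the i-th scale) comes from the text.

```python
def essential_proportions(essential_counts, total_counts):
    """Return P_i = essential genes in scale i / all genes in scale i."""
    if len(essential_counts) != len(total_counts):
        raise ValueError("both lists need one entry per scale")
    proportions = []
    for essential, total in zip(essential_counts, total_counts):
        if total <= 0:
            raise ValueError("each scale needs at least one gene")
        proportions.append(essential / total)
    return proportions

# Made-up counts for three scales
p = essential_proportions([5, 10, 0], [50, 40, 10])
print(p)
```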
So,
![](https://static.igem.org/mediawiki/2011/5/5b/Synb_ss5.png)
![](https://static.igem.org/mediawiki/2011/4/4e/Synb_Stat_009.png)
We selected a sample by simple random sampling from a sampling list that includes all Gamma species.
![](https://static.igem.org/mediawiki/2011/3/3d/Synb_Stat_011.png)
![](https://static.igem.org/mediawiki/2011/b/b2/Synb_Stat_012.png)
![](https://static.igem.org/mediawiki/2011/0/00/Synb_Stat_013.png)
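Simple random sampling from a list can be sketched with Python's standard library. The species labels, sample size, and seed below are placeholders, not the team's actual sampling list.

```python
import random

def simple_random_sample(sampling_list, k, seed=None):
    """Draw k distinct items, each subset of size k being equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(sampling_list, k)

# Placeholder sampling list; real entries would be the Gamma species
sampling_list = [f"species_{i:02d}" for i in range(1, 21)]
sample = simple_random_sample(sampling_list, 5, seed=2011)
print(sample)
```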
Using the estimated proportions, we estimated the distribution of essential genes for the sample. The proportions estimated from the 8 known species are as follows.
![](https://static.igem.org/mediawiki/2011/c/c1/Synb_Stat_014.png)
We estimated the frequency of essential genes (Ni) for the unknown species based on the estimated proportions.
![](https://static.igem.org/mediawiki/2011/f/fa/Synb_ss6.png)
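One plausible reading of this estimate is that the expected number of essential genes in each scale is the estimated proportion times the number of genes the unknown species has in that scale. The Python sketch below follows that assumption. The exact formula is the one in the figure above, and the numbers here are made up.

```python
def estimate_essential_counts(proportions, gene_counts):
    """Estimate N_i as P_i times the gene count in scale i (assumed form)."""
    if len(proportions) != len(gene_counts):
        raise ValueError("both lists need one entry per scale")
    return [p * n for p, n in zip(proportions, gene_counts)]

# Made-up proportions from known species and gene counts of an unknown species
estimated = estimate_essential_counts([0.1, 0.25], [200, 80])
print(estimated)
```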