Team:CBNU-Korea/Methods/Stat

From 2011.igem.org


We analyzed basic statistics and graphs to characterize various parameters of 15 species. After classifying the genes by EG, strand, and direction, we calculated the mean and standard deviation for the 15 species. The group means were similar, with no significant difference between them. The standard deviations also showed no significant difference, so the data can be said to satisfy homoscedasticity. Therefore, a test for a difference in the mean DistToOri was not meaningful, and we analyzed the frequency distribution of each species instead.
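As a sketch of the grouping step, the Python below computes the mean and sample standard deviation of DistToOri for each (EG, strand, direction) group and, as one possible homoscedasticity check, the classic (mean-centred) Levene W statistic. The record layout (dict keys `eg`, `strand`, `direction`, `dist_to_ori`) is a hypothetical assumption, and the page does not say which variance test was actually used.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_stats(records):
    """Mean and sample SD of DistToOri per (EG, strand, direction) group.

    Hypothetical record layout: dicts with keys eg, strand, direction,
    dist_to_ori. Groups with fewer than two values make stdev() raise
    statistics.StatisticsError.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["eg"], r["strand"], r["direction"])].append(r["dist_to_ori"])
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}


def levene_w(samples):
    """Classic Levene W statistic (deviations from each group mean)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    # Absolute deviation of every value from its own group mean.
    z = [[abs(x - mean(s)) for x in s] for s in samples]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((x - zm) ** 2 for zi, zm in zip(z, z_means) for x in zi)
    return (n_total - k) / (k - 1) * between / within
```

W is compared with an F(k-1, N-k) critical value; `scipy.stats.levene(..., center="mean")` gives the same statistic with a p-value (its default, `center="median"`, is the Brown-Forsythe variant).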

To find out the characteristics of each species, we drew the following histograms:
-Frequency of genes by scale
-Frequency of scales by strand and direction
-Frequency of genes in the two groups (leading and lagging)
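The counts behind such histograms can be sketched as follows. This assumes, hypothetically, that a "scale" is one of `n_scales` equal-width bins of DistToOri over `[0, max_dist]`; the page does not define the binning, and the key names are illustrative.

```python
from collections import Counter


def scale_index(dist, max_dist, n_scales):
    """Map a distance to one of n_scales equal-width bins (hypothetical binning)."""
    i = int(dist / max_dist * n_scales)
    return min(i, n_scales - 1)  # a gene exactly at max_dist goes in the last bin


def frequency_by_scale(records, max_dist, n_scales, group_key=None):
    """Count genes per scale, optionally split by a grouping field.

    group_key could be e.g. "strand" or "direction" (hypothetical keys);
    with a key, the Counter is indexed by (group, scale) pairs.
    """
    counts = Counter()
    for r in records:
        s = scale_index(r["dist_to_ori"], max_dist, n_scales)
        key = (r[group_key], s) if group_key else s
        counts[key] += 1
    return counts
```

Passing `group_key="direction"` with values such as "leading" and "lagging" gives the two-group histogram.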


Before estimating the distribution, we transformed the datasets of the 8 Gamma species and analyzed their basic statistics and graphs. We merged all Gamma data (NC_000907 (Haemophilus influenzae Rd KW20), NC_000913 (Escherichia coli MG1655), NC_002505 (Vibrio cholerae N16961, chromosome I), NC_002506 (Vibrio cholerae N16961, chromosome II), NC_003197 (Salmonella typhimurium LT2), NC_004631 (Salmonella enterica serovar Typhi), NC_005966 (Acinetobacter baylyi ADP1), NC_008463 (Pseudomonas aeruginosa UCBPP-PA14)) into one data set (dataset gamma_all).
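A minimal sketch of the merge, assuming each accession's gene table is available as a list of dicts: the rows are concatenated and tagged with their accession so that per-species analyses remain possible on `gamma_all`. The input structure and the `accession` field are hypothetical.

```python
GAMMA_ACCESSIONS = [
    "NC_000907", "NC_000913", "NC_002505", "NC_002506",
    "NC_003197", "NC_004631", "NC_005966", "NC_008463",
]


def build_gamma_all(per_accession):
    """Concatenate per-accession gene tables into one list of rows.

    per_accession maps an accession number to a list of row dicts
    (hypothetical layout). A missing accession raises KeyError rather
    than being skipped silently.
    """
    gamma_all = []
    for acc in GAMMA_ACCESSIONS:
        for row in per_accession[acc]:
            # Copy the row and record which replicon it came from.
            gamma_all.append({**row, "accession": acc})
    return gamma_all
```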



To test whether the proportion of essential genes differs within Gamma, we performed a hypothesis test. The hypotheses are as follows.




To test the above hypothesis, we performed a χ² test. As a result, the chi-square statistic is and the p-value is lower than 0.0001, so the null hypothesis () is rejected. We can therefore say that the proportions of essential genes differ.
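The Pearson χ² statistic for such a test can be sketched as below. The table layout (rows = groups being compared, columns = essential / non-essential counts) is an assumption, since the page's hypotheses are not shown; the sketch returns only the statistic and degrees of freedom.

```python
def chi_square_homogeneity(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumed layout: each row is one group, columns are counts of
    essential and non-essential genes.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under H0 (same proportion in every row).
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof
```

The p-value is the upper tail of the χ² distribution with `dof` degrees of freedom, e.g. `scipy.stats.chi2.sf(stat, dof)`. `scipy.stats.chi2_contingency` computes both at once, but applies Yates' continuity correction by default when there is one degree of freedom.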

To study the distribution of essential genes in Gamma, we analyzed the distribution in the 8 species whose essential genes are already known, and estimated the proportion of essential genes classified by scale. To estimate the distribution for Gamma, we calculated this ratio for each known species: the i-th ratio Pi is the number of essential genes divided by the total number of genes within the i-th scale.
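For one known species, the ratio Pi can be sketched as follows, assuming each gene has already been assigned a scale index and an essential/non-essential label (the `(scale, is_essential)` pair format is a hypothetical choice).

```python
import math


def essential_ratio_by_scale(genes, n_scales):
    """P_i = (essential genes in scale i) / (all genes in scale i).

    genes is an iterable of (scale_index, is_essential) pairs for one
    species (hypothetical format). Scales with no genes get NaN.
    """
    total = [0] * n_scales
    essential = [0] * n_scales
    for scale, is_essential in genes:
        total[scale] += 1
        if is_essential:
            essential[scale] += 1
    return [e / t if t else math.nan for e, t in zip(essential, total)]
```

For example, two genes in scale 0 (one essential) and four in scale 1 (one essential) give P = [0.5, 0.25].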

So, Pi for an unknown Gamma species is estimated from Pi of the 8 known Gamma species.
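The page does not state how the 8 species' ratios are combined. One simple option, shown here purely as an assumption, is the unweighted mean of Pi across the known species, scale by scale, ignoring scales where a species has no genes (NaN).

```python
import math


def estimate_unknown_ratio(ratios_by_species):
    """Estimate P_i for an unknown species as the mean of known P_i.

    ratios_by_species: one list of P_i per known species, all the same
    length. NaN entries (empty scales) are skipped. The unweighted mean
    is an assumed estimator, not necessarily the one used on this page.
    """
    n_scales = len(ratios_by_species[0])
    estimate = []
    for i in range(n_scales):
        vals = [r[i] for r in ratios_by_species if not math.isnan(r[i])]
        estimate.append(sum(vals) / len(vals) if vals else math.nan)
    return estimate
```

A pooled estimate (total essential genes in scale i over total genes in scale i, summed across species) is the natural alternative; it weights species by their gene counts instead of equally.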

We selected the sample by simple random sampling from a sampling list that includes all species in Gamma.
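Simple random sampling without replacement, where every subset of the given size is equally likely, can be sketched with Python's standard `random.sample`. The sampling list contents, sample size, and seed are placeholders.

```python
import random


def simple_random_sample(sampling_list, n, seed=None):
    """Draw n distinct species from the sampling list, without replacement.

    A fixed seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(sampling_list, n)
```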



Using the results of the proportion estimation, we estimated the distribution of essential genes for the sample. The proportions estimated from the 8 known species are as follows.

We estimated the frequency of essential genes (Ni) in the unknown species based on the estimated proportions.
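A natural way to turn estimated proportions into frequencies, assumed here since the page does not give the formula, is Ni = Pi × Gi, where Gi is the number of genes the unknown species has in scale i.

```python
def expected_essential_counts(estimated_ratios, genes_per_scale):
    """N_i = P_i * G_i for an unknown species (assumed estimator).

    estimated_ratios: estimated P_i per scale.
    genes_per_scale: G_i, the unknown species' gene count in each scale.
    """
    if len(estimated_ratios) != len(genes_per_scale):
        raise ValueError("need one ratio per scale")
    return [p * g for p, g in zip(estimated_ratios, genes_per_scale)]
```

For instance, P = [0.5, 0.25] with G = [100, 40] gives N = [50.0, 10.0], i.e. about 60 essential genes in total.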