Team:USTC-Software/tech&algo
From 2011.igem.org
Line 374: | Line 374: | ||
<h1>Technology & Algorithm</h1> | <h1>Technology & Algorithm</h1> | ||
+ | |||
+ | <p>The Perl language is a powerful tool for dealing with regular expressions, and it manages to process complex problems in a timely way. For example, for a hash array with a few elements in the buckets almost get the same manipulating time with a big hash with millions of elements. This feature improves the speed of rule based modeling remarkably.</p> | ||
+ | |||
+ | <p>Dealing with regular expressions is also Perl’s cup of tea. So the software can spend more running time saved by perl on providing a better user’s interface, making it more convenient for users.</p> | ||
+ | |||
+ | <p>Our approach first realized by Liaochen and 2010igemers emphasize on the structure of the species.</p> | ||
+ | |||
+ | <p>Below is an illustration of how the algorithm processes the classical circuit example:</p> | ||
+ | |||
+ | <p>Part1: toggle switch.</p> | ||
+ | |||
+ | figure1 | ||
+ | |||
+ | <p>Toggle switch is characterized by mutual inhibitory network. That is, the protein product of one gene represses the promoter of the other, and vice versa. So with proper parameters, the system will stay at one stable steady state at equilibrium. In other words, one gene’s product will dominate. But with proper inducers (usually IPTG or ATC), the system manage to flop from one state to another, still keeping steady.</p> | ||
+ | |||
+ | <p>We tried an instance of a toggle switch composed of a promoter which can be repressed by laci protein, and is denoted as pLac(r0010). And the other promoter is tet-repressible and be denoted by pTet(r0040). The plac promoter promotes the expression of TetR(c0040), And the LacR(c0012) is initiated by pTet(r0040).</p> | ||
+ | |||
+ | figure2 | ||
+ | |||
+ | <p>Without inducers, which protein will dominate depend on the binding affinity of the repressor to the promoter. In this case, it’s the LacR protein take control at null input of inducers. The only input to the algorithm is a input file like this.</p> | ||
+ | |||
+ | figure3 | ||
+ | |||
+ | <p>Here are a bit explanations to the input file</p> | ||
+ | |||
+ | <p>As to the seedspecies section:<br/> | ||
+ | # Medium iptg : IPTG is contained in the compartment named medium<br/> | ||
+ | # Ecoli dna1 d:r0040(tetr1,tetr2)-b0034(rib)-c0012(dna,iptg,dim)-b0014()<br/> | ||
+ | #dna1_init<br/> | ||
+ | # d:means it’s a strand of dna composed of 4 parts.<br/> | ||
+ | # The promoters have two binding site(tetr1 and tetr2) for associative repressors. <br/> | ||
+ | # c0012(dna,iptg,dim) means the c0012 is a DNA sequence. Iptg means the product #of c0012, which is LacI protein, in terms of dimer, can be binded to iptg which #deactivate it’s repression to placi promoter. <br/> | ||
+ | #b0014()is the terminator.<br/> | ||
+ | #dna1_init means the initial concentration of dna1 is dana1_init which is given in the parameter section of the input file.</p> | ||
+ | |||
+ | <p>Once the input file is provided, both the net and SBML file will be generated within a second. The species “discovered” by the algorithm is illustrated below, totally 18 substances in such a simple network.</p> | ||
+ | |||
+ | figure4 | ||
+ | |||
+ | <p>A bit explanations are given to the table above.</p> | ||
+ | |||
+ | <p>1.<br/> | ||
+ | #nb:no binding,the type of the species, which indicate that iptg can release the binding of laci to the plac promoter.<br/> | ||
+ | #d: means the following thing is a sequence of dna with structure information.<br/> | ||
+ | # b0014() is a terminator with no binding site.<br/> | ||
+ | # the hyphen sign connects the parts on a dna.<br/> | ||
+ | #C0012*(dim,dna,iptg) means the c0012 is a dna sequence, and the product of c0012 which is #laci protein, can form dimers, iptg can bind to monomers of laci .</p> | ||
+ | |||
+ | <p>2.<br/> | ||
+ | As you can see from species 12 to 15, the advantage of this approach is that it contains information about the structure of the species. For instance, s12 is nb:i0001(laci!1).p:c0012(dim!2,dna,iptg!1).p:c0012(dim!2,dna,iptg)</p> | ||
+ | |||
+ | <p>The ! is used to denote binding , here iptg bind to the laci denoted by !1, and the . is used to separate different molecules of one species. The molecule that comprise the s12 species are nb:i0001(laci!1),p:c0012(dim!2,dna,iptg!1),p:c0012(dim!2,dna,iptg).</p> | ||
+ | |||
+ | <p>The two proteins bind to form dimers through bingding !2.<br/> | ||
+ | So now you can imagine the structure of s12 species.<br/> | ||
+ | The network generated by our software is as below. User can drag the nodes to the place they want to have a better clarity of the network.</p> | ||
+ | |||
+ | figure5 | ||
+ | |||
+ | <p> | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
<h2>Algorithm</h2> | <h2>Algorithm</h2> |
Revision as of 15:47, 2 October 2011
USTC-Software
Technology & Algorithm
The Perl language is a powerful tool for dealing with regular expressions, and it manages to process complex problems in a timely way. For example, for a hash array with a few elements in the buckets almost get the same manipulating time with a big hash with millions of elements. This feature improves the speed of rule based modeling remarkably.
Dealing with regular expressions is also Perl’s cup of tea. So the software can spend more running time saved by perl on providing a better user’s interface, making it more convenient for users.
Our approach first realized by Liaochen and 2010igemers emphasize on the structure of the species.
Below is an illustration of how the algorithm processes the classical circuit example:
Part1: toggle switch.
figure1Toggle switch is characterized by mutual inhibitory network. That is, the protein product of one gene represses the promoter of the other, and vice versa. So with proper parameters, the system will stay at one stable steady state at equilibrium. In other words, one gene’s product will dominate. But with proper inducers (usually IPTG or ATC), the system manage to flop from one state to another, still keeping steady.
We tried an instance of a toggle switch composed of a promoter which can be repressed by laci protein, and is denoted as pLac(r0010). And the other promoter is tet-repressible and be denoted by pTet(r0040). The plac promoter promotes the expression of TetR(c0040), And the LacR(c0012) is initiated by pTet(r0040).
figure2Without inducers, which protein will dominate depend on the binding affinity of the repressor to the promoter. In this case, it’s the LacR protein take control at null input of inducers. The only input to the algorithm is a input file like this.
figure3Here are a bit explanations to the input file
As to the seedspecies section:
# Medium iptg : IPTG is contained in the compartment named medium
# Ecoli dna1 d:r0040(tetr1,tetr2)-b0034(rib)-c0012(dna,iptg,dim)-b0014()
#dna1_init
# d:means it’s a strand of dna composed of 4 parts.
# The promoters have two binding site(tetr1 and tetr2) for associative repressors.
# c0012(dna,iptg,dim) means the c0012 is a DNA sequence. Iptg means the product #of c0012, which is LacI protein, in terms of dimer, can be binded to iptg which #deactivate it’s repression to placi promoter.
#b0014()is the terminator.
#dna1_init means the initial concentration of dna1 is dana1_init which is given in the parameter section of the input file.
Once the input file is provided, both the net and SBML file will be generated within a second. The species “discovered” by the algorithm is illustrated below, totally 18 substances in such a simple network.
figure4A bit explanations are given to the table above.
1.
#nb:no binding,the type of the species, which indicate that iptg can release the binding of laci to the plac promoter.
#d: means the following thing is a sequence of dna with structure information.
# b0014() is a terminator with no binding site.
# the hyphen sign connects the parts on a dna.
#C0012*(dim,dna,iptg) means the c0012 is a dna sequence, and the product of c0012 which is #laci protein, can form dimers, iptg can bind to monomers of laci .
2.
As you can see from species 12 to 15, the advantage of this approach is that it contains information about the structure of the species. For instance, s12 is nb:i0001(laci!1).p:c0012(dim!2,dna,iptg!1).p:c0012(dim!2,dna,iptg)
The ! is used to denote binding , here iptg bind to the laci denoted by !1, and the . is used to separate different molecules of one species. The molecule that comprise the s12 species are nb:i0001(laci!1),p:c0012(dim!2,dna,iptg!1),p:c0012(dim!2,dna,iptg).
The two proteins bind to form dimers through bingding !2.
So now you can imagine the structure of s12 species.
The network generated by our software is as below. User can drag the nodes to the place they want to have a better clarity of the network.
Algorithm
Particle swarm optimization (PSO) Algorithm
Particle swarm optimization (PSO) is a population based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by social behavior of bird flocking or fish schooling.
Each particle keeps track of its coordinates in the problem space which are associated with the best solution (fitness) it has achieved so far. (The fitness value is also stored.) This value is called pbest. Another "best" value that is tracked by the particle swarm optimizer is the best value, obtained so far by any particle in the neighbors of the particle. This location is called lbest. when a particle takes all the population as its topological neighbors, the best value is a global best and is called gbest.
The particle swarm optimization concept consists of, at each time step, changing the velocity of (accelerating) each particle toward its pbest and lbest locations (local version of PSO). Acceleration is weighted by a random term, with separate random numbers being generated for acceleration toward pbest and lbest locations.
PSO has been successfully applied in many research and application areas. It is demonstrated that PSO gets better results in a faster, cheaper way compared with other methods.
Simulated Annealing(SA) Algorithm
Simulated annealing (SA) is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities). For certain problems, simulated annealing may be more efficient than exhaustive enumeration — provided that the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.
In the simulated annealing (SA) method, each point s of the search space is analogous to a state of some physical system, and the function E(s) to be minimized is analogous to the internal energy of the system in that state. The goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.
At each step, the SA heuristic considers some neighbouring state s' of the current state s, and probabilistically decides between moving the system to state s' or staying in state s. These probabilities ultimately lead the system to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.
The neighbours of a state are new states of the problem that are produced after altering the given state in some particular way. For example, in the traveling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations that are produced for example by interchanging a pair of adjacent cities. The action taken to alter the solution in order to find neighbouring solutions is called "move" and different "moves" give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help an algorithm to optimize the solution to the maximum extent and also to retain the already optimum parts of the solution and affect only the suboptimum parts. In the previous example, the parts of the solution are the parts of the tour.
The probability of making the transition from the current state s to a candidate new state s' is specified by an acceptance probability function P(e,e',T), that depends on the energies e = E(s) and e' = E(s') of the two states, and on a global time-varying parameter T called the temperature. States with a smaller energy are better than those with a greater energy. The probability function P must be positive even when e' is greater than e. This feature prevents the method from becoming stuck at a local minimum that is worse than the global one.