Team:USTC-Software/tech&algo

From 2011.igem.org

(Difference between revisions)
 
(39 intermediate revisions not shown)
Line 27: Line 27:
a:active, a:hover { color: #0683ab; text-decoration: underline; }
a:active, a:hover { color: #0683ab; text-decoration: underline; }
-
img { margin: 0px; padding: 0px; border: none; }
+
img { margin: auto; padding: 0px; border: none; display:block; }
#header_wrapper {
#header_wrapper {
Line 207: Line 207:
font-family: Verdana, Arial;
font-family: Verdana, Arial;
font-size: 14px;
font-size: 14px;
 +
}
 +
 +
#intro p {
 +
font-family: Verdana, Arial;
 +
font-size: 12px;
}
}
Line 284: Line 289:
/* end of footer*/
/* end of footer*/
-
/* note*/
+
.ul_p{
-
 
+
-
#sidebar .note_list {
+
-
margin: 0;
+
-
padding: 0;
+
-
list-style: none;
+
-
}
+
-
 
+
-
#sidebar .note_list li {
+
-
padding: 0;
+
-
margin: 0;
+
-
}
+
-
 
+
-
.note_list_p{
+
font-family: Verdana, Arial;
font-family: Verdana, Arial;
-
font-size: 16px;
+
font-size: 14px;
-
}
+
-
 
+
-
.note_list li a {
+
-
display: block;
+
-
color: #201f1c;
+
-
padding: 5px 0 5px 20px;
+
-
background: url(./images/USTC_Software_notebook.png) center left no-repeat;
+
-
}
+
-
.note_list li a:hover {
+
-
color: #537c11;
+
-
text-decoration: none;
+
}
}
Line 333: Line 314:
     <li><a class="on" href="https://2011.igem.org/Team:USTC-Software/project">Project</a>
     <li><a class="on" href="https://2011.igem.org/Team:USTC-Software/project">Project</a>
<ul>
<ul>
-
<li><a href="https://2011.igem.org/Team:USTC-Software/documents">Documents</a></li>
+
<li><a href="https://2011.igem.org/Team:USTC-Software/documents">Documents Parser</a></li>
<li><a href="https://2011.igem.org/Team:USTC-Software/models">Models</a></li>
<li><a href="https://2011.igem.org/Team:USTC-Software/models">Models</a></li>
-
<li><a href="https://2011.igem.org/Team:USTC-Software/views">Views</a></li>
+
    <li><a href="#">Views</a>
 +
                            <ul>
 +
                                <li><a href="https://2011.igem.org/Team:USTC-Software/aview">Assembly View</a></li>
 +
                                <li><a href="https://2011.igem.org/Team:USTC-Software/bview">Behavior View</a></li>
 +
                                <li><a href="https://2011.igem.org/Team:USTC-Software/nview">Network View</a></li>                     
 +
                            </ul>
                                 <li><a href="https://2011.igem.org/Team:USTC-Software/tech&algo">Technology & Algorithm</a></li>
                                 <li><a href="https://2011.igem.org/Team:USTC-Software/tech&algo">Technology & Algorithm</a></li>
-
                                 <li><a href="https://2011.igem.org/Team:USTC-Software/tutorial">Tutorial</a></li>
+
                                 <li><a href="https://2011.igem.org/Team:USTC-Software/tutorial">Tutorial & Demo</a></li>
</ul>
</ul>
</li>
</li>
-
                 <li><a href="https://2011.igem.org/Team:USTC-Software/notebook">Notebook</a></li>
+
                 <li><a href="https://2011.igem.org/Team:USTC-Software/notebook">Notebook</a></li>  
<li><a href="https://2011.igem.org/Team:USTC-Software/team">Team</a>
<li><a href="https://2011.igem.org/Team:USTC-Software/team">Team</a>
Line 348: Line 334:
                             <li><a href="https://2011.igem.org/Team:USTC-Software/members">members</a></li>
                             <li><a href="https://2011.igem.org/Team:USTC-Software/members">members</a></li>
                             <li><a href="https://2011.igem.org/Team:USTC-Software/collaboration">collaboration</a></li>
                             <li><a href="https://2011.igem.org/Team:USTC-Software/collaboration">collaboration</a></li>
-
                             <li><a href="https://2011.igem.org/Team:USTC-Software/attribution">attribution</a></li>
+
                             <li><a href="https://2011.igem.org/Team:USTC-Software/attribution">attribution & contributions</a></li>
                             <li><a href="https://2011.igem.org/Team:USTC-Software/acknowledgements">acknowledgements</a></li>
                             <li><a href="https://2011.igem.org/Team:USTC-Software/acknowledgements">acknowledgements</a></li>
                           </ul>
                           </ul>
Line 373: Line 359:
<div id="content_wrapper">
<div id="content_wrapper">
-
<h1>Technology & Algorithm</h1>
+
<div id="intro">
-
 
+
<p> Contents:
-
<p>The Perl language is a powerful tool for dealing with regular expressions, and it manages to  process complex problems in a timely way. For example, for a hash array with a few elements in the buckets almost get the same manipulating time with a big hash with millions of elements. This feature improves the speed of rule based modeling remarkably.</p>
+
  <ul type="square">  
-
 
+
-
<p>Dealing with regular expressions is also Perl’s cup of tea. So the software can spend more running time saved by perl on providing a better user’s interface, making it more convenient for users.</p>
+
-
 
+
-
<p>Our approach first realized by Liaochen and 2010igemers emphasize on the structure of the species.</p>
+
-
 
+
-
<p>Below is an illustration of how the algorithm processes the classical circuit example:</p>
+
-
 
+
-
<p>Part1: toggle switch.</p>
+
-
 
+
-
figure1
+
-
 
+
-
<p>Toggle switch is characterized by mutual inhibitory network. That is, the protein product of one gene represses the promoter of the other, and vice versa. So with proper parameters, the system will stay at one stable steady state at equilibrium. In other words, one gene’s product will dominate. But with proper inducers (usually IPTG or ATC), the system manage to flop from one state to another, still keeping steady.</p>
+
-
 
+
-
<p>We tried an instance of a toggle switch composed of a promoter which can be repressed by laci protein, and is denoted as pLac(r0010). And the other promoter is tet-repressible and be denoted by pTet(r0040). The plac promoter promotes the expression of TetR(c0040), And the LacR(c0012) is initiated by pTet(r0040).</p>
+
-
 
+
-
figure2
+
-
 
+
-
<p>Without inducers, which protein will dominate depend on the binding affinity of the repressor to the promoter. In this case, it’s the LacR protein take control at null input of inducers. The only input to the algorithm is a input file like this.</p>
+
-
 
+
-
figure3
+
-
 
+
-
<p>Here are a bit explanations to the input file</p>
+
-
 
+
-
<p>As to the seedspecies section:<br/>
+
-
# Medium iptg :  IPTG is contained in the compartment named medium<br/>
+
-
# Ecoli dna1 d:r0040(tetr1,tetr2)-b0034(rib)-c0012(dna,iptg,dim)-b0014()<br/>
+
-
#dna1_init<br/>
+
-
# d:means it’s  a strand of dna composed of 4 parts.<br/>
+
-
# The promoters have two binding site(tetr1 and tetr2) for associative repressors. <br/>
+
-
# c0012(dna,iptg,dim) means the c0012 is a DNA sequence. Iptg means the product #of c0012, which is LacI protein, in terms of dimer, can be binded to iptg which #deactivate it’s repression to placi promoter. <br/>
+
-
#b0014()is the terminator.<br/>
+
-
#dna1_init means the initial concentration of dna1 is dana1_init which is given in the parameter section of the input file.</p>
+
-
 
+
-
<p>Once the input file is provided, both the net and SBML file will be generated within a second. The species “discovered” by the algorithm is illustrated below, totally 18 substances in such a simple network.</p>
+
-
 
+
-
figure4
+
-
 
+
-
<p>A bit explanations are given to the table above.</p>
+
-
 
+
-
<p>1.<br/>
+
-
#nb:no binding,the type of the species, which indicate that iptg can release the binding of laci to the plac promoter.<br/>
+
-
#d: means the following thing is a sequence of dna with structure information.<br/>
+
-
# b0014() is a terminator with no binding site.<br/>
+
-
# the hyphen sign connects the parts on a dna.<br/>
+
-
#C0012*(dim,dna,iptg) means the c0012 is a dna sequence, and the product of c0012 which is #laci protein,  can form dimers,  iptg can bind to monomers of laci .</p>
+
-
 
+
-
<p>2.<br/>
+
-
As you can see from species 12 to 15, the advantage of this approach is that it contains information about the structure of the species. For instance, s12 is nb:i0001(laci!1).p:c0012(dim!2,dna,iptg!1).p:c0012(dim!2,dna,iptg)</p>
+
-
 
+
-
<p>The ! is used to denote binding , here iptg bind to the laci denoted by !1, and the . is used to separate different molecules of one species. The molecule that comprise the s12 species are nb:i0001(laci!1),p:c0012(dim!2,dna,iptg!1),p:c0012(dim!2,dna,iptg).</p>
+
-
 
+
-
<p>The two proteins bind to form dimers through bingding !2.<br/>
+
-
So now you can imagine the structure of s12 species.<br/>
+
-
The network generated by our software is as below. User can drag the nodes to the place they want to have a better clarity of the network.</p>
+
-
 
+
-
figure5
+
-
 
+
-
<p>One thing to note is the assumptions made by Chen LIAO:<br/>
+
-
1.Both LacR protein and TetR protein can form dimers. But only the dimers, rather than the monomers, can bind to the promoter regions
+
-
to repress the expressions of the downstream genes.<br/>
+
-
<em>(#note that this a an advantage of our algorithm, it is more close to biology reality)</em><br/>
+
-
 
+
-
2.IPTG molecules can bind to LacR proteins or each monomer of LacR Dimers but cannot bind to LacR dimers whose DNA-binding domains Have been occupied.<br/>
+
-
<em/>(#When there are less LacR proteins, there are less LacR dimers. When there are less LacR dimers, the complex of LacR dimer and DNA is more likely to disassociate.)</em><br/>
+
-
 
+
-
3.There are leaky expressions for repressed pLac and pTet promoters.</p>
+
-
 
+
-
<p>In order to test the ability of the system to switch from one state to the other, we conducted a time course simulation. The IPTG are added to the system at time 10000s.<br/>
+
-
The GFP expression level changes triggered by the pulse are illustrated by curves with different colors.</p>
+
-
 
+
-
figure6
+
-
 
+
-
<p>As you can see from the figure above, at initial time, the LacI protein soon takes control and reaches equilibrium. When IPTG is added to the system, there is a sharp decrease. Since the half life of LacR is short and IPTG is so favorable to bind with LacR, making it inactive (lose the ability to bind tightly to the promoter that helps to express TetR) </p>
+
-
 
+
-
<p>Part2:  RTC two counter (RiboRegulated transcription cascade)</p>
+
-
 
+
-
<p>Recent years saw a emergence of the small RNA application in the gene regulation network. This is a timely way to regulate gene expression compared to other transcription level modulation.</p>
+
-
 
+
-
<P>As a mimic to the electronic digital circuit that can count the pulses or other events (a basic function of most MCU module), synthetic biologist designed ways to count biological events such as the adding of inducers. Here, we adopted the RTC-two counter as an example.</p>
+
-
 
+
-
figure7
+
-
 
+
-
<p>The constitutive promoter pl tet 0-1 , which corresponds to j23100 part, drives the T7 RNA polymerase T7RNAP (i2032 part), the RNAP binds to THE T7 promoter PT7 and the GFP will be expressed then.</p>
+
-
 
+
-
<P>Between them is the part j01010, which consist of the rbs sequence and the complementary sequence cr  ( so the cr section of the mRna transcribed will form a loop with the Rbs section ,thus inhibiting translation)</p>
+
-
 
+
-
<p>B0014 is the T7 promoter. The RNAP binds to the T7 promoter to drive the expression of the downstream gene (GFP). But still, the translation is inhibited by the stem loop formed by the cr and RBS section of j01010.</p>
+
-
 
+
-
figure8
+
-
 
+
-
<p>To release the inhibition, the arabinose induced promoter PBAD can drive the expression of
+
-
J01008, which corresponds to taRNA.  But this promoter can be repressed by c0080, which is the AraC protein dimer. When arabinoses are added to the system, the arabinoses bind to the AraC monomers and prevent them from forming dimers. Thus release the repression to the i01008 promoter.</p>
+
-
 
+
-
<p>The taRNA is specific to the cr sequence and it bind to the cr sequence, thus open the stem loop, allowing the expression of RNAP and GFP.</p>
+
-
 
+
-
<p>The input file to this case is as below:</p>
+
-
 
+
-
figure9
+
-
 
+
-
<p>The network view generated by our software is as below (this visualization is realized by Junyuan Xie, Fangming Liu, and Chuocheng He. You can drag the nodes to your preferred place.</p>
+
-
 
+
-
figure10
+
-
 
+
-
<p>The list of species generated by the algorithm is listed the table below.</p>
+
-
 
+
-
figure11
+
-
 
+
-
<p>Below is some assumptions made by Liao Chen who worked out this example with his algorithm.<br/>
+
-
1.AraC proteins cannot degrade so as to keep forever repression of pBAD before any pulse of arabinose.<br/>
+
-
2.AraC protein can form dimers. But only the dimers, rather than the monomers,can bind to pBAD and repress the expressions of the
+
-
downstream genes.<br/>
+
-
3.Arabinose molecules can bind to AraC proteins or each monomer of AraC dimers but cannot bind to AraC dimers whose DNA-binding
+
-
Domains have been occupied.<br/>
+
-
4.No leaky expressions is possible for both pBAD and T7 promoter, Which can reduce the leaky GFP expression (noise) before the second.</p>
+
-
 
+
-
<p>Pulse comes and make the leap of GFP amount more impressive and distinguishable.</p>
+
-
 
+
-
<p>We also attest the validity of our approach using the time course simulation in this case.</p>
+
-
 
+
-
figure12
+
-
 
+
-
<p>As you can see from above, two Arabinose pulses are added to the system at time 2500.01s and 2561.01s. The black curve shows the normalized GFP expression of the cells that received both the two pulses. The red line refers to the cells that only receive the first arabinose pulse and the blue line refers to the cells that only receive the second one.</p>
+
-
 
+
-
<p>It is clear from the figure, that at the point of the pulse, the GFP level begin to leap to a higher state. Since the inducer is removed from the environment after it generate the pulse, the expression of RNAP will not last long so there is only a small leap if the cell received only one pulse, if with the second, the GFP level will leap markedly as indicated by the blackline.</p>
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
-
 
+
 +
          <li><a href="#Algorithm"> Algorithm </a></li>
 +
         
 +
            <ul type="circle">
 +
              <li><a href="#Particle swarm optimization (PSO) Algorithm">Particle swarm optimization (PSO) Algorithm</a></li>
 +
             
 +
                <ul type="disc">
 +
                  <li><a href="#History">History</a></li>
 +
                  <li><a href="#Algorithm introduction">Algorithm introduction</a></li>
 +
                  <li><a href="#Application">Application</a></li>
 +
                </ul>
 +
              </ul>
 +
                 
 +
            <ul type="circle">   
 +
              <li><a href="#Simulated Annealing(SA) Algorithm">Simulated Annealing(SA) Algorithm</a></li>
 +
                <ul type="disc">
 +
                  <li><a href="#Principle of the algorithm">Principle of the algorithm</a></li>
 +
                  <li><a href="#Steps">Steps</a></li>
 +
                  <li><a href="#State transition">State transition</a></li>
 +
                  <li><a href="#Probability function of the state transition">Probability function of the state transition</a></li>
 +
                </ul>
 +
            </ul>
 +
         
 +
            <ul type="circle"> 
 +
              <li><a href="#MoDeL">MoDeL</a></li>           
 +
          </ul>
 +
    </ul> 
 +
</p>
 +
             
 +
</div> 
 +
<h2><a name="Particle swarm optimization (PSO) Algorithm" id="Particle swarm optimization (PSO) Algorithm"></a>Particle swarm optimization (PSO) Algorithm</h2>
 +
<h3><a name="History" id="History"></a>History</h3>
 +
<p>Particle swarm optimization (PSO) is a population based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995,
 +
inspired by social behavior of bird flocking or fish schooling. The particle swarm concept originated as a simulation of simplified social system.
 +
The original intent was to graphically simulate the choreography of bird of a bird block or fish school.
 +
However, it was found that particle swarm model can be used as an optimizer. </p>
 +
 +
<h3><a name="Algorithm introduction" id="Algorithm introduction"></a>Algorithm introduction</h3>
 +
 +
<p>Each particle keeps track of its coordinates in the problem space which are associated with the best solution (fitness) it has achieved so far. (The fitness value is also stored.)
 +
  This value is called pbest. Another "best" value that is tracked by the particle swarm optimizer is the best value, obtained so far by any particle in the neighbors of the particle.
 +
  This location is called lbest. when a particle takes all the population as its topological neighbors, the best value is a global best and is called gbest. </p>
 +
 
 +
<p>The particle swarm optimization concept consists of, at each time step, changing the velocity of (accelerating) each particle toward its pbest and lbest locations (local version of PSO).
 +
  Acceleration is weighted by a random term, with separate random numbers being generated for acceleration toward pbest and lbest locations.</p>
 +
 
 +
  <img src="https://static.igem.org/mediawiki/2011/e/ec/USTC_Software_PSO.jpg" />
 +
 
 +
<h3><a name="Application" id="Application"></a>Application</h3>
 +
 
 +
<p>PSO has been successfully applied in many research and application areas.
 +
  It is demonstrated that PSO gets better results in a faster, cheaper way compared with other methods.</p>
 +
 
 +
<h2><a name="Simulated Annealing(SA) Algorithm" id="Simulated Annealing(SA) Algorithm"></a>Simulated Annealing(SA) Algorithm </h2>
 +
<p>Simulated annealing (SA) is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities).
 +
For certain problems, simulated annealing may be more efficient than exhaustive enumeration — provided that the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.</p>
 +
<h3><a name="Principle of the algorithm" id="Principle of the algorithm"></a>Principle of the algorithm</h3>
 +
<p>In the simulated annealing (SA) method, each point s of the search space is analogous to a state of some physical system, and the function E(s) to be minimized is analogous to the internal energy of the system in that state.
 +
The goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.</p>
 +
<h3><a name="Steps" id="Steps"></a>Steps</h3>
 +
<p>At each step, the SA heuristic considers some neighbouring state s' of the current state s, and probabilistically decides between moving the system to state s' or staying in state s. These probabilities ultimately lead the system to move to states of lower energy.
 +
Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.</p>
 +
<h3><a name="State transition" id="State transition"></a>State transition</h3>
 +
<p>The neighbours of a state are new states of the problem that are produced after altering the given state in some particular way. For example, in the traveling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations that are produced for example by interchanging a pair of adjacent cities.
 +
The action taken to alter the solution in order to find neighbouring solutions is called "move" and different "moves" give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help an algorithm to optimize the solution to the maximum extent and also to retain the already optimum parts of the solution and affect only the suboptimum parts.
 +
In the previous example, the parts of the solution are the parts of the tour.</p>
 +
<h3><a name="Probability function of the state transition" id="Probability function of the state transition"></a>Probability function of the state transition</h3>
 +
<p>The probability of making the transition from the current state s to a candidate new state s' is specified by an acceptance probability function P(e,e',T), that depends on the energies e = E(s) and e' = E(s') of the two states, and on a global time-varying parameter T called the temperature. States with a smaller energy are better than those with a greater energy. The probability function P must be positive even when e' is greater than e.
 +
This feature prevents the method from becoming stuck at a local minimum that is worse than the global one.</p>
 +
<img src="https://static.igem.org/mediawiki/2011/3/32/USTC_Software_SA.jpg" />
 +
<h2><a name="MoDeL" id="MoDeL"></a>MoDeL</h2>
 +
<p>The Perl language is a powerful tool for dealing with regular expressions, and it manages to  process complex problems in a timely way. For example, for a hash array with a few elements in the buckets almost get the same manipulating time with a big hash with millions of elements. This feature improves the speed of rule based modeling remarkably.</p>
 +
<p>Dealing with regular expressions is also Perl's cup of tea. So the software can spend more running time saved by perl on providing a better user's interface, making it more convenient for users.</p>
 +
<p>Our approach first realized by Liaochen and 2010igemers emphasize on the structure of the species.</p>
-
<h2>Algorithm</h2>
+
<p>Here is a detailed explanation to the input file of the rule based modeling approach
 +
There are four blocks, respectively lists the definition of parameters, the definition of compartments, seed species and events.</p>
 +
<p>The definition of parameters consists of two items. The left side term is the name of the variable, to the right is the expression of that parameter. Note that variables in that expression must be defined by the user. But it makes no difference whether they are defined before the expression. This is to avoid redundant definition and no definition. </p>
 +
<br />
 +
<p>The compartments definition is in this way: <br/>
 +
[name ][outside][ruletable]{volume}{population}, where terms in the square brackets are mandatory, while terms in the curly braces are optional.<br/>
 +
The default value of the volume is 1. The outside term means the compartment outside the compartment, which is usually the medium that held the compartment like the ecoli. <br/>
 +
Ruletable is used to associate a rule in the data base with the compartment. </p>
 +
<br />
-
<h3>Particle swarm optimization (PSO) Algorithm</h3>
+
<p>The third block, reads [compartment][name][structure][init_concentration]{const}, structure is the definition of the complex structure, const is optional since it's in the curly brace.  If the substance has a const property, then after the substance, write a const. if not, leave it blank.</p>
-
<p>Particle swarm optimization (PSO) is a population based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by social behavior of bird flocking or fish schooling.</p>
+
<p>The last block is the events definition. The formats are [name][trigger_condition][event_assignments...]
 +
The name of the event is usr defined, trigger_condition is a bool expression, all the variables inside this bool expression must have been defined in one of the above three blocks. Event_assignments is a list of assignments, something like assignment1 assignment2 assignment3, with white space separated. There is no constraint on the number of assignments. Each assignment must be the format [variable]=[expression]</p>
-
<p>Each particle keeps track of its coordinates in the problem space which are associated with the best solution (fitness) it has achieved so far. (The fitness value is also stored.) This value is called pbest. Another "best" value that is tracked by the particle swarm optimizer is the best value, obtained so far by any particle in the neighbors of the particle. This location is called lbest. when a particle takes all the population as its topological neighbors, the best value is a global best and is called gbest. </p>
+
<p>A general rule in the algorithm is that the name of the variable must be started by A-Z –z and other characters can be A-Z-a-z-0-9.</p>
-
<p>The particle swarm optimization concept consists of, at each time step, changing the velocity of (accelerating) each particle toward its pbest and lbest locations (local version of PSO). Acceleration is weighted by a random term, with separate random numbers being generated for acceleration toward pbest and lbest locations.</p>
 
-
<p>PSO has been successfully applied in many research and application areas. It is demonstrated that PSO gets better results in a faster, cheaper way compared with other methods.  </p>
+
<h2>iGAME</h2>
-
<h3>Simulated Annealing(SA) Algorithm </h3>
+
<p>According to the rule-based languages, such as BNGL[1,2] and Kappa Language[3], molecules are
 +
represented with structured objects comprised of agents that can take any number of sites.
 +
Bonds could be formed by connecting sites on different agents to group them together and
 +
form complex. Typically, sites represent functional physical entities of various kinds of
 +
molecules, such as DNA-binding domains and promoter regions. Besides serving as terminal
 +
ends of bonds, sites can also take on any number of labels representing its internal states.
 +
Labels can be phosphorylation/unphorsphorylation status of proteins and open/close status
 +
of a complex of RNA polymerase and a promoter. Biological processes between agents can be
 +
easily and conveniently represented by the changes of their sites on the binding
 +
status and labels. An example of Lac dimer is given to briefly illustrate the syntax
 +
of this representation. Each Lac repressor, LacR, has one dimerization domain dim and
 +
can thus be written as LacR(dim). By connecting the two dim domains of two LacRs,
 +
the Lac dimer is then represented as LacR(dim!1).LacR(dim!1), where `.' separates different agents
 +
and the same number following `!' shared by two dim domains indicates a bond formation
 +
between them. The essences of those languages are preserved in MoDeL but with some major changes
 +
to make it better suited to describe synthetic biological systems.</p>
 +
<br/>
-
<p>Simulated annealing (SA) is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities). For certain problems, simulated annealing may be more efficient than exhaustive enumeration — provided that the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution. </p>
+
<p>Synthetic biological systems are mostly Genetic Regulatory Networks (GRNs), in which
 +
regulations in both transcriptional and translational level are key to control the system
 +
to exhibit some complex bahaviors, such as oscillation.
 +
DNA sequences composed of basic BioBrick parts (agents) are not consicely represented,
 +
resulting in unreadability of the reaction rules containing such DNA sequences. In their
 +
representations, each part has both upstream and downstream domain and adjacent parts are
 +
connected by forming a bond between the upstream domain of one part and the downstream
 +
domain of the other. In addition, all the agents in Kappa Language are allowed to appear in
 +
any order so that it may blur the recognition of spatial relationship of agents; it is
 +
hard to pick up the agents in a DNA chain with the correct order. For example,
 +
a complex of RNA polymerase binding to trp promoter (MIT registry ID K191007) with LOVTAP
 +
on p2 can be rewritten as `\textbf{\small{DNA(up!2,bind,type$\sim$K191007p3), LOVTAP(dna!1),
 +
DNA(bind!1,type$\sim$K191007p2,down!2), RNAP(dna,rna)}}'. However, most readers, even specialists,
 +
have some difficulties to recognize and map it to the molecule it represents in a short
 +
time, making the process of model contruction laborious and error-prone.</p>
 +
<br/>
-
<p>In the simulated annealing (SA) method, each point s of the search space is analogous to a state of some physical system, and the function E(s) to be minimized is analogous to the internal energy of the system in that state. The goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.</p>
+
<p>Aiming to simplify the representation of DNA and RNA sequences, we use the hyphen character `-'
 +
to connect two adjacent BioBricks rather than through bonding between one pair of
 +
upstream and downstream sites. This is good for visualization since the structure of
 +
molecules pictured in this way can be understood eaisly by readers. As to the syntax
 +
of single BioBrick representation, there are some minor differences with Kappa language:
 +
(1) the name of each BioBrick should be the same with the identifier of that BioBrick
 +
in the MIT registry; (2) the reversed sequence of each DNA BioBrick is assumed when an
 +
asterik character `*' appears following its name; But the sites of each BioBrick
 +
are defined in the same way as Kappa Language does and those names should be different
 +
with each other indicating equivalent sites are not supported. For example, the reverse part
 +
of Lac promoter (r0010) can be represented as r0010*(), where sites in the
 +
parenthesis are not shown. It is worthy of nothing that the parenthesis cannot be
 +
omitted even for BioBricks that do not take any sites. By convention,
 +
a DNA sequence written from left to right is assumed to read from 5' end to 3' end.</p>
 +
<br/>
-
<p> At each step, the SA heuristic considers some neighbouring state s' of the current state s, and probabilistically decides between moving the system to state s' or staying in state s. These probabilities ultimately lead the system to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.</p>
+
<p>As shown in the example of Lac dimer, a molecule may contain several agents as members
 +
separated by `.', by which means an agent is the basic unit of its composition. In MoDeL,
 +
the basic unit, however, has been expanded horizontally to be a sequence and
 +
different sequences, also separated by `.' in the representation, constitute molecules
 +
by forming bonds between them. The concept of sequence is not limited to DNA or RNA chains,
 +
but can be generalized to proteins and non-BioBricks in which cases
 +
each sequence may have only one agent. One example is Isopropyl beta-D-1-thiogalactopyranoside
 +
(IPTG). It has a binding domain \textit{laci} for Lac repressor and thus can be
 +
represented as i0001(laci), where i0001 is not a valid identifier
 +
in MIT registry since IPTG is not a BioBrick part but a conceptual part in our extension,
 +
which can be represented using the same syntax of BioBrick definition. In a short summary,
 +
sequences are chains composed of one or more parts, which are not limited to BioBrick parts or even
 +
not corresponded to any actual entities.</p>
-
<p>The neighbours of a state are new states of the problem that are produced after altering the given state in some particular way. For example, in the traveling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations that are produced for example by interchanging a pair of adjacent cities. The action taken to alter the solution in order to find neighbouring solutions is called "move" and different "moves" give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help an algorithm to optimize the solution to the maximum extent and also to retain the already optimum parts of the solution and affect only the suboptimum parts. In the previous example, the parts of the solution are the parts of the tour. </p>
+
<br/>
-
<p>The probability of making the transition from the current state s to a candidate new state s' is specified by an acceptance probability function P(e,e',T), that depends on the energies e = E(s) and e' = E(s') of the two states, and on a global time-varying parameter T called the temperature. States with a smaller energy are better than those with a greater energy. The probability function P must be positive even when e' is greater than e. This feature prevents the method from becoming stuck at a local minimum that is worse than the global one. </p>
+
<p>There is a problem brought by the use of registry identifiers as the name of BioBricks:
 +
in principle, protein coding sequences should also have both RNA and protein
 +
representations and other type of BioBricks, such as promoters, ribosome binding sites and
 +
terminators, should have RNA representations. Each BioBrick, when transcribed or translated
 +
possibly, cannot be distinguished from its DNA, RNA and protein representations simply by
 +
its name and a qualifier is needed to tag a sequence explicitly to tell whether it is a BioBrick
 +
and if it is, which type of representation it belongs to. This can be easily done by adding
 +
a type qualifier preceding a colon character `:' to the head of a sequence representation
 +
to distinguish the four cases. In the current version of MoDeL, the four permitted types
 +
of sequences are DNA(d), RNA(r), Protein(p) and Non-BioBricks(nb), where the character in parenthesis
 +
following each type is the type qualifier used to tag sequences. For example, the tet repressor (tetR)
 +
can be represented as p:c0040(dna), which is totally different with d:c0040(dna), the protein
 +
coding sequence for tetR protein.</p>
 +
<br/>

Latest revision as of 07:59, 5 October 2011


Team:USTC-Software - 2011.igem.org/Technology & Algorithm

Particle swarm optimization (PSO) Algorithm

History

Particle swarm optimization (PSO) is a population based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by social behavior of bird flocking or fish schooling. The particle swarm concept originated as a simulation of simplified social system. The original intent was to graphically simulate the choreography of bird of a bird block or fish school. However, it was found that particle swarm model can be used as an optimizer.

Algorithm introduction

Each particle keeps track of its coordinates in the problem space which are associated with the best solution (fitness) it has achieved so far. (The fitness value is also stored.) This value is called pbest. Another "best" value that is tracked by the particle swarm optimizer is the best value, obtained so far by any particle in the neighbors of the particle. This location is called lbest. when a particle takes all the population as its topological neighbors, the best value is a global best and is called gbest.

The particle swarm optimization concept consists of, at each time step, changing the velocity of (accelerating) each particle toward its pbest and lbest locations (local version of PSO). Acceleration is weighted by a random term, with separate random numbers being generated for acceleration toward pbest and lbest locations.

Application

PSO has been successfully applied in many research and application areas. It is demonstrated that PSO gets better results in a faster, cheaper way compared with other methods.

Simulated Annealing(SA) Algorithm

Simulated annealing (SA) is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities). For certain problems, simulated annealing may be more efficient than exhaustive enumeration — provided that the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.

Principle of the algorithm

In the simulated annealing (SA) method, each point s of the search space is analogous to a state of some physical system, and the function E(s) to be minimized is analogous to the internal energy of the system in that state. The goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.

Steps

At each step, the SA heuristic considers some neighbouring state s' of the current state s, and probabilistically decides between moving the system to state s' or staying in state s. These probabilities ultimately lead the system to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.

State transition

The neighbours of a state are new states of the problem that are produced after altering the given state in some particular way. For example, in the traveling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations that are produced for example by interchanging a pair of adjacent cities. The action taken to alter the solution in order to find neighbouring solutions is called "move" and different "moves" give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help an algorithm to optimize the solution to the maximum extent and also to retain the already optimum parts of the solution and affect only the suboptimum parts. In the previous example, the parts of the solution are the parts of the tour.

Probability function of the state transition

The probability of making the transition from the current state s to a candidate new state s' is specified by an acceptance probability function P(e,e',T), that depends on the energies e = E(s) and e' = E(s') of the two states, and on a global time-varying parameter T called the temperature. States with a smaller energy are better than those with a greater energy. The probability function P must be positive even when e' is greater than e. This feature prevents the method from becoming stuck at a local minimum that is worse than the global one.

MoDeL

The Perl language is a powerful tool for dealing with regular expressions, and it manages to process complex problems in a timely way. For example, for a hash array with a few elements in the buckets almost get the same manipulating time with a big hash with millions of elements. This feature improves the speed of rule based modeling remarkably.

Dealing with regular expressions is also Perl's cup of tea. So the software can spend more running time saved by perl on providing a better user's interface, making it more convenient for users.

Our approach first realized by Liaochen and 2010igemers emphasize on the structure of the species.

Here is a detailed explanation to the input file of the rule based modeling approach There are four blocks, respectively lists the definition of parameters, the definition of compartments, seed species and events.

The definition of parameters consists of two items. The left side term is the name of the variable, to the right is the expression of that parameter. Note that variables in that expression must be defined by the user. But it makes no difference whether they are defined before the expression. This is to avoid redundant definition and no definition.


The compartments definition is in this way:
[name ][outside][ruletable]{volume}{population}, where terms in the square brackets are mandatory, while terms in the curly braces are optional.
The default value of the volume is 1. The outside term means the compartment outside the compartment, which is usually the medium that held the compartment like the ecoli.
Ruletable is used to associate a rule in the data base with the compartment.


The third block, reads [compartment][name][structure][init_concentration]{const}, structure is the definition of the complex structure, const is optional since it's in the curly brace. If the substance has a const property, then after the substance, write a const. if not, leave it blank.

The last block is the events definition. The formats are [name][trigger_condition][event_assignments...] The name of the event is usr defined, trigger_condition is a bool expression, all the variables inside this bool expression must have been defined in one of the above three blocks. Event_assignments is a list of assignments, something like assignment1 assignment2 assignment3, with white space separated. There is no constraint on the number of assignments. Each assignment must be the format [variable]=[expression]

A general rule in the algorithm is that the name of the variable must be started by A-Z –z and other characters can be A-Z-a-z-0-9.

iGAME

According to the rule-based languages, such as BNGL[1,2] and Kappa Language[3], molecules are represented with structured objects comprised of agents that can take any number of sites. Bonds could be formed by connecting sites on different agents to group them together and form complex. Typically, sites represent functional physical entities of various kinds of molecules, such as DNA-binding domains and promoter regions. Besides serving as terminal ends of bonds, sites can also take on any number of labels representing its internal states. Labels can be phosphorylation/unphorsphorylation status of proteins and open/close status of a complex of RNA polymerase and a promoter. Biological processes between agents can be easily and conveniently represented by the changes of their sites on the binding status and labels. An example of Lac dimer is given to briefly illustrate the syntax of this representation. Each Lac repressor, LacR, has one dimerization domain dim and can thus be written as LacR(dim). By connecting the two dim domains of two LacRs, the Lac dimer is then represented as LacR(dim!1).LacR(dim!1), where `.' separates different agents and the same number following `!' shared by two dim domains indicates a bond formation between them. The essences of those languages are preserved in MoDeL but with some major changes to make it better suited to describe synthetic biological systems.


Synthetic biological systems are mostly Genetic Regulatory Networks (GRNs), in which regulations in both transcriptional and translational level are key to control the system to exhibit some complex bahaviors, such as oscillation. DNA sequences composed of basic BioBrick parts (agents) are not consicely represented, resulting in unreadability of the reaction rules containing such DNA sequences. In their representations, each part has both upstream and downstream domain and adjacent parts are connected by forming a bond between the upstream domain of one part and the downstream domain of the other. In addition, all the agents in Kappa Language are allowed to appear in any order so that it may blur the recognition of spatial relationship of agents; it is hard to pick up the agents in a DNA chain with the correct order. For example, a complex of RNA polymerase binding to trp promoter (MIT registry ID K191007) with LOVTAP on p2 can be rewritten as `\textbf{\small{DNA(up!2,bind,type$\sim$K191007p3), LOVTAP(dna!1), DNA(bind!1,type$\sim$K191007p2,down!2), RNAP(dna,rna)}}'. However, most readers, even specialists, have some difficulties to recognize and map it to the molecule it represents in a short time, making the process of model contruction laborious and error-prone.


Aiming to simplify the representation of DNA and RNA sequences, we use the hyphen character `-' to connect two adjacent BioBricks rather than through bonding between one pair of upstream and downstream sites. This is good for visualization since the structure of molecules pictured in this way can be understood eaisly by readers. As to the syntax of single BioBrick representation, there are some minor differences with Kappa language: (1) the name of each BioBrick should be the same with the identifier of that BioBrick in the MIT registry; (2) the reversed sequence of each DNA BioBrick is assumed when an asterik character `*' appears following its name; But the sites of each BioBrick are defined in the same way as Kappa Language does and those names should be different with each other indicating equivalent sites are not supported. For example, the reverse part of Lac promoter (r0010) can be represented as r0010*(), where sites in the parenthesis are not shown. It is worthy of nothing that the parenthesis cannot be omitted even for BioBricks that do not take any sites. By convention, a DNA sequence written from left to right is assumed to read from 5' end to 3' end.


As shown in the example of Lac dimer, a molecule may contain several agents as members separated by `.', by which means an agent is the basic unit of its composition. In MoDeL, the basic unit, however, has been expanded horizontally to be a sequence and different sequences, also separated by `.' in the representation, constitute molecules by forming bonds between them. The concept of sequence is not limited to DNA or RNA chains, but can be generalized to proteins and non-BioBricks in which cases each sequence may have only one agent. One example is Isopropyl beta-D-1-thiogalactopyranoside (IPTG). It has a binding domain \textit{laci} for Lac repressor and thus can be represented as i0001(laci), where i0001 is not a valid identifier in MIT registry since IPTG is not a BioBrick part but a conceptual part in our extension, which can be represented using the same syntax of BioBrick definition. In a short summary, sequences are chains composed of one or more parts, which are not limited to BioBrick parts or even not corresponded to any actual entities.


There is a problem brought by the use of registry identifiers as the name of BioBricks: in principle, protein coding sequences should also have both RNA and protein representations and other type of BioBricks, such as promoters, ribosome binding sites and terminators, should have RNA representations. Each BioBrick, when transcribed or translated possibly, cannot be distinguished from its DNA, RNA and protein representations simply by its name and a qualifier is needed to tag a sequence explicitly to tell whether it is a BioBrick and if it is, which type of representation it belongs to. This can be easily done by adding a type qualifier preceding a colon character `:' to the head of a sequence representation to distinguish the four cases. In the current version of MoDeL, the four permitted types of sequences are DNA(d), RNA(r), Protein(p) and Non-BioBricks(nb), where the character in parenthesis following each type is the type qualifier used to tag sequences. For example, the tet repressor (tetR) can be represented as p:c0040(dna), which is totally different with d:c0040(dna), the protein coding sequence for tetR protein.