Genetic algorithms explained

Genetic algorithms are a class of optimization algorithms (algorithms that maximize some function by adjusting its input parameters) that are known for their ability to handle large, unpredictable search spaces.

In Cumulus we employ them to bridge the gap between having a forward model (a model that can simulate a biological process given its parameters) and knowing which parameters make this simulation fit the experimental data best. We do this because creating a backward model, one that tells us which parameters fit a set of measurements, is complicated even for simple systems, let alone for larger devices. Genetic algorithms let us find the parameters without the need for a backward model.

How genetic algorithms work

A genetic algorithm mimics the process of population genetics in order to optimize some fitness criterion. In our case this criterion is based on how well the simulated data match the experimental results.
The following pseudocode shows the basics behind a genetic algorithm.

 selectedpopulation  = initialize()
 while stop criterion has not been met
   newpopulation       = mutate(selectedpopulation)
   newpopulationscores = evaluate(newpopulation)
   selectedpopulation  = select(newpopulation, newpopulationscores)

As the code above shows, a genetic algorithm has five basic steps, three of which (mutation, evaluation and selection) are repeated until the stop criterion is met. The initialization step and the three repeating steps are explained below.
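The pseudocode can be turned into a small runnable Python sketch. The loop follows the pseudocode; the fixed number of generations used as the stop criterion, the one-parameter toy fitness function and the simple perturbation mutation are illustrative assumptions, not the Cumulus implementation.

```python
import random
from typing import Callable, List

Individual = List[float]
Population = List[Individual]


def run_ga(
    initialize: Callable[[], Population],
    mutate: Callable[[Population], Population],
    evaluate: Callable[[Population], List[float]],
    select: Callable[[Population, List[float]], Population],
    max_generations: int = 100,
) -> Population:
    """Generic loop mirroring the pseudocode.

    Assumption: the stop criterion is simply a fixed number of generations.
    """
    selected = initialize()
    for _ in range(max_generations):
        new_population = mutate(selected)
        scores = evaluate(new_population)
        selected = select(new_population, scores)
    return selected


# --- Hypothetical toy problem: maximize -(x - 3)^2, optimum at x = 3 ---
rng = random.Random(0)


def toy_initialize() -> Population:
    return [[0.0] for _ in range(10)]


def toy_mutate(population: Population) -> Population:
    # Keep the parents and add one randomly perturbed child per parent.
    children = [[x + rng.gauss(0.0, 0.5) for x in ind] for ind in population]
    return population + children


def toy_evaluate(population: Population) -> List[float]:
    return [-(ind[0] - 3.0) ** 2 for ind in population]


def toy_select(population: Population, scores: List[float]) -> Population:
    # Keep the better half, best individual first.
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    return [ind for _, ind in ranked[: len(population) // 2]]


best = run_ga(toy_initialize, toy_mutate, toy_evaluate, toy_select)
print(best[0])  # a value close to [3.0]
```

Because the parents survive into the next generation, the best score never gets worse from one generation to the next.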

Initialization

First, the algorithm needs some sort of starting point. This can be a set of parameters found in the literature, but also parameters found by fitting other parts. In our case it is the estimate of the parameters made by the user. Users who have no idea what these should be can draw inspiration from parameters of the same type for other parts.
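A minimal Python sketch of such an initialization, assuming the user's estimate is given as a dictionary of named parameters. The function name, the parameter names and the idea of filling the first generation with randomly scaled copies of the estimate are hypothetical choices for illustration; the text only states that the user's estimate is the starting point.

```python
import random
from typing import Dict, List

Params = Dict[str, float]


def initialize_from_estimate(
    estimate: Params,
    population_size: int = 20,
    relative_spread: float = 0.1,
    seed: int = 0,
) -> List[Params]:
    """Hypothetical initializer: keep the user's guess and add perturbed copies.

    Each copy multiplies every parameter by (1 + noise), with noise drawn from
    N(0, relative_spread), so the first generation is not all identical.
    """
    rng = random.Random(seed)
    population = [dict(estimate)]  # the user's own estimate, unchanged
    while len(population) < population_size:
        population.append(
            {name: value * (1.0 + rng.gauss(0.0, relative_spread))
             for name, value in estimate.items()}
        )
    return population


# Hypothetical parameter names, e.g. borrowed from a similar part.
start = initialize_from_estimate({"k_transcription": 0.5, "k_degradation": 0.02})
print(len(start), start[0])
```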

Mutation

In the mutation step we add new individuals to the population. This can be done in many different ways. Classically, crossover (combining the values of two individuals into a new individual) and point mutations (adjusting a single value by a small random amount) are very popular.
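The two classical operators can be sketched in Python as follows. The sketch uses uniform crossover (one common crossover variant, chosen here for illustration) and a Gaussian point mutation; the function names and the `scale` of the nudge are assumptions. Cumulus itself uses Gaussian estimation instead.

```python
import random
from typing import List

Individual = List[float]


def uniform_crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Build a child by taking each parameter from one of the two parents at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def point_mutation(ind: Individual, rng: random.Random, scale: float = 0.1) -> Individual:
    """Copy the individual and nudge one randomly chosen parameter."""
    child = list(ind)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child


rng = random.Random(1)
parent_a, parent_b = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
child = uniform_crossover(parent_a, parent_b, rng)
mutant = point_mutation(parent_a, rng)
print(child, mutant)
```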
In Cumulus we use a method called Gaussian estimation. This method assumes that the optimality surface is roughly shaped like an n-dimensional Gaussian. Using the individuals that remain after the selection step, we estimate the shape of this Gaussian by computing their mean and covariance matrix (weighted by fitness). We then restore our population by randomly drawing new individuals from this Gaussian distribution.
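A minimal pure-Python sketch of this Gaussian estimation step: it computes the fitness-weighted mean and covariance of the surviving individuals, factors the covariance with a Cholesky decomposition L, and draws each new individual as mean + L·z with z standard normal. The text does not specify how fitness is turned into non-negative weights, so the rank-based weights in the usage lines, the small diagonal `jitter` and all function names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def weighted_mean_and_cov(pop: Sequence[Vector], weights: Sequence[float]) -> Tuple[Vector, Matrix]:
    """Fitness-weighted mean and covariance of the population (weights >= 0)."""
    total = sum(weights)
    dim = len(pop[0])
    mean = [sum(w * x[j] for w, x in zip(weights, pop)) / total for j in range(dim)]
    cov = [
        [sum(w * (x[j] - mean[j]) * (x[k] - mean[k]) for w, x in zip(weights, pop)) / total
         for k in range(dim)]
        for j in range(dim)
    ]
    return mean, cov


def cholesky(a: Matrix, jitter: float = 1e-9) -> Matrix:
    """Lower-triangular L with L L^T ~= a; the jitter keeps the diagonal positive."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(a[i][i] + jitter - s, jitter))
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_estimation_mutate(
    survivors: Sequence[Vector], weights: Sequence[float], n_new: int, rng: random.Random
) -> List[Vector]:
    """Refill the population with n_new draws from the estimated Gaussian."""
    mean, cov = weighted_mean_and_cov(survivors, weights)
    low = cholesky(cov)
    dim = len(mean)
    children = []
    for _ in range(n_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        children.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)])
    return list(survivors) + children


survivors = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.2]]  # sorted best first
weights = [4.0, 3.0, 2.0, 1.0]  # assumption: rank-based weights, best gets most
population = gaussian_estimation_mutate(survivors, weights, 4, random.Random(0))
print(len(population))  # 8
```

Because the new individuals follow the covariance of the survivors, correlated parameters are varied together rather than one at a time.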

Evaluation

In the evaluation step, individuals that have not yet been evaluated are scored. In most of the literature this is considered part of the selection step. In our system, however, evaluation means running several models in even more experimental settings, comparing the results to the acquired measurements, and combining all these comparisons into a single fitness score. Most of our computation time is spent in this step.
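A minimal Python sketch of such an evaluation, under stated assumptions: each model is a hypothetical `simulate(params, condition)` callable paired with a list of `(condition, measurements)` experiments, and all comparisons are combined as a negative sum of squared errors, so higher is better. The text does not say how Cumulus combines its comparisons. A cache keyed on the parameter values makes sure only individuals that have not been scored before are run through the models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]
Simulate = Callable[[Params, float], List[float]]
Experiment = Tuple[float, List[float]]  # (experimental condition, measured values)
Model = Tuple[Simulate, Sequence[Experiment]]


def fitness(params: Params, models: Sequence[Model]) -> float:
    """Run every model in every experimental setting and fold all comparisons
    into one score: the negative sum of squared errors (higher is better)."""
    score = 0.0
    for simulate, experiments in models:
        for condition, measured in experiments:
            simulated = simulate(params, condition)
            score -= sum((s - m) ** 2 for s, m in zip(simulated, measured))
    return score


def evaluate(population: Sequence[Params], models: Sequence[Model], cache: dict) -> List[float]:
    """Score only the individuals that have not been evaluated before."""
    scores = []
    for params in population:
        key = tuple(sorted(params.items()))
        if key not in cache:
            cache[key] = fitness(params, models)  # the expensive part
        scores.append(cache[key])
    return scores


# Hypothetical toy forward model: output grows linearly with time and inducer level.
def toy_model(params: Params, inducer: float) -> List[float]:
    return [params["k"] * inducer * t for t in (1, 2, 3)]


models = [(toy_model, [(1.0, [2.0, 4.0, 6.0]), (2.0, [4.0, 8.0, 12.0])])]
cache: dict = {}
print(evaluate([{"k": 2.0}, {"k": 1.0}, {"k": 2.0}], models, cache))  # [0.0, -70.0, 0.0]
```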

Selection

In the selection step we discard the individuals of our population that we deem not good enough. For us this is as straightforward as throwing away the worst-performing half of the individuals. Because the Gaussian estimation mutation method generates sufficiently diverse individuals, we do not need to worry about diversity preservation.
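The selection rule described here (often called truncation selection) fits in a few lines of Python. This sketch assumes that higher scores are better and, for odd population sizes, rounds the number of survivors down; both are illustrative choices, and the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_best_half(population: Sequence[T], scores: Sequence[float]) -> List[T]:
    """Keep the better-scoring half of the population, best first."""
    order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    return [population[i] for i in order[: len(population) // 2]]


print(select_best_half(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.3]))  # ['b', 'c']
```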

Modularity

We programmed each of the steps (mutation, evaluation, selection) in our system as a separate object. Because of this, replacing any of these methods with a different algorithm is as easy as swapping one class for another that implements the same abstract class.
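The same idea can be sketched in Python with the standard `abc` module. The class and method names below are hypothetical, not the Cumulus API; the point is that `GeneticAlgorithm` only talks to the abstract interfaces, so a different mutation, evaluation or selection algorithm can be plugged in without touching the loop.

```python
import random
from abc import ABC, abstractmethod
from typing import List

Individual = List[float]
Population = List[Individual]


class Mutator(ABC):
    @abstractmethod
    def mutate(self, population: Population) -> Population: ...


class Evaluator(ABC):
    @abstractmethod
    def evaluate(self, population: Population) -> List[float]: ...


class Selector(ABC):
    @abstractmethod
    def select(self, population: Population, scores: List[float]) -> Population: ...


class GeneticAlgorithm:
    """Runs one generation using whichever step objects it is given."""

    def __init__(self, mutator: Mutator, evaluator: Evaluator, selector: Selector):
        self.mutator, self.evaluator, self.selector = mutator, evaluator, selector

    def step(self, population: Population) -> Population:
        new_population = self.mutator.mutate(population)
        scores = self.evaluator.evaluate(new_population)
        return self.selector.select(new_population, scores)


class PointMutator(Mutator):
    """Keeps the parents and adds one point-mutated child per parent."""

    def __init__(self, rng: random.Random, scale: float = 0.1):
        self.rng, self.scale = rng, scale

    def mutate(self, population: Population) -> Population:
        children = []
        for ind in population:
            child = list(ind)
            child[self.rng.randrange(len(child))] += self.rng.gauss(0.0, self.scale)
            children.append(child)
        return population + children


class DistanceEvaluator(Evaluator):
    """Toy fitness: negative squared distance to a hypothetical target."""

    def __init__(self, target: Individual):
        self.target = target

    def evaluate(self, population: Population) -> List[float]:
        return [-sum((x - t) ** 2 for x, t in zip(ind, self.target)) for ind in population]


class TruncationSelector(Selector):
    """Throws away the worst-performing half, best individual first."""

    def select(self, population: Population, scores: List[float]) -> Population:
        order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
        return [population[i] for i in order[: len(population) // 2]]


ga = GeneticAlgorithm(PointMutator(random.Random(0)), DistanceEvaluator([1.0, 1.0]), TruncationSelector())
population = [[0.0, 0.0] for _ in range(4)]
for _ in range(200):
    population = ga.step(population)
print(population[0])
```

Swapping in a Gaussian-estimation mutator, for example, would only require another `Mutator` subclass; `GeneticAlgorithm.step` stays unchanged.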

Team:Groningen/modeling genetic algorithms

From 2011.igem.org