# Genetic algorithms explained

Genetic algorithms are a class of optimization algorithms (algorithms that help to maximize some function by adjusting the input parameters) that are know for their ability to handle large unpredictable search spaces.

In Cumulus we employ them the bridge the gap between having a forward model (a model that can simulate a biological process given its parameters) and knowing what parameters makes this simulation fit the experimental data best. We do this because creating a backward model, one that tells us what parameters will fit a set of measurements, is complicated for even simple systems let alone for larger devices. Genetic algorithms will help us find the parameters without the need for a backward model.

## How genetic algorithms work

A genetic algorithm mimics the process of population genetics in order to optimize some fitness criterion. In our case this criterion is based on how good the simulated data matches the experimental results.
The following pseudocode shows the basics behind a genetic algorithm.

``` selectedpopulation  = initialize()
while stop criterion has not been met
newpopulation       = mutate(selectedpopulation)
newpopulationscores = evaluate(newpopulation)
selectedpopulation  = select(newpopulation, newpopulationscores)
```

As you can see in the code above there are four basic steps to a genetic algorithm, three of these are repeating. Each of these steps is is explained below.

### Initialization

First of the algorithm needs some sort of starting point. This can be some parameters found in literature but also parameters as they where found by fitting other parts. In our case the estimation of the parameters were made by the user. If he has no idea what these should be he can draw inspiration by looking at parameters of the same type for different parts.

### Mutation

In the mutation step we add new individuals to the population. This can be done in many different ways. Classically crossover (using values of two individuals and mixing them up into a new individual) and point mutations (adjusting a single value by a small random amount) are very popular.
In cumulus we use a method called gaussian estimation. This method assumes the optimality surface is roughly in the shape of a n-dimensional gaussian. In the selection step we then try to estimate the shape of this gaussian by taking the covariance matrix of all the individuals in the population(weighted by fitness). After which we replenish our population by randomly drawing new individuals from this gausian distribution.

### Evaluation

In the evaluation step yet unevaluated individuals are evaluated. In most literature this is seen as a part of the selection step. In our system however this step means running several models in even more experimental settings and comparing these to measurements, then combining all these comparisons into a single fitness score. It is safe to say that the brunt of our computational power is consumed in this step.

### Selection

In the selection step we discard some individuals of our population that we deem not to be good enough. For us this is as strait-forward as throwing away the worst preforming half individuals. Because of the gausian estimation mutation method we are generating enough different individuals for us not to worry about diversity preservation.

## Modularity

We programmed each of the steps (mutation, evaluation, selection) in our system as separate objects. Therefore replacing any of the methods by a different algorithm is as easy as swapping a class by another one that implemented the same abstract base class.