Team:Johns Hopkins/Modeling/OptAlgs

From 2011.igem.org

VitaYeast - Johns Hopkins University, iGEM 2011

CMA-ES
The covariance matrix adaptation evolution strategy algorithm is stochastic in its selection of points to sample and does not require the approximation of derivatives[1]. The algorithm begins with an initial mean and covariance matrix. At each generation (iteration) the mean and covariance matrix are used to generate samples from a multivariate normal distribution with the same dimensionality as the feasible region.
CMA-ES involves an evolution strategy for both a mean and a covariance matrix
The objective is evaluated at these sampled points. The points are ordered from smallest objective value (winners) to greatest value (losers). A weighted average of the list is taken in order to pick a new mean with more weight given to the winners. Of the μ points picked, the best λ are reserved to be carried over into the next round.

This procedure is a form of maximum likelihood estimation. The weighting scheme attempts to select a new mean that will maximize the likelihood of best winners from generation N being picked in generation N+1. At the same time, the covariance matrix is also updated. This update is performed such that the best generations are more likely. Thus the distribution moves anisotropically: both the center of the distribution and the shape of the distribution change as it searches for a minimum.

DIRECT

User Guide can be found here.

Abbreviation of DIviding RECTangles, which is how the algorithm searches for the optimum value by globally converging on the minimal value[2]. The algorithm belongs to the class of branch-and-bound optimizers. It is deterministic and performs relatively few function evaluations, but performs very poorly on "hilly" objective functions. However, since it does not require many evaluations of the objective, it is especially useful for simulations and "black box" functions.

Previous algorithms evaluate endpoints of regions in a function,
Direct optimization.jpg
but this is difficult in higher dimensions. To address this, DIRECT samples the midpoints of each area. Also, it looks at the entire sample region to determine if it should be broken into sub-regions during the iteration.

To begin, DIRECT transforms the domain into a hyper-cube, divides it into regions depending on how optimal is, and continues to divide it into optimal hyper-rectangles. The algorithm then samples the center of each one. Once the algorithm finds a potentially optimal hyper-rectangle, it will divide it into smaller ones so that the area shrinks with each iteration. DIRECT stops when it is within 0.01% of the local minimum.


Implicit Filtering

"Implicit filtering is a hybrid of a projected quasi-Newton or Gauss-Newton algorithm for bound constrained optimization and nonlinear least squares problems and a deterministic grid-based search algorithm. The gradients for the quasi-Newton method and the Jacobians for the Gauss-Newton iteration are approximated with finite differences, and the difference increment varies as the optimization progresses. The points on the difference stencil are also used to guide a direct search."[3]

Implicit filtering works by minimizing an object function, f, in which f is constrained in a nominal design space, Ω, that is a hyperrectangle:

\[\Omega = \left ( x\in R^{N}\mid L_{i}\leq x_{i}\leq U_{i} \right )\]

NOTE: "Implicit filtering, and the other methods that are derived from coordinate search, are best used in cases where f is either not smooth, not everywhere defined, discontinuous, or when derivatives of f are too costly to obtain. The motivating examples for the construction of implicit filtering were problems in which f was a smooth function corrupted by low-amplitude, high-frequency noise, or which was not defined (i.e. the code for computing f failed) at many points in the nominal design space Ω."[4]

Implicit filtering is a sampling method, in which f is only evaluated at a cluster of points in Ω. This evaluation determines the next cluster of points. This is done using a stencil. The stencil used in implicit filtering evaluates the function at a current iterate called \[f(x_{c})\] then samples the next 2N points on \[x_{c}\pm h*v_{i};1\leq i\leq N\] where \[v_{i} = e_{i}(L_{i}-U_{i})\]

e is the unit vector in the ith coordinate direction, and h, the scale varies as the optimization progresses.

Implicit filtering uses the values of f on the stencil to create a difference gradient which is then used in a quasi-Newton method. Results at each quasi-Newton iteration are reported. When the supply of scales, h, has been exhausted, the optimization will terminate.

Differential Evolution

It is a stochastic, population-based optimization evolutionary algorithm. It was developed to optimize real parameter, real valued functions. It is a method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It does not guarantee an optimal solution is ever found. DE algorithm works by having a population of candidate solutions (called agents). These agents are moved around in the search-space by using simple mathematical formulae to combine the positions of existing agents from the population. If the new position of an agent is an improvement it is accepted and forms part of the population, otherwise the new position is simply discarded. The process is repeated and by doing so it is hoped, but not guaranteed, that a satisfactory solution will eventually be discovered.

General evolutionary algorithm procedure: Initialization -> Mutation -> Recombination -> Selection

Initialization:

Parameter limits should be defined, if not, parameter ranges should cover the suspected optimum Define upper and lower bounds for each parameter and randomly select the initial parameter values uniformly on the intervals.

Mutation:

Small random alterations to one or more parameters of an existing population. Each of the N parameter vectors undergoes mutation, recombination and selection. Mutation expands the search space. For a given parameter vector randomly select three vectors such that the indices i, r1, r2 and r3 are distinct. Add the weighted difference of two of the vectors to the third.

Recombination/Crossover:

Uniform crossover: Inherits parameter values from parents with equal probability

Non-uniform crossover: Takes parameters from one parent more often than the other Recombination incorporates successful solutions from the previous generation.

Selection:

Determine which among them will survive to the next generation Random approach using “tournament selection” = randomly paired the winner with all possible competition. DE: each child pits against one of its parents

Mutation, recombination and selection continue until some stopping criterion is reached.

Quasi Monte-Carlo Sampling

Quasi-MC sampling is a powerful tool for stochastic optimization and analysis of initial conditions. It generates a sequence of random vectors while guaranteeing a predictable spacing between them. This enables us to draw relatively few random vectors without neglecting wide swaths of the sample space. The function below is called haltonseq and takes as input the length N of the vector to generate and the number M of vectors to sample and returns a MxN matrix with points generated using the Halton sequence[5]

Haltonseq.zip


[1] Igel, C., Suttorp, T., & Hansen, N. (2006). A computational efficient covariance matrix update and a (1+1)-CMA for evolution strategies. Proceedings of the 8th annual conference on Genetic and evolutionary computation - GECCO ’06 (p. 453). New York, New York, USA: ACM Press.

[2] Finkel, Daniel E. DIRECT Optimization Algorithm User Guide. Raleigh: Center for Research in Scientific Computation, 2 Mar. 2003. PDF.

[3],[4]C T Kelley, “Users’ Guide for imfil Version 1.0.”

[5]J. H. Halton. 1964. Algorithm 247: Radical-inverse quasi-random point sequence. Commun. ACM 7, 12 (December 1964), 701-702.