Team:MIT/Tools/

From 2011.igem.org

(Difference between revisions)
Line 28: Line 28:
<body>
<body>
 +
<style>
 +
#catlinks{visibility: none}
 +
</style>
<!-- NAV -->
<!-- NAV -->
<div id="col_nav">
<div id="col_nav">

Revision as of 03:13, 29 September 2011

Navigation

  • mcell Modeling Platform
  • G-Level Assembly
  • Geneious

mcell - A Multicellular Modeling Framework

mcell is a small set of Python classes that allows the enterprising modeler to:

  • Easily create very flexible models of of multicellular dynamics
  • Manage the models already created through a simple command-line interface
  • Easily change defined parameters in models
  • Render the runs of the models in a convenient, simple way
  • Analyze the runs of the models through histograms

mcell is a thin layer on top of the combination of BionetSolver, which model internal cell states as systems of ODEs, and CompuCell3D, which models cell shape dynamics using the Glazier-Graner-Hogeweg methos. It was created largely in reaction to inconveniences experienced when using BionetSolver and CompuCell3D as detailed here. It is in continuous development; we would like to continue evolving it outside the bounds of this competition. (Information about further development of mcell will be posted on http://igem.mit.edu/mcell. We encourage you to read through the code and look at our modeling data as examples to understand the framework.

Quick Links

Why?

Over the course of the summer, we tried several ways of modeling multicellular dynamics. The first thing we tried was everyone's trusted favorite, MATLAB. We generated a system of several thousand ODEs, in which groups of 5 ODEs represented the state of a single cell, and were coupled to ODEs describing adjacent cells. Unfortunately, this was very, very slow, and did not allow for the modeling of cell shape dynamics.

Then, we discovered a way to couple CompuCell3D with ODE models of internal cell states. We realized that, if we wanted to do any sort of reasonably flexible modeling, we needed to do something like this. So, we downloaded CompuCell3D, read the documentation, and started modeling various circuits.

Now, CompuCell3D coupled with BionetSolver is a great system. However, we quickly discovered that this wasn't quite enough for our purposes. For instance, CompuCell3D allows one to dynamically change the "cell type", and this the cell color. However, as we were trying to engineer cells, we were interested in the dynamics of the protein expression levels -- in fundamentally continuous quantities -- rather than a discrete notion of "cell type"! CompuCell 3D would not let us visualize the values that we needed -- so we wrote a Python script to render the models the way we wanted to.

Other problems emerged. We wanted to be able to analyze the data in other ways -- in particular, we wanted to see histograms of the levels of expression of some protein. In addition, we realized that trying to manage all our models was becoming quite inconvenient without a unified organization scheme.

So, we had several hastily-written, badly organized Python scripts and many, many folders scattered around. We then decided that this problem needs fixing, and created a small set of classes to unify the object model do the file management, rendering, and analysis, for us -- using a convenient and extensible command-line interface. We found this functionality to be tremndously useful for our purposes, and thus, we are sharing mcell with the world.

Installation

Dependencies: Python 2.5.6, CompuCell3D 3.5.0, BionetSolver 1.0.0 for Python 2.5, matplotlib, PIL, and hcluster.

Also very useful: Synthetic Biology Workbench (for defining circuits)

Download mcell from here. Unzip the files somewhere into your PATH. Open mcell.py, and edit the variables between #START CONFIG and #END CONFIG. (They have self-explanatory names, and their functions are explained in the file.) In hook.py.template, edit the commented line to reflect the location of mcell. Place the Template folder into you modeling directory for an empty template model. The other folders are also interesting models that we have tried out. (The params files in the model folders describe the parameters the model requires to run.)

How it works

In mcell models, BionetSolver models the internal cell states, and CompuCell3D models the cell dynamics.

BionetSolver

BionetSolver reads one or several SBML files, each of which define a circuit as a system of chemical reactions in several containers with given rate laws. SBML models are simply XML files, and can be written with a text editor; however, it is much easier to define them using a graphical designer, like JDesigner, or using a simple scripting language, like [Jarnac]. (Both of those editors can be obtained by installing the Synthetic Biology Workbench.

After loading the circuits, BionetSolver is in posession of a system of ODEs that define the internal state of each modeled cell. To simulate each cell, BionetSolver simply Euler-steps its ODE system forward with a fixed time step. Some of the variables in the ODE model (like, say, the concentration of a protein on the neighboring cells) are actually parameters that are continually updated from the CompuCell3D thread.

More documentation on BionetSolver can be found by downloading it from the link above.

CompuCell3D

CompuCell3D uses the Glazier-Graner-Hogeweg model to simulate cell shape dynamics. The playing field is a matrix. An entry of the matrix corresponds to an ID of the cell occupying that spot on the playing field. Thus, a cell is represented as a set of points on the matrix. The medium the cell are in is represented by a single large cell occupying all the "empty space".

To iterate the model forward, CompuCell3D chooses a random point on the matrix -- the invading point -- and a random adjacent point -- the receiving point. If the two points are from different cells, then the receiving point's cell id becomes set to the invading point's cell id with some probability e^-(T*k(state)), where T is set in the configuration file and k(state) is a function of the state of the system that is determined by the plugins defined in the configuration file. (For example, the Contact plugin adds a term to k that makes cells of one cell type want to stick or not stick to cells of another cell type. The Surface plugin adds a term that grows larger as the surface area of the cell deviates from a defined amount.) This procedure is called a Monte Carlo step (MCS). Certain parameters of the plugins -- say, the type, or the adhesiveness of a cell -- can be continuously updated to reflect the internal state of the cell.

Coupling BionetSolver and CompuCell3D

CompuCell3D provides a mechanism to script its function called a Steppable. Thus, to couple the internal states of the cells and the cell shape dynamics, a steppable called ModelSteppable is used. ModelSteppable initializes BionetSolver with circuits before CompuCell starts running its model, but after it has created the initial state of the cells. Then, CompuCell3D takes several MCSs to update the cell shape state. ModelSteppable then loops over the cells, updating the internal states of the cells in the BionetSolver circuits to match the external states, and the external state parameters in CompuCell3D to match the internal states of the cells. The full internal state every cell is printed to one file, and the state of the cell shapes is printed to another file. This cycle then repeats many, many times.

Management of Models

Directory Structure

mcell uses a directory structure to manage its models. All models are contained in a modeling directory, the location of which is configurable in mcell.py. Every folder in the modeling directory is considered by mcell to be a model. A model folder contains 3 necessary files: a CompuCell3D configuration file (cc3d.xml), a hook file(hook.py), and a file containing ModelSteppable (steppable.py). A model folder also contains files that start with "circuit_", which are all assumed to be SBML models of circuits. These files define a model.

All folders inside the model folder are considered to be runs of the model; the name of the folder is the name of the run. Inside the run folder, the LatticeData folder contains vtk files that describe the state of the cell shapes at every frame of the model; the CellData.txt file which describes the complete state of every cell for every frame, one frame per line; and the model folder, which contains the final model files with parameters substituted into them. (To run a model, mcell actually loads the model files, adds in the given parameters, writes them to the model folder, and runs CompuCell3D on cc3d.xml in the model folder.)

Parameters

Each of the model files can contain strings in the form "%(something)s"; when the model is run with model.run("NewRun", something=2), the parameter 2 will be substituted in place of that string for the duration of the run.

Easy definition of a model

here are several utility functions in mcell_model_utils.py that allow easier definition of the coupling between internal and external cell states. Note: mcell_model_utils.py should be imported into every model's steppable inside the ModelSteppable class definition for them to work. All functions expect a setup for ModelSteppable as in the Template model.

self.setToWeightedValuesOf("a", "b") sets protein "a" in all cells to the value of "b" in the neighboring cells, weighted by contact area. This is used to easily set up a model of surface proteins interacting on adjacent cells.

self.updateAdhesionMolecule("a", "b") sets the amount of adhesion molecule "a" in CompuCell to the amount of protein "b" for every cell. This is used in conjunction with the AdhesionFlex plugin; it expects self.adhesionFlexPlugin = CompuCell.getAdhesionFlexPlugin() in the initialization of a steppable.

Command Line Management

To run mcell, run python and type from mcell import *. This initializes a models dictionary containing all the models in your modeling directory. Each model is a fully functional python object! If we have a model called Test, then:

  • models['Test'].runs is a dictionary containing all the runs of the object (which are also Python objects)
  • models['Test'].cc3d_config contains a python object that represents the cc3d.xml for the model. running models['Test'].cc3d_config.edit() would open your favorite text editor to edit the config, and running models['Test'].cc3d_config.copy(dest) will copy the CC3D configuration to dest. The same goes for the hook file(.hook) and the ModelSteppable file(.steppable).
  • models['Test'].circuits is an array containing python objects like described above that reference the circuits in the model.
  • models['Test'].clone('NewModel') creates a new model identical to Test called NewModel. This is incredibly useful for creating models that are similar to other models, something done quite often. Cloning the Template model creates a fairly good setup to begin defining a model.
  • models['Test'].run('NewRun', params) runs Test with parameters params (a dictionary) and creates the NewRun run. Parameters for a model are defined as described in the Parameters section above.
  • models['Test'].destroy() irrevocably destroys the model.

Rendering and Analysis of Runs

If r is a Run object (taken from something like r = models['Test'].runs['NewRun']), then:

  • r.load() will load the run data into memory. This is time and memory intensive, as, unfortunatly, multicellular runs entail a lot of data.
  • r.render(species="Reporter", unload=False, pix_size=5) will render the model into movie.avi in the run's folder, coloring the cells proportional to the amount or "Reporter" protein that they express. The data for the model will not be unloaded after the run, and the pixel size of a singly CompuCell3D matrix entry is 5. All these arguments are optional and default to the values given here.
  • r.render_hist_movie(species="Reporter") will render a movie of the histograms of the values of the "Reporter" protein in all the cells as the simulation evolves. The argument is optional and "Reporter" is the default. The name of the movie will be histograms.avi.
  • r.frames is an array that, after loading the model, contains Frame objects that contain the raw data for every frame of the simulation. Specifically, r.frames[0].cell_states[2]['Reporter'] contains the value of the "Reporter" protein in cell with id 2 during the first frame. r.frames[0].cell_types, r.frames[0].cell_ids, and r.frames[0].cell_clusters are 2-dimensional arrays containing the cell types, the cell ids, and the cell clusters on the CompuCell3D "playing field" for that frame.

G-Level Assembly

This summer, we relied on three levels of G-Level Assembly: Gibson, Golden Gate, and Gateway. These three techniques allowed us to assemble together fusion proteins of many parts of DNA in one-pot reactions.

Gateway

The Gateway cloning system is available from Invitrogen. It is a fast and reliable way to create expression vectors for mammalian cells. It fulfills the same function as restriction cloning in the Biobrick standardization, allowing us to combine vectors with different parts to create a whole circuit. But the actual mechanism is vastly different from restriction cloning. Gateway uses recombination enzymes to combine multiple vectors, a one-step process that avoids the laborious digestion and ligation steps involved in restriction cloning.


An example LR reaction.


An example BP reaction.


Cloning Process Overview
We start out by cloning all the genes and promoters needed into pENTR vectors; the pENTR vectors contain restriction and recombination sites, so either cloning method can be used to insert the target DNA into the vectors. The next step is to combine pENTR vectors containing the relevant gene and promoter with a pDEST vector containing both prokaryotic and eukaryotic origins of replication. Gateway cloning allows us to avoid laborious digestion and ligation steps in favor of a faster, more efficient method. To obtain the expression vector, we combine all three plasmids in a recombination reaction; the step takes 12-16 hours total and yields reliable products that can be further verified through restriction digests and sequencing.

Golden Gate

Golden Gate assembly is a one-pot method using restriction endonucleases that have a different cutting site than recognition site to assemble many parts of DNA together. It was developed in 2008 by Engler et al and is available here. Using a directional restriction enzyme, which does not cut at the recognition site such as BsaI allows us to cut out parts that have specific overhangs with each other after cutting. Ligase is also in the reaction, which will ligate parts together; if it ligates a cutting sequence back together, these will simply be cut again. By the end, all the recognition sites will have be excised, leaving only the complete part. This summer, we used Golden Gate assembly for Part:BBa_K511400 and Part:BBa_K511500.

Image taken from http://j5.jbei.org/j5manual/pages/23.html
Process Overview
A Golden Gate reaction can be set up via the following protocol. Take special care to use equimolar concentrations for each DNA fragment and high concentration ligase.
  • 50 fmol of each DNA fragment
  • 1 uL of Promega High Concentration T4 Ligase
  • 1 uL of NEB BsaI
  • 1 uL of Promega T4 Ligase Buffer
  • Water to 15 uL
Cycle at:
  • (5 minutes at 37 C, 5 minutes at 18 C) x 50
  • 10 minutes at 50 C
  • 10 minutes at 80 C

Gibson

Gibson assembly is a high-efficiency scarless method of assembling multiple large pieces of DNA at once without the need for restriction enzymes. It was developed in 2009 by Gibson et al and is available here. It uses an exonuclease to chew back the 5' overhang until the DNA parts hydrogen bond, a polymerase and ligase to replace and ligate the missing nucleotides. Enzyme activity is controlled through temperature cycling. This summer, we successfully used Gibson assembly to create some of our parts, such as Part:BBa_K511401.

Process Overview
Primers are first designed to PCR on ~20bp overhangs onto each of the DNA parts to be assembled together. After this product is cleaned up, equimolar concentrations of DNA along with a Master Mix of enzymes are run for an hour at 50C, resulting in a product that can be immediately transformed into cells for propagation.

Geneious - A Bioinformatics Suite for Synthetic Biology

This year, we assembled dozens of unique expression vectors from dozens of entry and destination vectors, some of which we assembled via Gibson or Golden Gate reactions. In order to keep track of this large volume of plasmids, primers, and sequencing results, we heavily relied on the Geneious bioinformatics suite. Geneious offers tools from primer design to virtual gel electrophoresis, excellent file sharing for collaborations, automatic in silico assembly of multi site LR reactions, and more. Geneious was a powerful tool for everything from analyzing our sequencing results to assembling fusion plasmids, and we are excited to have Geneious' host company, Biomatters Ltd., as a sponsor. Here are some screenshots of Geneious in action.