Team:Arizona State/Project/Software

From 2011.igem.org


CRISPRstudio


ASU Logo.png
 
CRISPRstudio information panel

We have developed a tool to assist in the development of synthetic CRISPR systems.

  • Pick spacers from a source sequence, based on homology with the target genome, hairpinning potential, restriction sites, and known PAMs.
  • Output arrays with various formats based on generated spacers.
  • Display and gather CRISPR information using a database cultivated from several sources.

CRISPRstudio is provided "as is" with no express or implied warranty for accuracy or accessibility.

Downloads

  • See Source for details on running CRISPRstudio from Python.
  • Dependencies: BLAST+ 2.2.25 or compatible.

Installation

Using the provided binaries is the fastest way to get started with CRISPRstudio. Visit our GoogleCode page and select the downloads tab. Download the appropriate build based on OS. Builds are provided using cx_Freeze. All builds are provided for x86 platforms. For x64 support, CRISPRstudio must be run from source. x64 environments have not shown significant speed increases for CRISPRstudio.

Extract the compressed file to some location. Download BLAST+ 2.2.25 or compatible. BLAST+ must be installed and included in the system path. On most systems, the BLAST+ installer will take care of this. After this, run MainFrame to start the program.

For OSs not listed on our downloads page, it may still be possible for CRISPRstudio to be run. CRISPRstudio should run on any system that supports Python 2.7 or compatible and for which NCBI provides a version of BLAST+. To run on these systems, see the instructions listed below for running from source.

Source

  • The source code is available on our GoogleCode page. It is available as a package of the latest revision and directly from Subversion.

Pick OS as needed. Note that each of the python packages should list the version of Python they support - select the right one. NumPy must be installed before BioPython. The x64 release of wxPython doesn't seem to like Win7 so it may be better to use the x86 release. BLAST+ must be included in the system path. If BLAST+ is not available in the the system, the spacer creator will not function. The information browser will still function normally.

The code itself is not fully documented. However, the code was intentionally written to be readable without comments. Project files are available for PyCharm 1.6.

Run MainFrame.py to start the program.

Interface

All screen shots were taken on Windows 7, however, CRISPRstudio was designed for portability and will look similar on other operating systems.

 

Update prompt

When first launching CRISPRstudio, users are greeted with the above screen. This screen prompts the user to update the local database of CRISPR information. Press y to update or n to continue without updating. An internet connection is required. Note that we directly interface with several online databases and changes in their formats may cause our updater to fail. In this case, the data provided with CRISPRstudio may be used. Release packages of CRISPRstudio include data that was current at time of release.

 

Information panel

The genomes list is located on the left side of the screen. All genome listed are know to have some form of CRISPR. You may search for a specific genome by entering part of its name in the search bar. Logical connectives may also be used, e.g. entering lactobacillus and bl23 will return Lactobacillus casei BL23.

On the left side of the screen is information relating to that genome. We provide basic information about the organism including name (as given by NCBI), RefSeq, Taxonomy ID, and associated GenBank Accessions. For all genomes, we provide a list of arrays with their Accession. For each arrray, we provide the consensus repeat and a list of the spacers in that array.

For some genomes, those available in JCVI, we also list the CRISPR genes that known to exist in the genome. Accessions, spacers, and genes may be copied using Ctrl+C on Windows.

 

Menu

In the upper right side of the screen is the main menu bar. The view menu allows the user to switch between viewing CRISPR information and generating spacers. The settings menu controls how spacers are generated. A help menu that provides no actual is also available for piece of mind while using the software.

 

Spacer creation panel

As with the information panel, the genomes list is located on the left side of the screen.

The left side of the screen has information for specifying what how to generate spacers. For the selected genome, a list of the arrays are shown based on the repeat that is associated with them. One repeat/array must be selected. Once a repeat has been selected, the typical spacer length for that particular array is shown. The user must then enter sequence data for the target of the spacer. A raw DNA sequence may be pasted in, an NCBI Accession entered (with version number), or a FASTA file selected.

Three additional options are provided for advanced users. The user may choose to Filter for PAM, this discards all potential spacers that do not have the appropriate GG suffix. Typically almost all spacers will be discard if this option is selected. The user may also chose to Force PAM artificially, this will manually inserted the required suffix into all spacers. We also support a basic form of checking for hairpinning of spacers. Depending on the use of the spacer data, it may be more effective to use another tool to check for the existence of hairpinning.

After selecting the required settings, the user must click Compute Spacers. This process may take some time depending on the length of the input sequence. An internet connection may be required. The genome and other sequence information will be automatically downloaded if it does not already exist.

We use the following basic process for determining spacers:

  • Start with all possible spacers based on the required length
  • Filter any with banned restriction sites
  • Filter those that have homology with the repeat consensus
  • Filter those that hairpin with more than 8 bases (optional)
  • Filter those that have homology with the host genome
  • Genome is downloaded from NCBI
  • We run a local BLAST search
  • Users may have better results if they adjust the EValue (see ArrayGeneration.py) for determining homology
  • Filter based on PAMs (optional)

 

Finalize Spacers

Once CRISPRstudio is done computing the possible spacers, the spacers must be finalized. A list is displayed containing the possible spacers. The user may select how to lay out the sequence: repeat-spacer or repeat-spacer-repeat. The user may also choose to package the spacer using the standard Biobrick prefix and suffix. The final result is displayed in a text box at the bottom.

 

Spacer Generation Settings

The Spacer Generation Settings dialog controls which REs are considered when creating spacers. The spacer creator requires that a set of REs does not exist within potential spacers. The set of REs is defined here. On the left is a list of currently selected REs; these may be removed by selecting an item and clicking Remove Site. On the right is a list of all known REs in BioPython. Any of these may be added by selecting on it and clicking Add Site. Alternatively, the name of a RE (e.g. EcoRI) may be entered in the text box and added by clicking the button. The selected REs are stored locally and will be loaded next time CRISPRstudio is run.

Data Sources