See glossary for explanation of various abbreviations used on this page.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a genomic feature of many bacterial and archaeal species. About 40% of sequenced bacterial genomes and 90% of archaeal genomes contain at least one CRISPR array. It is possible that many laboratory strains of bacteria, which are the sources of many available genome sequences, have lost CRISPR due to a lack of exposure to phages.
CRISPR functions as an adaptive and inheritable immune system. A CRISPR locus consists of a set of Cas (CRISPR associated) genes, a leader, or promoter, sequence, and an array. This array consists of repeating elements along with "spacers". These spacer regions direct the CRISPR machinery to degrade or otherwise inactivate a complementary sequence in the cell.
The CRISPR array
Genetic information from previous encounters is stored in the array as spacers. These spacers are consistent in length (30-40 bp), and are flanked by repeating elements (also 30-40 bp). The repeating elements are usually partially palindromic, and form secondary structures when transcribed into pre-crRNA. These structures may be necessary for recognition and cleavage.
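The repeat-spacer layout described above can be sketched in a few lines of Python. This is only an illustration: the repeat, leader, and spacer sequences are made-up placeholders, and the function assumes every repeat copy is identical, which real arrays do not guarantee.

```python
def split_array(array_dna: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`.

    Assumes exact, identical repeat copies (a simplification).
    """
    parts = array_dna.split(repeat)
    # parts[0] precedes the first repeat (e.g. the leader) and parts[-1]
    # follows the last repeat, so neither of them is a spacer.
    return [p for p in parts[1:-1] if p]


# Hypothetical toy sequences, not taken from any real genome.
LEADER = "TTATTTAAATATTTA"
REPEAT = "GTCACGGATCCGTGACTTAGCAAGGATCC"
SPACERS = [
    "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGA",
    "TTGACCGATAGGCATCCGATTACGGCATAGCT",
]

# Array layout: leader, then repeat, then (spacer + repeat) for each spacer.
ARRAY = LEADER + REPEAT + "".join(s + REPEAT for s in SPACERS)

found = split_array(ARRAY, REPEAT)
print(found)
```

Splitting on the repeat is the simplest possible approach; a tool meant for real genomes would need to tolerate degenerate repeats.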
Engineered spacers can confer resistance: a spacer complementary to T3 phage increased survival, and a customized spacer can prevent transformation by pC194 plasmids that carry a matching sequence.
CRISPR in Escherichia coli K-12 substr. MG1655
E. coli contains a type I CRISPR system. There are four CRISPR loci in this organism. CRISPR1, the largest, is associated with eight Cas genes. In the classification scheme presented by Haft et al., five of these genes form the Cse family: casA (cse1), casB (cse2), casC (cse4), casD (cas5e), and casE (cse3). The five encoded proteins combine to form the Cascade complex, which resembles a seahorse in shape. Its full composition is 1x casA, 2x casB, 6x casC, 1x casD, 1x casE. Specifically, casE cleaves pre-crRNA; casA and casB can be omitted without affecting crRNA generation, but are necessary for phage resistance. The complex binds double-stranded target DNA without requiring, or being enhanced by, cofactors such as metal ions or ATP. It also undergoes conformational changes when binding DNA.
Structure of the CRISPR I locus in E. coli. 3 promoters have been characterized: Pcrispr1, Pcas, and anti-Pcas.
CRISPR in Pyrococcus furiosus DSM 3638
This organism is notable for the diversity of its Cas genes, as well as its possible RNA targeting. P. furiosus contains 7 CRISPR loci, along with 29 Cas genes in 2 gene clusters. All 6 core Cas genes (cas1-cas6), as well as genes from the Cmr (type III), Cst (type I), and Csa (type I) families, are present. Cmr1-6 have been found to form a Cascade-like complex that targets RNA in vitro.
CRISPR in Bacillus halodurans C-125
B. halodurans contains 6 Cmr genes (Cmr1-6) in a single locus. This is a type III CRISPR system. The organism also contains Csd1 and Csd2 (Dvulg subtype I-C) along with Cas1, Cas2, Cas3, Cas4, and Cas5 in another locus.
CRISPR in Listeria innocua Clip11262
L. innocua contains a type II CRISPR system. A single gene (Cas9 / Csn1) has been shown to be necessary for the expression and inactivation stages of the pathway. A separate trans-encoded small RNA (tracrRNA) binds with the repeat segment of the pre-crRNA, followed by cleavage by RNase III and binding with Cas9.
Stages of the CRISPR pathway
Integration / Adaptation
In this step, DNA, commonly derived from phages and plasmids, is recognized and processed by Cas proteins. Information from outside of the genome is recognized and incorporated into the leader end of an existing array. This involves cas1 and cas2. The integration stage is currently the least understood aspect of the pathway.
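Because new spacers are added at the leader end, the order of spacers in an array records the order of encounters. A minimal sketch of that bookkeeping follows; the function names and toy sequences are illustrative assumptions, and the code models only the ordering, not the biochemistry of cas1 and cas2.

```python
def integrate_spacer(spacers: list[str], new_spacer: str) -> list[str]:
    """Return a new spacer list with `new_spacer` at the leader-proximal end.

    Convention assumed here: index 0 is the spacer closest to the leader,
    i.e. the most recently acquired one.
    """
    return [new_spacer, *spacers]


def build_array(leader: str, repeat: str, spacers: list[str]) -> str:
    """Assemble leader + repeat + (spacer + repeat) for every spacer."""
    return leader + repeat + "".join(s + repeat for s in spacers)


# Hypothetical toy data.
old = ["SPACER_B", "SPACER_A"]           # B was acquired after A
new = integrate_spacer(old, "SPACER_C")  # C is the newest encounter
print(new)
print(build_array("LEADER|", "|R|", new))
```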
In the expression stage, the CRISPR array is transcribed in its entirety, yielding pre-crRNA. This pre-crRNA is cleaved within the repeat regions to yield crRNA. In E. coli, this crRNA is 61 nt long, consisting of a 32 nt spacer flanked by repeat-derived segments on both ends (8 nt at the 5' end and 21 nt forming a hairpin at the 3' end, so 8 + 32 + 21 = 61). The crRNA carries a 5' hydroxyl group. crRNA is then typically bound to a protein complex (known as Cascade in E. coli).
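The cleavage pattern described above (one cut inside each repeat, leaving 8 repeat-derived bases on the 5' side of the spacer and 21 on the 3' side) can be illustrated with a short Python sketch. The repeat and spacer sequences are made-up placeholders, the repeat is assumed to be 29 bases long so that it splits into 21 + 8, and all repeat copies are assumed identical.

```python
def cleave_pre_crrna(pre_crrna: str, repeat: str, handle5_len: int = 8) -> list[str]:
    """Cut once inside every repeat and return the internal fragments.

    Each cut is placed `handle5_len` bases before the 3' end of a repeat,
    so every fragment is: 5' handle + spacer + remainder of the next repeat.
    """
    offset = len(repeat) - handle5_len
    cut_sites = []
    start = pre_crrna.find(repeat)
    while start != -1:
        cut_sites.append(start + offset)
        start = pre_crrna.find(repeat, start + len(repeat))
    # Fragments between consecutive cut sites are the mature crRNAs.
    return [pre_crrna[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


# Hypothetical toy RNA sequences.
REPEAT_RNA = "GUCACGGAUCCGUGACUUAGCAAGGAUCC"  # 29 bases
SPACERS_RNA = [
    "ACGUUGCAAGCUUAGGCUAACGUUAGCCAUGA",
    "UUGACCGAUAGGCAUCCGAUUACGGCAUAGCU",
]
PRE_CRRNA = REPEAT_RNA + "".join(s + REPEAT_RNA for s in SPACERS_RNA)

crrnas = cleave_pre_crrna(PRE_CRRNA, REPEAT_RNA)
for spacer, crrna in zip(SPACERS_RNA, crrnas):
    print(len(crrna), crrna == REPEAT_RNA[-8:] + spacer + REPEAT_RNA[:21])
```

Each fragment's length is 8 plus the spacer length plus 21, which is the arithmetic behind the crRNA size quoted for E. coli.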
The interference stage requires bound crRNA and, in E. coli, Cas3. It targets DNA in most organisms, but RNA targeting has been demonstrated in the case of P. furiosus. Recognition of target DNA is thought to take place by means of R-loops. An R-loop is a structure in which an RNA strand has base paired with a complementary DNA strand, displacing the other DNA strand, whose sequence matches that of the RNA. This base pairing between the crRNA spacer sequence and the target strand may mark the region for interference by other proteins such as Cas3.
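As a rough illustration of recognition by base pairing, the sketch below searches both strands of a DNA sequence for a site where a crRNA spacer could pair. It is a simplification under stated assumptions: matching is exact (real recognition may tolerate mismatches), the sequences are made-up, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def find_protospacers(spacer_rna: str, dna_top: str) -> list[tuple[str, int]]:
    """Return (strand, start) pairs where the displaced strand reads like the spacer.

    The crRNA pairs with the strand complementary to the hit, so the hit
    itself is the strand that would be displaced in the R-loop.
    Positions on "-" are counted along the reverse-complemented sequence.
    """
    probe = spacer_rna.replace("U", "T")
    hits = []
    for strand, seq in (("+", dna_top), ("-", reverse_complement(dna_top))):
        pos = seq.find(probe)
        while pos != -1:
            hits.append((strand, pos))
            pos = seq.find(probe, pos + 1)
    return hits


# Hypothetical toy data.
SPACER = "ACGUUGCAAGCUUAGG"
TARGET = "TTT" + "ACGTTGCAAGCTTAGG" + "GGG"

print(find_protospacers(SPACER, TARGET))
print(find_protospacers(SPACER, reverse_complement(TARGET)))
```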
In Streptococcus thermophilus, only Cas9 is necessary for CRISPR functionality. However, a specific sequence, known as a protospacer adjacent motif (PAM), was found to be required for interference. The predicted sequence is 5'-NGGNG-3'. This sequence is found several base pairs upstream of the protospacer (target DNA). Single base pair mutations in the PAM completely abolish CRISPR interference.
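The motif 5'-NGGNG-3' can be scanned for with a regular expression in which N stands for any base. The sketch below is illustrative only: the function name and test sequences are made-up, and it reports motif positions without modelling where the protospacer lies relative to the motif.

```python
import re

# N = any base; the lookahead lets overlapping occurrences be reported.
PAM_PATTERN = re.compile(r"(?=([ACGT]GG[ACGT]G))")


def find_pam_sites(dna: str) -> list[int]:
    """Return the 0-based start positions of every NGGNG match in `dna`."""
    return [m.start() for m in PAM_PATTERN.finditer(dna.upper())]


# Hypothetical toy sequences.
with_pam = "TTAGGCGTT"  # contains AGGCG starting at position 2
mutated = "TTAGACGTT"   # one G of the motif changed to A

print(find_pam_sites(with_pam))
print(find_pam_sites(mutated))
```

The second call shows how a single base change in the motif removes the match, in line with the observation that point mutations in the PAM abolish interference.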
Core Cas genes
There are 6 “core” Cas genes, found in a wide variety of organisms and here referred to as Cas1-Cas6.
Cas1 is nearly universally conserved throughout organisms with CRISPR. It is strongly implicated in the integration stage of the pathway. Cas1 is a metal-dependent (Mg, Mn) DNA-specific endonuclease that generates an 80 bp fragment. How this is converted into an ~32 bp spacer is unknown.
Cas3 is not regulated by H-NS. It cooperates with the Cascade complex in the interference stage. Cas3 has predicted ATP-dependent helicase activity, as well as demonstrated ATP-independent annealing of RNA to DNA. It forms an R-loop with DNA, requiring magnesium or manganese as a cofactor, but has an antagonistic function in the presence of ATP, dissociating the R-loop.
Prevention of self targeting (autoimmunity)
The 5' handle of crRNA allows self / nonself discrimination in the Csm subtype. In the Cse subtype, regions flanking the protospacer contain PAMs, which may be necessary for interference. In general, it is thought that mismatches at positions outside of the spacer sequence allow for targeting, while extended base pairing with the surrounding repeats prevents targeting.
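The general idea (pairing that extends from the spacer into the repeat-derived handles marks a site as self) can be caricatured as follows. This is a toy model under loud assumptions: the 0.9 threshold, the requirement that both flanks match, and all sequences are hypothetical, and flanks are written in the same sense as the crRNA (as DNA letters), so identity here stands in for complementarity to the opposite strand.

```python
def fraction_identical(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def looks_like_self(flank5: str, flank3: str, handle5: str, handle3: str,
                    threshold: float = 0.9) -> bool:
    """Return True if both flanks of a site match the crRNA handles.

    `threshold` is an arbitrary illustrative cut-off, not a measured value.
    """
    near5 = flank5[-len(handle5):]  # bases directly 5' of the matched site
    near3 = flank3[:len(handle3)]   # bases directly 3' of the matched site
    return (fraction_identical(near5, handle5) >= threshold
            and fraction_identical(near3, handle3) >= threshold)


# Hypothetical handles (DNA letters, same sense as the crRNA).
HANDLE5 = "ATAAACCG"
HANDLE3 = "GAGTTCCCCGCGCCAGCGGGG"

# In the array itself, the spacer is flanked by repeat sequence.
array_site = looks_like_self("CC" + HANDLE5, HANDLE3 + "TT", HANDLE5, HANDLE3)
# In foreign DNA, the flanks are unrelated to the repeat.
foreign_site = looks_like_self("TTGACGTCAA", "TTTTTTTTTTTTTTTTTTTTTTT",
                               HANDLE5, HANDLE3)
print(array_site, foreign_site)
```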
In E. coli (Cse subtype), transcription of the Cascade genes and CRISPR array is repressed by H-NS. H-NS is a global repressor of transcription in many Gram-negative bacteria that binds AT-rich sequences. This repression is mediated by "DNA stiffening", as well as formation of "DNA-protein-DNA" bridges. Knocking out H-NS increases expression of cas genes, which correlates with phage sensitivity.
Transcription is antagonistically de-repressed by LeuO, a protein of the LysR transcription factor family encoded near the leuABCD (leucine synthesis) operon. LeuO expression is also repressed by H-NS. Expression of H-NS repressed proteins can be manipulated by plasmid-encoded leuO under a constitutive promoter. Plasmids: pCA24N (lac1 promoter), pKEDR13 (pTac promoter), pNH41 (IPTG). Increased LeuO expression leads to increased expression of casABCDE, cas1, and cas2, but does not affect cas3 expression. Constitutively expressing leuO had a stronger effect than knocking out H-NS.
Classification of CRISPR systems
For a comprehensive listing of Cas genes, see .
Haft (2005) : Recognition of core Cas genes (1-6). Organized remaining genes into 9 subtypes: Ecoli, Ypest, Nmeni, Dvulg, Tneap, Hmari, Apern, Mtube, RAMP.
Makarova (2011) : Classification into types I, II, and III, based on mechanism of action as well as homology. These types correspond with the 9 subtypes given by Haft to a large extent:
Type I, II and III systems
This classification takes into account differing mechanisms at all three stages of the pathway.