CRISPR-detector provides a web-hosted platform (https://db.cngb.org/crispr-detector/) and a locally deployable pipeline to quickly and accurately identify and annotate editing-induced mutations from genome editing assays. Key features:
- optimized scalability allowing for whole genome sequencing data analysis beyond BED file-defined regions;
- improved accuracy from haplotype-based variant calling, which handles sequencing errors;
- treated and control sample co-analysis to remove background variants existing prior to genome editing;
- integrated structural variation (SV) calling with additional focus on vector insertions from viral-mediated genome editing;
- functional and clinical annotation of editing-induced mutations.
Download the Sentieon toolkit from
https://s3.amazonaws.com/sentieon-release/software/sentieon-genomics-202010.03.tar.gz
Request a license by emailing frank.hu@sentieon.com, then point SENTIEON_LICENSE at the license file and add the Sentieon binaries to your PATH:
export SENTIEON_LICENSE=PATH_TO_SENTIEON/sentieon-genomics-202010.03/localhost_eval.lic
export PATH=PATH_TO_SENTIEON/sentieon-genomics-202010.03/bin:$PATH
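Before running a pipeline, it can help to confirm that the two exports took effect. This is a minimal Python sketch, not part of CRISPR-detector; the helper name sentieon_env_problems is hypothetical, and it assumes a local license file as in the commands above.

```python
import os
import shutil


def sentieon_env_problems(env=None):
    """Return a list of problems with the Sentieon setup (empty if none found).

    Assumes SENTIEON_LICENSE points at a local license file, as in the
    export commands above. `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []
    license_path = env.get("SENTIEON_LICENSE")
    if not license_path:
        problems.append("SENTIEON_LICENSE is not set")
    elif not os.path.isfile(license_path):
        problems.append(f"license file not found: {license_path}")
    # Search only the PATH taken from `env`; an empty PATH finds nothing.
    if shutil.which("sentieon", path=env.get("PATH", "")) is None:
        problems.append("'sentieon' executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in sentieon_env_problems():
        print("WARNING:", problem)
```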
pip install biopython
pip install pyfaidx
pip install -U textwrap3
conda install -c bioconda blast
conda install -c bioconda samtools
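A quick way to confirm these installs is to look up each Python module and command-line tool. This sketch is not part of CRISPR-detector; the import names (Bio for biopython) are standard, while the list of executables to check (blastn, makeblastdb, samtools) is an assumption about which binaries the pipeline calls.

```python
import importlib.util
import shutil

# pip package name -> import name
PYTHON_MODULES = {"biopython": "Bio", "pyfaidx": "pyfaidx", "textwrap3": "textwrap3"}
# Executables assumed to be provided by the conda packages above
EXECUTABLES = ("blastn", "makeblastdb", "samtools")


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return the names of packages/executables that cannot be found."""
    missing = [pkg for pkg, mod in modules.items() if importlib.util.find_spec(mod) is None]
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing


if __name__ == "__main__":
    print(missing_dependencies() or "all dependencies found")
```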
Download ANNOVAR from https://www.openbioinformatics.org/annovar/annovar_download_form.php
perl annotate_variation.pl -downdb -webfrom annovar avdblist humandb/ -buildver hg38
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar clinvar_20210501 humandb/
export PATH=PATH_TO_ANNOVAR/annovar:$PATH
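ANNOVAR stores each downloaded table as {buildver}_{database}.txt in the target directory, and gene-based databases such as refGene come with a matching {buildver}_{database}Mrna.fa sequence file. Assuming that naming, the sketch below (hypothetical helper missing_annovar_tables, not part of CRISPR-detector or ANNOVAR) lists which of the tables downloaded above are absent from humandb/.

```python
from pathlib import Path


def missing_annovar_tables(humandb, buildver="hg38",
                           databases=("refGene", "clinvar_20210501"),
                           gene_databases=("refGene",)):
    """List expected ANNOVAR files that are not present in `humandb`.

    Assumes ANNOVAR's {buildver}_{database}.txt naming, plus a
    {buildver}_{database}Mrna.fa file for gene-based databases.
    """
    humandb = Path(humandb)
    expected = []
    for db in databases:
        expected.append(f"{buildver}_{db}.txt")
        if db in gene_databases:
            expected.append(f"{buildver}_{db}Mrna.fa")
    return [name for name in expected if not (humandb / name).is_file()]


if __name__ == "__main__":
    print(missing_annovar_tables("humandb/") or "all tables present")
```

The same check can be pointed at a custom database, e.g. missing_annovar_tables("zebrafishdb", "GRCz11", ("refGene",)).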
Variant annotation with both refGene and ClinVar is supported for Homo sapiens sequencing data; other species may only support refGene annotation.
You can build an ANNOVAR database yourself for any species that has a genome assembly (FASTA) and a gene annotation file in GFF3 format.
For example, to build a database for zebrafish, download GRCz11.fa and GRCz11.gff3 from a public database,
then run the following commands:
conda install -c bioconda/label/cf201901 gffread
conda install -c bioconda/label/cf201901 ucsc-gtftogenepred
conda install -c bioconda/label/cf201901 blast
cd PATH_TO_ANNOVAR/
mkdir zebrafishdb
mv */GRCz11.fa zebrafishdb/
mv */GRCz11.gff3 zebrafishdb/
cd zebrafishdb
gffread GRCz11.gff3 -T -o GRCz11.gtf
gtfToGenePred -genePredExt GRCz11.gtf GRCz11_refGene.txt
retrieve_seq_from_fasta.pl --format refGene --seqfile GRCz11.fa GRCz11_refGene.txt --out GRCz11_refGeneMrna.fa
makeblastdb -in GRCz11.fa -dbtype nucl
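GRCz11_refGene.txt, produced by gtfToGenePred -genePredExt, follows UCSC's extended genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, ...), with 0-based coordinates and comma-terminated exon lists. As a rough sanity check on the conversion, the sketch below (hypothetical helper, not part of CRISPR-detector or ANNOVAR) parses a record, verifies that the exon lists agree with exonCount, and computes the spliced transcript length, which can be compared with sequence lengths in GRCz11_refGeneMrna.fa.

```python
def parse_genepred_ext(line):
    """Parse one extended genePred record (tab-separated) into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(fields)}")
    exon_count = int(fields[7])
    # exonStarts/exonEnds are comma-separated and usually end with a trailing comma
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError(f"{fields[0]}: exonCount does not match exon lists")
    return {
        "name": fields[0],
        "chrom": fields[1],
        "strand": fields[2],
        "tx_start": int(fields[3]),
        "tx_end": int(fields[4]),
        "exon_count": exon_count,
        # 0-based half-open intervals: each exon contributes end - start bases
        "spliced_length": sum(e - s for s, e in zip(starts, ends)),
        "gene": fields[11] if len(fields) > 11 else "",
    }


def count_valid_records(path):
    """Count parseable records, printing a warning for each inconsistent one."""
    n_ok = 0
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            try:
                parse_genepred_ext(line)
                n_ok += 1
            except ValueError as err:
                print("WARNING:", err)
    return n_ok
```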
Arguments shared by python scripts/CRISPRdetectorAMP.py, CRISPRdetectorBE.py, CRISPRdetectorWGS.py and CRISPRdetectorVEC.py:
--sample: sample name & output directory name [required]
--e1: treatment group fq1 path [required]
--e2: treatment group fq2 path [optional]
--c1: control group fq1 path [optional]
--c2: control group fq2 path [optional]
--o: output path [default:'.']
--threads: number of threads for the Sentieon minimap2 and driver modules [default:1]
--min_allele_frac: the minimum allele fraction in the treated sample [default:0.005]
--max_fisher_pv_active: the maximum Fisher's test p-value for the difference between treated and control samples [default:0.05]
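To make the two thresholds concrete: a site can be kept only if its allele fraction in the treated sample reaches --min_allele_frac and the treated-vs-control difference is significant at --max_fisher_pv_active. The sketch below shows one way such a filter could work, assuming a two-sided Fisher's exact test on a 2x2 table of alternate/reference read counts in treated and control samples. It is an illustration, not CRISPR-detector's own implementation, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def keep_site(alt_treated, ref_treated, alt_control, ref_control,
              min_allele_frac=0.005, max_fisher_pv_active=0.05):
    """Hypothetical filter mirroring the two command-line thresholds."""
    depth = alt_treated + ref_treated
    if depth == 0 or alt_treated / depth < min_allele_frac:
        return False
    p = fisher_exact_two_sided(alt_treated, ref_treated, alt_control, ref_control)
    return p <= max_fisher_pv_active


if __name__ == "__main__":
    print(fisher_exact_two_sided(8, 2, 1, 5))  # about 0.035
    print(keep_site(8, 2, 1, 5))
```

Using a control sample this way means a variant that is equally common before editing (a background variant) yields a large p-value and is dropped, even when its allele fraction is high.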
python scripts/CRISPRdetectorAMP.py
--amplicons_file: tab-delimited text file describing the amplicons, with up to 3 columns: AMPLICON_NAME, AMPLICON_SEQ, gRNA_SEQ_without_PAM (optional) [required]
--anno: whether to annotate variants with ANNOVAR [optional]
--assembly: assembly version, e.g. hg19, hg38 [optional]
--db: ANNOVAR database path [optional]
--ClinVar: annotate variants with ClinVar; only supported for Homo sapiens sequencing data [default:0]
--cleavage_offset: center of the quantification window, relative to the 3' end of the provided sgRNA sequence [default:-3]
--window_size: size (in bp) of the quantification window, extending from the position specified by --cleavage_offset relative to the provided guide RNA sequence; 0 means whole-amplicon analysis [default:0]
--ignore_substitutions: whether to ignore substitutions during evaluation [default:0]
--min_num_of_reads: the minimum number of reads per locus required for evaluation [default:500]
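The amplicons file can also be generated programmatically. The sketch below writes one row per amplicon in the column layout described for --amplicons_file and, as an extra sanity check, confirms that each gRNA (given without PAM) occurs in its amplicon on either strand. It assumes the file has no header row and that sequences use only A, C, G, T and N; the function name is hypothetical.

```python
import csv

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def write_amplicons_file(path, amplicons):
    """Write (name, amplicon_seq[, grna_seq]) tuples as a tab-delimited file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for entry in amplicons:
            name, amplicon = entry[0], entry[1].upper()
            grna = entry[2].upper() if len(entry) > 2 and entry[2] else ""
            if set(amplicon) - set("ACGTN"):
                raise ValueError(f"{name}: amplicon contains non-ACGTN characters")
            # Sanity check: the gRNA (without PAM) should occur on either strand
            if grna and grna not in amplicon and reverse_complement(grna) not in amplicon:
                raise ValueError(f"{name}: gRNA not found in amplicon")
            # Omit the third column when no gRNA is given
            writer.writerow([name, amplicon, grna] if grna else [name, amplicon])
```

For example, write_amplicons_file("amplicons.txt", [("site1", amplicon_seq, grna_seq)]) produces a file that can be passed to --amplicons_file.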
python scripts/CRISPRdetectorBE.py
--amplicons_file: tab-delimited text file describing the amplicons, with up to 3 columns: AMPLICON_NAME, AMPLICON_SEQ, gRNA_SEQ_without_PAM (optional) [required]
--anno: whether to annotate variants with ANNOVAR [optional]
--assembly: assembly version, e.g. hg19, hg38 [optional]
--db: ANNOVAR database path [optional]
--ClinVar: annotate variants with ClinVar; only supported for Homo sapiens sequencing data [default:0]
--cleavage_offset: center of the quantification window, relative to the 3' end of the provided sgRNA sequence [default:-3]
--window_size: size (in bp) of the quantification window, extending from the position specified by --cleavage_offset relative to the provided guide RNA sequence; 0 means whole-amplicon analysis [default:0]
--min_num_of_reads: the minimum number of reads per locus required for evaluation [default:500]
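Both the AMP and BE pipelines place the quantification window relative to the gRNA. As an illustration, the sketch below computes the predicted cut position and window in 0-based amplicon coordinates, assuming the CRISPResso2-style convention that these parameter descriptions resemble: the offset is counted from the gRNA's 3' end (the base next to the PAM), and the window extends --window_size bp on each side of the cut. Whether CRISPR-detector uses exactly this convention is an assumption, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def quantification_window(amplicon, grna, cleavage_offset=-3, window_size=0):
    """Return (cut, (start, end)) in 0-based amplicon coordinates.

    `cut` is the index of the base immediately left of the predicted cleavage
    site (in amplicon orientation); (start, end) is a half-open interval.
    window_size == 0 returns the whole amplicon, mirroring the CLI default.
    """
    amplicon, grna = amplicon.upper(), grna.upper()
    pos = amplicon.find(grna)
    if pos >= 0:
        # gRNA on the forward strand: its 3' end is the rightmost matched base
        cut = pos + len(grna) - 1 + cleavage_offset
    else:
        pos = amplicon.find(reverse_complement(grna))
        if pos < 0:
            raise ValueError("gRNA not found in amplicon on either strand")
        # gRNA on the reverse strand: its 3' end is the leftmost matched base
        cut = pos - cleavage_offset - 1
    if window_size == 0:
        return cut, (0, len(amplicon))
    start = max(0, cut - window_size + 1)
    end = min(len(amplicon), cut + window_size + 1)
    return cut, (start, end)
```

With the default offset of -3 and a 20 nt guide on the forward strand, the cut falls between the 17th and 18th guide bases, 3 bp upstream of the PAM.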
python scripts/CRISPRdetectorWGS.py
--bed: BED file of regions of interest in which to call variants [optional]
--assembly: path to the reference assembly in FASTA format, e.g. hg38.fa, mm9.fa [required]
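BED intervals are 0-based and half-open, so a region written in 1-based, inclusive genome-browser coordinates needs its start shifted down by one. A minimal sketch for preparing a --bed file (hypothetical helper, not part of CRISPR-detector):

```python
def write_bed(path, regions):
    """Write (chrom, start_1based, end_1based[, name]) regions as BED lines.

    Input coordinates are 1-based and inclusive; BED uses 0-based,
    half-open intervals, so only the start needs converting.
    """
    with open(path, "w") as fh:
        for region in regions:
            chrom, start, end = region[0], int(region[1]), int(region[2])
            if start < 1 or end < start:
                raise ValueError(f"invalid region: {region}")
            fields = [chrom, str(start - 1), str(end)]
            if len(region) > 3:
                fields.append(str(region[3]))  # optional name column
            fh.write("\t".join(fields) + "\n")
```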
python scripts/CRISPRdetectorVEC.py
--bed: BED file of regions of interest in which to call variants [optional]
--vector: path to the vector genome in FASTA format [required]
--assembly: path to the reference assembly in FASTA format, e.g. hg38.fa, mm9.fa [required]
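When launching many samples, it can be convenient to build the command from Python. The sketch below assembles a CRISPRdetectorVEC.py call from the flags documented above and prints it with shell quoting; the helper name and file names are hypothetical, and it only builds the argument list (pass it to subprocess.run to execute it).

```python
import shlex


def vec_command(sample, e1, vector, assembly, e2=None, c1=None, c2=None,
                bed=None, threads=1, out="."):
    """Build an argument list for scripts/CRISPRdetectorVEC.py (not executed)."""
    cmd = ["python", "scripts/CRISPRdetectorVEC.py", "--sample", sample, "--e1", e1]
    # Optional inputs are only added when provided
    for flag, value in (("--e2", e2), ("--c1", c1), ("--c2", c2), ("--bed", bed)):
        if value is not None:
            cmd += [flag, value]
    cmd += ["--vector", vector, "--assembly", assembly,
            "--threads", str(threads), "--o", out]
    return cmd


if __name__ == "__main__":
    cmd = vec_command("sample1", "treated_R1.fq.gz", "vector.fa", "hg38.fa",
                      e2="treated_R2.fq.gz", threads=8)
    print(shlex.join(cmd))
```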
python scripts/CRISPRdetectorPlot.py
--sample: sample name & output directory name [required]
--o: output path [default:'.']
--dpi: the resolution in dots per inch [default:1800]
CRISPR-Detector: Fast and Accurate Detection, Visualization, and Annotation of Genome Wide Mutations Induced by Gene Editing Events
Lei Huang, Dan Wang, Haodong Chen, Jinnan Hu, Xuechen Dai, Chuan Liu, Anduo Li, Xuechun Shen, Chen Qi, Haixi Sun, Dengwei Zhang, Tong Chen, Yuan Jiang
bioRxiv 2022.02.16.480781; doi: https://doi.org/10.1101/2022.02.16.480781