HiFi DeepVariant + WhatsHap workflow

Workflow steps

1. align HiFi reads to reference with pbmm2
2. call small variants with DeepVariant, using two-pass method (DeepVariant ➡️ WhatsHap phase ➡️ WhatsHap haplotag ➡️ DeepVariant)
3. phase small variants with WhatsHap
4. haplotag aligned BAMs with WhatsHap and merge

Directory structure within basedir

.
├── cluster_logs  # slurm stderr/stdout logs
├── reference
│   ├── reference.chr_lengths.txt  # cut -f1,2 reference.fasta.fai > reference.chr_lengths.txt
│   ├── reference.fasta
│   └── reference.fasta.fai
├── samples
│   └── <sample_id>  # sample_id regex: r'[A-Za-z0-9_-]+'
│       ├── whatshap/  # phased small variants; merged haplotagged alignments
│       ├── logs/  # per-rule stdout/stderr logs
│       ├── aligned/  # intermediate
│       ├── deepvariant/  # intermediate
│       ├── deepvariant_intermediate/  # intermediate
│       └── whatshap_intermediate/  # intermediate
├── smrtcells
│   ├── done  # move folders from smrtcells/ready to smrtcells/done to prevent re-processing
│   └── ready
│       └── <sample_id>  # uBAMs or FASTQs per sample
│                        # filename regex: r'(m\d{5}[Ue]?_\d{6}_\d{6}).(ccs|hifi_reads).bam' or r'(m\d{5}[Ue]?_\d{6}_\d{6}).fastq.gz'
└── workflow  # clone of this repo
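
The naming rules in the tree comments can be checked before files are staged. The following is a minimal sketch, not the workflow's own code: the function names `is_valid_sample_id` and `movie_name` are hypothetical, and the filename pattern assumes the intended regex has a capturing group around the movie name and literal dots before the extensions.

```python
import re

# Patterns adapted from the comments in the directory tree.
# Assumptions: the movie name is a capturing group and the dots are literal.
SAMPLE_ID_RE = re.compile(r"[A-Za-z0-9_-]+")
MOVIE_RE = re.compile(
    r"(m\d{5}[Ue]?_\d{6}_\d{6})\.((ccs|hifi_reads)\.bam|fastq\.gz)"
)


def is_valid_sample_id(sample_id: str) -> bool:
    """Return True if the whole string matches the sample_id pattern."""
    return SAMPLE_ID_RE.fullmatch(sample_id) is not None


def movie_name(filename: str):
    """Return the movie name for a recognized uBAM/FASTQ filename, else None."""
    match = MOVIE_RE.fullmatch(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(is_valid_sample_id("HG002_rep-1"))
    print(movie_name("m64012_190920_173625.hifi_reads.bam"))
    print(movie_name("reads.bam"))
```

Using `fullmatch` rather than `search` rejects names that merely contain a valid substring, such as a sample ID with a space in it.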

To run the pipeline

$ conda create \
    --channel bioconda \
    --channel conda-forge \
    --prefix ./conda_env \
    python=3 snakemake mamba lockfile

$ conda activate ./conda_env

$ sbatch workflow/run_snakemake.sh <sample_id>
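
Before submitting the job, it can help to confirm that the expected inputs are in place. The sketch below is an illustration only and is not part of the workflow: `missing_inputs` is a hypothetical helper, and the list of paths is taken from the directory structure described above.

```python
from pathlib import Path


def missing_inputs(basedir, sample_id):
    """List expected paths (relative to basedir) that do not exist yet."""
    base = Path(basedir)
    expected = [
        Path("reference") / "reference.fasta",
        Path("reference") / "reference.fasta.fai",
        Path("reference") / "reference.chr_lengths.txt",
        Path("smrtcells") / "ready" / sample_id,  # uBAMs or FASTQs for the sample
        Path("workflow") / "run_snakemake.sh",
    ]
    # Report paths with forward slashes regardless of platform.
    return [p.as_posix() for p in expected if not (base / p).exists()]


if __name__ == "__main__":
    for path in missing_inputs(".", "my_sample"):  # "my_sample" is a placeholder
        print(f"missing: {path}")
```

An empty list means every path checked here exists; it does not verify file contents or filename patterns.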

Owner

William Rowell
