Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python

Overview

Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python

Cashier is a tool developed by Russell Durrett for the analysis and extraction of expressed barcode tags.

This python implementation offers the same flexibility and simple command line operation.

Like it's predecessor it is a wrapper for the tools cutadapt, fastx-toolkit, and starcode.

Dependencies

  • cutadapt (sequence extraction)
  • starcode (sequence clustering)
  • fastx-toolkit (PHred score filtering)
  • pear (paired end read merging)
  • pysam (sam file convertion to fastq)

Recommended Installation Procedure

It's recommended to use conda to install and manage the dependencies for this package

conda env create -f https://raw.githubusercontent.com/brocklab/pycashier/main/environment.yml # or mamba env create -f ....
conda activate cashierenv
pycashier --help

Additionally you may install with pip. Though it will be up to you to ensure all the non-python dependencies are on the path and installed correctly.

pip install pycashier

Usage

Pycashier has one required argument which is the directory containing the fastq or sam files you wish to process.

conda activate cashierenv
pycashier ./fastqs

For additional parameters see pycashier -h.

As the files are processed two additional directories will be created pipeline and outs.

Currently all intermediary files generated as a result of the program will be found in pipeline.

While the final processed files will be found within the outs directory.

Merging Files

Pycashier can now take paired end reads and perform a merging of the reads to produce a fastq which can then be used with cashier's default feature.

pycashier ./fastqs -m

Processing Barcodes from 10X bam files

Pycashier can also extract gRNA barcodes along with 10X cell and umi barcodes.

Firstly we are only interested in the unmapped reads. From the cellranger bam output you would obtain these reads using samtools.

samtools view -f 4 possorted_genome_bam.bam > unmapped.sam

Then similar to normal barcode extraction you can pass a directory of these unmapped sam files to pycashier and extract barcodes. You can also still specify extraction parameters that will be passed to cutadapt as usual.

Note: The default parameters passed to cutadapt are unlinked adapters and minimum barcode length of 10 bp.

pycashier ./unmapped_sams -sc

When finished the outs directory will have a .tsv containing the following columns: Illumina Read Info, UMI Barcode, Cell Barcode, gRNA Barcode

Usage notes

Pycashier will NOT overwrite intermediary files. If there is an issue in the process, please delete either the pipeline directory or the requisite intermediary files for the sample you wish to reprocess. This will allow the user to place new fastqs within the source directory or a project folder without reprocessing all samples each time.

  • Currently, pycashier expects to find .fastq.gz files when merging and .fastq files when extracting barcodes. This behavior may change in the future.
  • If there are reads from multiple lanes they should first be concatenated with cat sample*R1*.fastq.gz > sample.R1.fastq.gz
  • Naming conventions:
    • Sample names are extracted from files using the first string delimited with a period. Please take this into account when naming sam or fastq files.
    • Each processing step will append information to the input file name to indicate changes, again delimited with periods.
PyWorkflow(PyWF) - A Python Binding of C++ Workflow

PyWorkflow(PyWF) - A Python Binding of C++ Workflow 概览 C++ Workflow是一个高性能的异步引擎,本项目着力于实现一个Python版的Workflow,让Python用户也能享受Workflow带来的绝佳体验。

Sogou-inc 108 Dec 01, 2022
A domonic-like wrapper around selectolax

A domonic-like wrapper around selectolax

byteface 3 Jun 23, 2022
Web UI for your scripts with execution management

Script-server is a Web UI for scripts. As an administrator, you add your existing scripts into Script server and other users would be ab

Iaroslav Shepilov 1.1k Jan 09, 2023
Script de monitoramento de telemetria para missões espaciais, cansat e foguetemodelismo.

Aeroespace_GroundStation Script de monitoramento de telemetria para missões espaciais, cansat e foguetemodelismo. Imagem 1 - Dashboard realizando moni

Vinícius Azevedo 5 Nov 27, 2022
This is the core of the program which takes 5k SYMBOLS and looks back N years to pull in the daily OHLC data of those symbols and saves them to disc.

This is the core of the program which takes 5k SYMBOLS and looks back N years to pull in the daily OHLC data of those symbols and saves them to disc.

Daniel Caine 1 Jan 31, 2022
Fofa asset consolidation script

资产收集+C段整理二合一 基于fofa资产搜索引擎进行资产收集,快速检索目标条件下的IP,URL以及标题,适用于资产较多时对模糊资产的快速检索,新增C段整理功能,整理出

白泽Sec安全实验室 36 Dec 01, 2022
Small projects for python beginners.

Python Mini Projects For Beginners I recently started doing the #100DaysOfCode Challenge in Python. I've used Python before, but I had switched to JS

Sreekesh Iyer 10 Dec 12, 2022
Cylc: a workflow engine for cycling systems

Cylc: a workflow engine for cycling systems. Repository master branch: core meta-scheduler component of cylc-8 (in development); Repository 7.8.x branch: full cylc-7 system.

The Cylc Workflow Engine 205 Dec 20, 2022
An app that mirrors your phone to your compute and maps controller input to the screen

What is 'Dragalia Control'? An app that mirrors your phone to your compute and maps controller input to the screen. Inputs are mapped specifically for

1 May 03, 2022
Turn crypto miner on/off depending on powerwall charge level

Mining Crypto with Tesla Solar and Powerwalls This script turns a crypto miner on and off when the Tesla Powerwall level drops/rises above a certain t

Matt 1 Nov 09, 2021
Import Apex legends mprt files exported from Legion

Apex-mprt-importer-for-Blender Import Apex legends mprt files exported from Legion. REQUIRES CAST IMPORTER Usage: Use a VPK extracter to extract the m

15 Dec 18, 2022
emoji-math computes the given python expression and returns either the value or the nearest 5 emojis as measured by cosine similarity.

emoji-math computes the given python expression and returns either the value or the nearest 5 emojis as measured by cosine similarity.

Andrew White 13 Dec 11, 2022
Generate Azure Blob Storage account authentication headers for Munki

Azure Blob Storage Authentication for Munki The Azure Blob Storage Middleware allows munki clients to connect securely, and directly to a munki repo h

Oliver Kieselbach 10 Apr 12, 2022
DRF magic links

drf-magic-links Installation pip install drf-magic-links Add URL patterns # urls.py

Dmitry Kalinin 1 Nov 07, 2021
Emulate and Dissect MSF and *other* attacks

Need help in analyzing Windows shellcode or attack coming from Metasploit Framework or Cobalt Strike (or may be also other malicious or obfuscated code)? Do you need to automate tasks with simple scr

123 Dec 16, 2022
A project to empower needy-students.

Happy Project 😊 A project to empower needy-students. Happy Project is a non-profit initiation founded by IT people from Jaffna, Sri Lanka. This is to

1 Mar 14, 2022
An alternative app for core Armoury Crate functions.

NoROG DISCLAIMER: Use at your own risk. This is alpha-quality software. It has not been extensively tested, though I personally run it daily on my lap

12 Nov 29, 2022
Frappe app for authentication, can be used with FrappeVue-AdminLTE

Frappeauth App Frappe app for authentication, can be used with FrappeVue-AdminLTE

Anthony C. Emmanuel 9 Apr 13, 2022
This is an implementation of NeuronJ work with python.

NeuronJ This is an implementation of NeuronJ work with python. NeuronJ is a plug-in for ImageJ that allows you to create and edit neurons masks. Image

Mohammad Mahdi Samei 3 Aug 28, 2022
A minimal configuration for a dockerized kafka project.

Docker Kafka Quickstart A minimal configuration for a dockerized kafka project. Usage: Run this command to build kafka and zookeeper containers, and c

Nouamane Tazi 5 Jan 12, 2022