Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python

Overview

Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python

Cashier is a tool developed by Russell Durrett for the analysis and extraction of expressed barcode tags.

This python implementation offers the same flexibility and simple command line operation.

Like it's predecessor it is a wrapper for the tools cutadapt, fastx-toolkit, and starcode.

Dependencies

  • cutadapt (sequence extraction)
  • starcode (sequence clustering)
  • fastx-toolkit (PHred score filtering)
  • pear (paired end read merging)
  • pysam (sam file convertion to fastq)

Recommended Installation Procedure

It's recommended to use conda to install and manage the dependencies for this package

conda env create -f https://raw.githubusercontent.com/brocklab/pycashier/main/environment.yml # or mamba env create -f ....
conda activate cashierenv
pycashier --help

Additionally you may install with pip. Though it will be up to you to ensure all the non-python dependencies are on the path and installed correctly.

pip install pycashier

Usage

Pycashier has one required argument which is the directory containing the fastq or sam files you wish to process.

conda activate cashierenv
pycashier ./fastqs

For additional parameters see pycashier -h.

As the files are processed two additional directories will be created pipeline and outs.

Currently all intermediary files generated as a result of the program will be found in pipeline.

While the final processed files will be found within the outs directory.

Merging Files

Pycashier can now take paired end reads and perform a merging of the reads to produce a fastq which can then be used with cashier's default feature.

pycashier ./fastqs -m

Processing Barcodes from 10X bam files

Pycashier can also extract gRNA barcodes along with 10X cell and umi barcodes.

Firstly we are only interested in the unmapped reads. From the cellranger bam output you would obtain these reads using samtools.

samtools view -f 4 possorted_genome_bam.bam > unmapped.sam

Then similar to normal barcode extraction you can pass a directory of these unmapped sam files to pycashier and extract barcodes. You can also still specify extraction parameters that will be passed to cutadapt as usual.

Note: The default parameters passed to cutadapt are unlinked adapters and minimum barcode length of 10 bp.

pycashier ./unmapped_sams -sc

When finished the outs directory will have a .tsv containing the following columns: Illumina Read Info, UMI Barcode, Cell Barcode, gRNA Barcode

Usage notes

Pycashier will NOT overwrite intermediary files. If there is an issue in the process, please delete either the pipeline directory or the requisite intermediary files for the sample you wish to reprocess. This will allow the user to place new fastqs within the source directory or a project folder without reprocessing all samples each time.

  • Currently, pycashier expects to find .fastq.gz files when merging and .fastq files when extracting barcodes. This behavior may change in the future.
  • If there are reads from multiple lanes they should first be concatenated with cat sample*R1*.fastq.gz > sample.R1.fastq.gz
  • Naming conventions:
    • Sample names are extracted from files using the first string delimited with a period. Please take this into account when naming sam or fastq files.
    • Each processing step will append information to the input file name to indicate changes, again delimited with periods.
Given an array of integers, calculate the ratios of its elements that are positive, negative, and zero.

Given an array of integers, calculate the ratios of its elements that are positive, negative, and zero. Print the decimal value of each fraction on a new line with places after the decimal.

Shruti Dhave 2 Nov 29, 2021
CEI Natural Disaster Tracking Portal

CEI Natural Disaster Tracking Portal (cc) Climatic Eye of ISCI We are an initiative that conducts studies in the field of Space Science, publishes pro

Baris Dincer 7 Dec 24, 2022
This is a pretty basic but relatively nice looking Python Pomodoro Timer.

Python Pomodoro-Timer This is a pretty basic but relatively nice looking Pomodoro Timer. Currently its set to a very basic mode, but the funcationalit

EmmHarris 2 Oct 18, 2021
Python decorator for `TODO`s

Python decorator for `TODO`s. Don't let your TODOs rot in your python projects anymore !

Klemen Sever 74 Sep 13, 2022
Encode stuff with ducks!

Duckify Encoder Usage Download main.py and run it. main.py has an encoded version in encoded_main.py.txt. As A Module Download the duckify folder (or

Jeremiah 2 Nov 15, 2021
Create rangebased on lists or values of the range itself. Range any type. Can you imagine?

funcao-allrange-for-python3 Create rangebased on lists or values of the range itself. Range any type. Can you imagine? WARNING!!! THIS MODULE DID NOT

farioso-fernando 1 Feb 09, 2022
A Blender addon to align the origin to the top, center or bottom of a mesh object

Align Origin Blender Addon. Align Origin Blender Addon. What? This simple addon lets you align the origin to the top, center or bottom of a mesh objec

VA79 7 Nov 30, 2022
A small C compiler written in Python for learning purposes

A small C compiler written in Python. Generates x64 Intel-format assembly, which is then assembled and linked by nasm and ld.

Scattered Thoughts 3 Oct 22, 2021
ChieriBot,词云API版,用于统计群友说过的怪话

wordCloud_API 词云API版,用于统计群友说过的怪话,基于wordCloud 消息储存在mysql数据库中.数据表结构见table.sql 为啥要做成API:这玩意太吃性能了,如果和Bot放在同一个服务器,可能会影响到bot的正常运行 你服务器性能够用的话就当我在放屁 依赖包 pip i

chinosk 7 Mar 20, 2022
The newest contender in Server Gateway Interface.

nsgi The newest contender in Server Gateway Interface. Why use this webserver? This webserver is made with the newest version of asyncio, and sockets,

OpenRobot 1 Feb 12, 2022
Hack CMU Go Local Project

GoLocal A submission for the annual HackCMU Hackathon. We built a website which connects shopper with local businesses. The goal is to drive consumers

2 Oct 02, 2021
Personal Finance Forecaster - An AI tool for forecasting personal expenses

Personal Finance Forecaster - An AI tool for forecasting personal expenses

2 Mar 09, 2022
Sodium is a general purpose programming language which is instruction-oriented

Sodium is a general purpose programming language which is instruction-oriented (a new programming concept that we are developing and devising)

Satin Wuker 22 Jan 11, 2022
Check broken access control exists in the Java web application

javaEeAccessControlCheck Check broken access control exists in the Java web application. 检查 Java Web 应用程序中是否存在访问控制绕过问题。 使用 python3 javaEeAccessControl

kw0ng 3 May 04, 2022
Write a program that works out whether if a given year is a leap year

Leap Year 💪 This is a Difficult Challenge 💪 Instructions Write a program that works out whether if a given year is a leap year. A normal year has 36

Rodrigo Santos 0 Jun 22, 2022
Implent of Oracle Base line and Lea-3 Baseline

Oracle-Baseline Implent of Oracle Base line and Lea-3 Baseline Oracle Oracle : This model is used to obtain an oracle with a greedy algorithm similar

Andrew Zeng 2 Nov 12, 2021
Rufus port to linux, writed on Python3

Rufus-for-Linux Rufus port to linux, writed on Python3 Программа будет иметь тот же интерфейс что и оригинал, и тот же функционал. Программа создается

6 Jan 07, 2022
This is a simple leaderboard for 30 days of Google Cloud program for students of ASIET

30daysleaderboard #Hacktoberfest - Please don't make changes in readme file. Only improvement in the project will be accepted. Update - Now if you run

5 Oct 29, 2021
Organize seu linux - organize your linux

OrganizeLinux Organize seu linux - organize your linux Organize seu linux Uma forma rápida de separar arquivos dispersos em pastas. formatos a serem c

Marcus Vinícius Ribeiro Andrade 1 Nov 30, 2021
Developed a website to analyze and generate report of students based on the curriculum that represents student’s academic performance.

Developed a website to analyze and generate report of students based on the curriculum that represents student’s academic performance. We have developed the system such that, it will automatically pa

VIJETA CHAVHAN 3 Nov 08, 2022