A set of tools to analyse the output from TraDIS analyses

Last update: Feb 16, 2022


QuaTradis (Quadram TraDis)


Introduction
Installation
Usage
License
Feedback/Issues
Citation

Introduction

The QuaTradis pipeline provides software utilities for the processing, mapping, and analysis of transposon insertion sequencing data. The pipeline was designed with the data from the TraDIS sequencing protocol in mind, but should work with a variety of transposon insertion sequencing protocols as long as they produce data in the expected format.

For more information on the TraDIS method, see http://bioinformatics.oxfordjournals.org/content/32/7/1109 and http://genome.cshlp.org/content/19/12/2308.

Installation

QuaTradis has the following dependencies:

Required dependencies

bwa
smalt
samtools
tabix
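Before installing, it can help to confirm that these external tools are reachable from your shell. The following is a minimal sketch, not part of QuaTradis: the function name find_missing_tools is hypothetical, and the tool names are simply the ones listed above.

```python
import shutil

# Tool names taken from the "Required dependencies" list above.
REQUIRED_TOOLS = ["bwa", "smalt", "samtools", "tabix"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the current PATH.

    The lookup function is a parameter so it can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result only shows that the executables exist on PATH; it does not check their versions.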

There are a number of ways to install QuaTradis and details are provided below. If you encounter an issue when installing QuaTradis please contact your local system administrator.

Bioconda

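Install conda and enable the bioconda channel, then install QuaTradis, replacing xxx with the version you want. Bioconda expects the conda-forge channel to take priority over bioconda, so it is listed first:

conda install -c conda-forge -c bioconda quatradis=xxx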

Docker

QuaTradis can be run in a Docker container. First install Docker, then pull the QuaTradis image from Docker Hub:

docker pull quadraminstitute/quatradis

To run QuaTradis, use a command like the following, substituting your own directory for /home/ubuntu/data. That directory is mounted at /data inside the container:

docker run --rm -it -v /home/ubuntu/data:/data quadraminstitute/quatradis bacteria_tradis -h
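If you launch the container from a script, the same command can be assembled programmatically. This is a small sketch that uses only the flags shown above; the helper name docker_command is hypothetical and not part of QuaTradis.

```python
import shlex


def docker_command(host_data_dir, tool_args, image="quadraminstitute/quatradis"):
    """Build the argument list for running a QuaTradis tool in Docker.

    host_data_dir is mounted at /data inside the container, as in the
    command above.
    """
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{host_data_dir}:/data",
        image,
        *tool_args,
    ]


if __name__ == "__main__":
    cmd = docker_command("/home/ubuntu/data", ["bacteria_tradis", "-h"])
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run, rather than a single string, avoids shell quoting problems when a directory name contains spaces.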

Running the tests

The tests can be run with pytest from the tests directory. Alternatively, you can use the make target from the top-level directory:

make test

Usage

QuaTradis provides functionality to:

detect TraDIS tags in a BAM file
add the tags to the reads
filter reads in a FastQ file that contain a user-defined tag
remove tags
map to a reference genome
create an insertion site plot file

The functions are available as standalone scripts or as Python modules.
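To make the tag filtering and tag removal steps concrete, here is a minimal sketch of the idea. It is not the QuaTradis implementation: the function names are hypothetical, the tag is assumed to sit at the start of the read, and matching is exact, with no allowance for mismatches.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes the simple four-lines-per-record layout.
    """
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header, seq, qual


def filter_by_tag(records, tag):
    """Keep only reads whose sequence starts with the tag."""
    return [rec for rec in records if rec[1].startswith(tag)]


def remove_tag(records, tag):
    """Trim the tag, and the matching quality values, from tagged reads."""
    trimmed = []
    for header, seq, qual in records:
        if seq.startswith(tag):
            seq, qual = seq[len(tag):], qual[len(tag):]
        trimmed.append((header, seq, qual))
    return trimmed


if __name__ == "__main__":
    tag = "TAAGAGACAG"  # example tag, chosen for illustration only
    fastq = [
        "@read1", tag + "ACGTACGT", "+", "I" * 18,
        "@read2", "GGGGACGTACGTACGTAC", "+", "I" * 18,
    ]
    tagged = filter_by_tag(parse_fastq(fastq), tag)
    for header, seq, qual in remove_tag(tagged, tag):
        print(header, seq, qual)
```

Trimming the quality string together with the sequence keeps the two the same length, which downstream mappers require.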

Scripts

Executable scripts to carry out most of the listed functions are available in the bin directory:

check_tradis_tags - Prints 1 if tags are present in the alignment file, and 0 if not.
add_tradis_tags - Generates a BAM file with tags added to read strings.
filter_tradis_tags - Creates a fastq file containing reads that match the supplied tag.
remove_tradis_tags - Creates a fastq file containing reads with the supplied tag removed from the sequences.
tradis_plot - Creates a gzipped insertion site plot.
bacteria_tradis - Runs the complete analysis, starting from fastq files, and produces mapped BAM files and plot files for each file in the given file list, plus a statistical summary of all files. Note that the -f option expects a text file containing a list of fastq files, one per line. This script can be run with or without supplying tags.

Note that default parameters are for comparative experiments, and will need to be modified for gene essentiality studies.
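The file list expected by the -f option of bacteria_tradis can be written by hand or generated. A minimal sketch, assuming your fastq files sit in a single directory; the function name write_fastq_list and the default glob pattern are illustrative choices, not part of QuaTradis:

```python
from pathlib import Path


def write_fastq_list(fastq_dir, list_path, pattern="*.fastq*"):
    """Write the paths of matching fastq files to list_path, one per line."""
    fastqs = sorted(str(p) for p in Path(fastq_dir).glob(pattern))
    Path(list_path).write_text("".join(f + "\n" for f in fastqs))
    return fastqs


if __name__ == "__main__":
    # Both paths are placeholders; substitute your own.
    for path in write_fastq_list("data", "fastqs.txt"):
        print(path)
```

The default pattern matches both plain and compressed files, for example sample.fastq and sample.fastq.gz.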

A help menu for each script can be accessed by running the script with "--help".

Analysis Scripts

Three scripts are provided in the bin directory to perform basic analysis of TraDIS results:

tradis_gene_insert_sites - Takes a genome annotation in EMBL format along with plot files produced by bacteria_tradis and generates tab-delimited files containing gene-wise annotations of insert sites and read counts.
tradis_essentiality.R - Takes a single tab-delimited file from tradis_gene_insert_sites to produce calls of gene essentiality. Also produces a number of diagnostic plots.
tradis_comparison.R - Takes tab files to compare two growth conditions using edgeR. This analysis requires experimental replicates.
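The gene-wise summary produced by tradis_gene_insert_sites can be pictured with a short sketch. This is not the script's actual code and rests on two assumptions: the plot file has one line per reference position holding forward and reverse strand read counts, and the insertion index is the number of distinct insertion sites divided by gene length. The function names are hypothetical.

```python
import gzip


def read_plot(path):
    """Read a gzipped plot file into a list of per-position read counts.

    Assumes one line per reference position with two whitespace-separated
    integers (forward and reverse strand counts), which are summed here.
    """
    with gzip.open(path, "rt") as handle:
        return [sum(int(x) for x in line.split()) for line in handle if line.strip()]


def summarise_gene(counts, start, end):
    """Summarise insertions in a gene spanning 1-based inclusive start..end."""
    window = counts[start - 1:end]
    gene_length = end - start + 1
    # A position counts as an insertion site if any read maps there.
    ins_count = sum(1 for c in window if c > 0)
    return {
        "read_count": sum(window),
        "ins_count": ins_count,
        "gene_length": gene_length,
        "ins_index": ins_count / gene_length,
    }


if __name__ == "__main__":
    counts = [0, 3, 0, 0, 5, 1, 0, 0, 0, 2]
    print(summarise_gene(counts, 2, 6))
```

Normalising by gene length makes genes of different sizes comparable, which is what an essentiality call needs.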

License

QuaTradis is free software, licensed under GPLv3.

Feedback/Issues

Please report any issues to the issues page or email [email protected]

Citation

If you use this software please cite:

"The TraDIS toolkit: sequencing and analysis for dense transposon mutant libraries", Barquist L, Mayho M, Cummins C, Cain AK, Boinett CJ, Page AJ, Langridge G, Quail MA, Keane JA, Parkhill J. Bioinformatics. 2016 Apr 1;32(7):1109-11. doi: 10.1093/bioinformatics/btw022. Epub 2016 Jan 21.

Comments

fix channel order in readme

Channel order is important for bioconda to work correctly -- the conda-forge has to come first (which means higher priority when specified on the command line with -c). That might be why some users are getting pysam issues requiring a workaround.

FYI might also want to consider suggesting --strict-channel-priority, see the new bioconda docs.

opened by daler 1
Fixes for albatradis compatibility

Fixing name of analysis output files for consumption by albatradis.

Fixing a mistake when creating gene names during insertion site analysis. Underscores in the name should not have been ignored.

opened by maplesond 0
requirements.txt should not list bgzip

A followup to the discussion on the Bioconda PR: The requirements.txt file that you are using should not list bgzip. Names in requirements.txt refer to packages on PyPI, so if you list bgzip, you actually pull in a Python package named bgzip (that is meant to be used via import bgzip from within Python). It will not give you the bgzip binary that your project actually seems to want.

You cannot list non-Python dependencies in requirements.txt so you can only list that dependency in the Conda recipe.

opened by marcelm 0
Fixing problems running the job in docker.

The issue was that the mapping stage outputs files to the current working directory which may not have user permissions. The fix is to make sure mapping logs are output to the same place as all other output files.

opened by maplesond 0
Nextflow pipeline to replace bacteria_tradis, and implementation of tradis_gene_insert_sites

Adding nextflow to handle processing of multiple fastq files (similar to bacteria_tradis).

Add the tradis_gene_insert_sites script, and associated functions under isp_analyse. There are still some very small differences between this and the old BioTradis script in ins_index and ins_count, which I still need to investigate.

Renamed and refactored a few things.

Added a few scripts to get closer to feature parity with old BioTradis.

Tidied up README.

opened by maplesond 0
problem with running tradis pipeline multiple

Hello,

When I try to run following command using quatradis:

tradis pipeline multiple -v -n 12 -o quatradis_out fastqs_filtered_sizecut_all.txt genome.fa

this error appears:

Traceback (most recent call last):
  File "/home/jang/anaconda3/envs/mamba/envs/albatradis/bin/tradis", line 293, in <module>
    main()
  File "/home/jang/anaconda3/envs/mamba/envs/albatradis/bin/tradis", line 285, in main
    args.func(args)
  File "/home/jang/anaconda3/envs/mamba/envs/albatradis/bin/tradis", line 202, in run_multiple_pipeline
    tradis.run_multi_tradis(args.fastqs, args.reference,
  File "/home/jang/anaconda3/envs/mamba/envs/albatradis/lib/python3.9/site-packages/quatradis/tradis.py", line 142, in run_multi_tradis
    pipeline = find_pipeline_file()
  File "/home/jang/anaconda3/envs/mamba/envs/albatradis/lib/python3.9/site-packages/quatradis/tradis.py", line 101, in find_pipeline_file
    if os.path.exists(exe_path):
  File "/home/jang/anaconda3/envs/mamba/envs/albatradis/lib/python3.9/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

What am I doing wrong?

The same input files work smoothly in bacteria_tradis.

Best, Jan

opened by gaworj 1
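The traceback in this report ends with os.path.exists receiving None, which suggests that a path lookup came back empty before the existence check. How exe_path is computed is not shown, so the sketch below is only an illustration of a defensive lookup: the use of shutil.which and the function name find_on_path are assumptions, not the QuaTradis code.

```python
import shutil


def find_on_path(executable):
    """Return the full path of an executable, or raise a clear error.

    shutil.which returns None when nothing is found; checking for that
    here avoids passing None on to functions that expect a path.
    """
    exe_path = shutil.which(executable)
    if exe_path is None:
        raise FileNotFoundError(f"'{executable}' was not found on PATH")
    return exe_path


if __name__ == "__main__":
    try:
        print(find_on_path("tradis"))
    except FileNotFoundError as err:
        print(err)
```

A guard like this turns an opaque TypeError into a message that names the missing executable.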

Releases (0.8.3)

0.8.3 (Jun 21, 2022)
0.8.2 (May 10, 2022)
0.8.1 (May 10, 2022)
0.8.0 (May 10, 2022)
0.7.0 (Apr 2, 2022)
0.6.2 (Mar 6, 2022)
0.6.1 (Mar 5, 2022)
0.6.0 (Mar 5, 2022)
0.5.4 (Mar 4, 2022)
0.5.3 (Mar 4, 2022)
0.5.2 (Mar 4, 2022)
0.5.1 (Mar 4, 2022)
0.5.0 (Mar 4, 2022)
0.4.10 (Mar 4, 2022)
0.4.9 (Mar 2, 2022)
0.4.8 (Mar 2, 2022)
0.4.7 (Mar 2, 2022)
0.4.6 (Mar 2, 2022)
0.4.5 (Feb 16, 2022)
0.4.4 (Feb 16, 2022)
0.4.3 (Feb 16, 2022)
0.4.2 (Feb 14, 2022)
0.4.1 (Feb 14, 2022)
0.4.0 (Feb 14, 2022)
0.3.4 (Feb 10, 2022)
0.3.3 (Feb 10, 2022)

Owner

Quadram Institute Bioscience

