Toolchest provides APIs for scientific and bioinformatic data analysis.

Overview

Toolchest Python Client

Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of running tools on your own resources by running the same jobs on secure, powerful remote servers.

This package contains the Python client for using Toolchest. For the R client, see here.

Installation

The Toolchest client is available on PyPI:

pip install toolchest-client

Usage

Using a tool in Toolchest is as simple as:

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY")
toolchest.kraken2(
  tool_args="",
  inputs="path/to/input.fastq",
  output_path="path/to/output.fastq",
)

For a list of available tools, see the documentation.

Configuration

To use Toolchest, you must have an authentication key stored in the TOOLCHEST_KEY environment variable.

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY") # or a file path containing the key

Contact Toolchest if:

  • you need a key
  • you’ve forgotten your key
  • the key is producing authentication errors.

Documentation & User Guide available at Read the Docs

Comments
  • Enable paired reads for `kraken2`

    Enable paired reads for `kraken2`

    Adds the option to use paired-read inputs for kraken2, via the read_one and read_two arguments (or a list of two paths via inputs).

    Adds/removes --paired to tool_args as necessary.

    opened by bcai2 3
  • v0.4.0

    v0.4.0

    • Add Poetry, remove Twine

    • Add CircleCI automatic deploy to PyPI (untested for prod PyPI)

    Note: CircleCI will be failing because v0.4.0 already exists on test PyPI. That is to be expected, because I already bumped it to v0.4.0 when testing.

    opened by lebovic 3
  • S3 chaining

    S3 chaining

    Adds:

    • Output class returned by all toolchest.tool() calls, which contains s3_uri, presigned_s3_url, and (local) output_path variables
    • S3 chaining, via supplying output.s3_uri from a previous tool as the inputs parameter for a following tool
    • the ability to skip download of any tool's output, by setting output_path=None (set to None by default)
    opened by lebovic 2
  • Polish tool_arg handling, add more STAR args

    Polish tool_arg handling, add more STAR args

    Adds:

    • More STAR args
    • Add multiple levels of tool_arg handling (whitelist, dangerlist, blacklist)
    • Error on unknown or blacklisted args
    • Reduce complexity (validation and parallelization for now) if a dangerous argument is passed

    Requires:

    • https://github.com/trytoolchest/toolchest-worker-node/pull/24
    • https://github.com/trytoolchest/toolchest-api/pull/22

    This does not fix:

    • Bigger disk/memory/etc requirements for larger files where args trigger reduced complexity / no parallelization
    opened by lebovic 2
  • STAR whitelist options

    STAR whitelist options

    • Adds basic whitelist options for STAR.

    • Adds support for tags with variable amounts of arguments. Adds the --quantMode tag for STAR.

    (This should be merged in after the kraken2 paired read commit.)

    opened by bcai2 2
  • feat: centrifuge base

    feat: centrifuge base

    • Adds the centrifuge tool.
    • Adds docs.
    • Refactors how prefix_mapping is generated for megahit with a new module (input_util.py) and function (convert_input_params_to_prefix_mapping). Adds a unit test for the function.
    opened by bcai2 1
  • fix: upload/download tracker bugfixes

    fix: upload/download tracker bugfixes

    • Refactors the tracking printed statements into a pythonic print call with string formatting.
    • Fixes status update logic in uploading. (This was causing the terminal output to stall at the "uploading" stage.)
    • Adds integration test dirs to .gitignore.
    opened by bcai2 1
  • fix: remove pysam due to multiple issues

    fix: remove pysam due to multiple issues

    Pysam has caused multiple issues as a package and STAR parallelization is not currently used so this pr fully removes pysam as a dependency. Either a different library or custom sam file merging code is planned to be implemented later so parallelization framework is remaining in the code for now.

    opened by jherr-dev 1
  • feat: add preliminary alphafold support

    feat: add preliminary alphafold support

    Adds basic support for running AlphaFold via Toolchest. Code needs to be cleaned up and better documented. Currently limited to 1 input fasta.

    use_reduced_dbs and is_prokaryote_list are currently disabled until further implementation and testing is done. Integration will come with reduced dbs since full dbs take 45 minutes to an hour to run even on simple input.

    opened by jherr-dev 1
  • feat: support async execution

    feat: support async execution

    Adds:

    • Support for async execution

    See https://gist.github.com/lebovic/72fbb857119f1667c7959a4d7e28cd50 (or the integration test) for a hacky example on how to run Toolchest with async execution.

    opened by lebovic 1
  • fix: set default version number

    fix: set default version number

    Sets the version number to a default instead of erroring if the client is run from source (i.e., without the toolchest-client package being installed via pip).

    Open question: the version number defaults to 0.0.0, which can be confusing -- are there any other labels that might be better (e.g., dev or just the empty string)?

    opened by bcai2 1
Releases(v0.11.3)
Owner
Toolchest
Toolchest
📊 Python Flask game that consolidates data from Nasdaq, allowing the user to practice buying and selling stocks.

Web Trader Web Trader is a trading website that consolidates data from Nasdaq, allowing the user to search up the ticker symbol and price of any stock

Paulina Khew 21 Aug 30, 2022
Toolchest provides APIs for scientific and bioinformatic data analysis.

Toolchest Python Client Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of runni

Toolchest 11 Jun 30, 2022
Feature Detection Based Template Matching

Feature Detection Based Template Matching The classification of the photos was made using the OpenCv template Matching method. Installation Use the pa

Muhammet Erem 2 Nov 18, 2021
An extension to pandas dataframes describe function.

pandas_summary An extension to pandas dataframes describe function. The module contains DataFrameSummary object that extend describe() with: propertie

Mourad 450 Dec 30, 2022
Clean and reusable data-sciency notebooks.

KPACUBO KPACUBO is a set Jupyter notebooks focused on the best practices in both software development and data science, namely, code reuse, explicit d

Matvey Morozov 1 Jan 28, 2022
Python Implementation of Scalable In-Memory Updatable Bitmap Indexing

PyUpBit CS490 Large Scale Data Analytics — Implementation of Updatable Compressed Bitmap Indexing Paper Table of Contents About The Project Usage Cont

Hyeong Kyun (Daniel) Park 1 Jun 28, 2022
The repo for mlbtradetrees.com. Analyze any trade in baseball history!

The repo for mlbtradetrees.com. Analyze any trade in baseball history!

7 Nov 20, 2022
PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

Emmanuel Boateng Sifah 1 Jan 19, 2022
Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data.

PremiershipPlayerAnalysis Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data. No

5 Sep 06, 2021
Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences

Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences. Copula and functional Principle Component Analysis (fPCA) are st

32 Dec 20, 2022
DefAP is a program developed to facilitate the exploration of a material's defect chemistry

DefAP is a program developed to facilitate the exploration of a material's defect chemistry. A large number of features are provided and rapid exploration is supported through the use of autoplotting

6 Oct 25, 2022
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and lo

Coiled 102 Nov 10, 2022
Python package for analyzing behavioral data for Brain Observatory: Visual Behavior

Allen Institute Visual Behavior Analysis package This repository contains code for analyzing behavioral data from the Allen Brain Observatory: Visual

Allen Institute 16 Nov 04, 2022
pandas: powerful Python data analysis toolkit

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.

pandas 36.4k Jan 03, 2023
Using Data Science with Machine Learning techniques (ETL pipeline and ML pipeline) to classify received messages after disasters.

Using Data Science with Machine Learning techniques (ETL pipeline and ML pipeline) to classify received messages after disasters.

1 Feb 11, 2022
A meta plugin for processing timelapse data timepoint by timepoint in napari

napari-time-slicer A meta plugin for processing timelapse data timepoint by timepoint. It enables a list of napari plugins to process 2D+t or 3D+t dat

Robert Haase 2 Oct 13, 2022
BAyesian Model-Building Interface (Bambi) in Python.

Bambi BAyesian Model-Building Interface in Python Overview Bambi is a high-level Bayesian model-building interface written in Python. It's built on to

861 Dec 29, 2022
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather

Tuplex 791 Jan 04, 2023
Udacity-api-reporting-pipeline - Udacity api reporting pipeline

udacity-api-reporting-pipeline In this exercise, you'll use portions of each of

Fabio Barbazza 1 Feb 15, 2022
Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

Damien Farrell 81 Dec 26, 2022