peptides.py is a pure-Python package to compute common descriptors for protein sequences

Overview

peptides.py Stars

Physicochemical properties and indices for amino-acid sequences.

Actions Coverage PyPI Wheel Python Versions Python Implementations License Source Mirror GitHub issues Changelog Downloads

🗺️ Overview

peptides.py is a pure-Python package to compute common descriptors for protein sequences. It is a port of Peptides, the R package written by Daniel Osorio for the same purpose. This library has no external dependency and is available for all modern Python versions (3.6+).

🔧 Installing

Install the peptides package directly from PyPi which hosts universal wheels that can be installed with pip:

$ pip install peptides

💡 Example

Start by creating a Peptide object from a protein sequence:

>>> import peptides
>>> peptide = peptides.Peptide("MLKKRFLGALAVATLLTLSFGTPVMAQSGSAVFTNEGVTPFAISYPGGGT")

Then use the appropriate methods to compute the descriptors you want:

>>> peptide.aliphatic_index()
89.8...
>>> peptide.boman()
-0.2097...
>>> peptide.charge(pH=7.4)
1.99199...
>>> peptide.isoelectric_point()
10.2436...

Methods that return more than one scalar value (for instance, Peptide.blosum_indices) will return a dedicated named tuple:

>>> peptide.ms_whim_scores()
MSWHIMScores(mswhim1=-0.436399..., mswhim2=0.4916..., mswhim3=-0.49200...)

Use the Peptide.descriptors method to get a dictionary with every available descriptor. This makes it very easy to create a pandas.DataFrame with descriptors for several protein sequences:

>> df = pandas.DataFrame([ peptides.Peptide(s).descriptors() for s in seqs ]) >>> df BLOSUM1 BLOSUM2 BLOSUM3 BLOSUM4 ... Z2 Z3 Z4 Z5 0 0.367000 -0.436000 -0.239 0.014500 ... -0.711000 -0.104500 -1.486500 0.429500 1 -0.697500 -0.372500 -0.493 0.157000 ... -0.307500 -0.627500 -0.450500 0.362000 2 0.479333 -0.001333 0.138 0.228667 ... -0.299333 0.465333 -0.976667 0.023333 [3 rows x 66 columns] ">
>>> seqs = ["SDKEVDEVDAALSDLEITLE", "ARQQNLFINFCLILIFLLLI", "EGVNDNECEGFFSAR"]
>>> df = pandas.DataFrame([ peptides.Peptide(s).descriptors() for s in seqs ])
>>> df
    BLOSUM1   BLOSUM2  BLOSUM3   BLOSUM4  ...        Z2        Z3        Z4        Z5
0  0.367000 -0.436000   -0.239  0.014500  ... -0.711000 -0.104500 -1.486500  0.429500
1 -0.697500 -0.372500   -0.493  0.157000  ... -0.307500 -0.627500 -0.450500  0.362000
2  0.479333 -0.001333    0.138  0.228667  ... -0.299333  0.465333 -0.976667  0.023333

[3 rows x 66 columns]

💭 Feedback

⚠️ Issue Tracker

Found a bug ? Have an enhancement request ? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing in on a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the GNU General Public License v3.0. The original R Peptides package was written by Daniel Osorio, Paola Rondón-Villarreal and Rodrigo Torres, and is licensed under the terms of the GPLv2.

This project is in no way not affiliated, sponsored, or otherwise endorsed by the original Peptides authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

You might also like...
Python Package for DataHerb: create, search, and load datasets.
Python Package for DataHerb: create, search, and load datasets.

The Python Package for DataHerb A DataHerb Core Service to Create and Load Datasets.

wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information
wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information

Python based Wikidata framework for easy dataframe extraction wikirepo is a Python package that provides a framework to easily source and leverage sta

Python package for processing UC module spectral data.

UC Module Python Package How To Install clone repo. cd UC-module pip install . How to Use uc.module.UC(measurment=str, dark=str, reference=str, heade

sportsdataverse python package
sportsdataverse python package

sportsdataverse-py See CHANGELOG.md for details. The goal of sportsdataverse-py is to provide the community with a python package for working with spo

PyEmits, a python package for easy manipulation in time-series data.
PyEmits, a python package for easy manipulation in time-series data.

PyEmits, a python package for easy manipulation in time-series data. Time-series data is very common in real life. Engineering FSI industry (Financial

Retail-Sim is python package to easily create synthetic dataset of retaile store.

Retailer's Sale Data Simulation Retail-Sim is python package to easily create synthetic dataset of retaile store. Simulation Model Simulator consists

A python package which can be pip installed to perform statistics and visualize binomial and gaussian distributions of the dataset

GBiStat package A python package to assist programmers with data analysis. This package could be used to plot : Binomial Distribution of the dataset p

VevestaX is an open source Python package for ML Engineers and Data Scientists.
VevestaX is an open source Python package for ML Engineers and Data Scientists.

VevestaX Track failed and successful experiments as well as features. VevestaX is an open source Python package for ML Engineers and Data Scientists.

nrgpy is the Python package for processing NRG Data Files

nrgpy nrgpy is the Python package for processing NRG Data Files Website and source: https://github.com/nrgpy/nrgpy Documentation: https://nrgpy.github

Comments
  • Per-residue data

    Per-residue data

    It seems that the API can only output single statistics for the entire peptide chain, but I'm interested in statistics for each residue individually. I'm wondering if it might be possible to output an array/list from some of these functions instead of always averaging them as is done now.

    enhancement 
    opened by multimeric 1
  • Hydrophobic moment is inconsistent with R version

    Hydrophobic moment is inconsistent with R version

    Computed hydrophobic moment is not the same as the one computed by R. More specifically, it seems that peptides.py always outputs 0 for the hydrophobic moment when peptide length is shorter than the set window. The returned value matches the value from R when peptide length is equal to or greater than the set window length.

    Example in python:

    >>> import peptides`
    >>> peptides.Peptide("MLK").hydrophobic_moment(window=5, angle=100)
    0.0
    >>> peptides.Peptide("AACQ").hydrophobic_moment(window=5, angle=100)
    0.0
    >>> peptides.Peptide("FGGIQ").hydrophobic_moment(window=5, angle=100)
    0.31847187610377536
    

    Example in R:

    > library(Peptides)
    > hmoment(seq="MLK", window=5, angle=100)
    [1] 0.8099386
    > hmoment(seq="AACQ", window=5, angle=100)
    [1] 0.3152961
    > hmoment(seq="FGGIQ", window=5, angle=100)
    [1] 0.3184719
    

    I think that it can be easily fixed by internally setting the window length to the length of the peptide if the latter is shorter. What I propose:

    --- a/peptides/__init__.py
    +++ b/peptides/__init__.py
    @@ -657,6 +657,7 @@ class Peptide(typing.Sequence[str]):
                   :doi:`10.1073/pnas.81.1.140`. :pmid:`6582470`.
    
             """
    +        window = min(window, len(self))
             scale = tables.HYDROPHOBICITY["Eisenberg"]
             lut = [scale.get(aa, 0.0) for aa in self._CODE1]
             angles = [(angle * i) % 360 for i in range(window)]
    
    bug 
    opened by eotovic 1
  • RuntimeWarning in auto_correlation function()

    RuntimeWarning in auto_correlation function()

    Hi, thank you for creating peptides.py.

    Some hydrophobicity tables together with certain proteins cause a runtime warning for in the function auto_correlation():

    import peptides
    
    for hydro in peptides.tables.HYDROPHOBICITY.keys():
        print(hydro)
        table = peptides.tables.HYDROPHOBICITY[hydro]
        peptides.Peptide('MANTQNISIWWWAR').auto_correlation(table)
    

    Warning (s2 == 0):

    RuntimeWarning: invalid value encountered in double_scalars
      return s1 / s2
    

    The tables concerned are: octanolScale_pH2, interfaceScale_pH2, oiScale_pH2 Some other proteins causing the same warning: ['MSYGGSCAGFGGGFALLIVLFILLIIIGCSCWGGGGYGY', 'MFILLIIIGASCFGGGGGCGYGGYGGYAGGYGGYCC', 'MSFGGSCAGFGGGFALLIVLFILLIIIGCSCWGGGGGF']

    opened by jhahnfeld 0
Releases(v0.3.1)
  • v0.3.1(Sep 1, 2022)

  • v0.3.0(Sep 1, 2022)

    Added

    • Peptide.linker_preference_profile to build a profile like used in the DomCut method from Suyama & Ohara (2002).
    • Peptide.profile to build a generic per-residue profile from a data table (#3).
    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Oct 25, 2021)

    Added

    • Peptide.counts method to get the number of occurences of each amino acid in the peptide.
    • Peptide.frequencies to get the frequencies of each amino acid in the peptide.
    • Peptide.pcp_descriptors to compute the PCP descriptors from Mathura & Braun (2001).
    • Peptide.sneath_vectors to compute the descriptors from Sneath (1966).
    • Hydrophilicity descriptors from Barley (2018).
    • Peptide.structural_class to predict the structural class of a protein using one of three reference datasets and one of four distance metrics.

    Changed

    • Peptide.aliphatic_index now supports unknown Leu/Ile residue (code J).
    • Swap order of Peptide.hydrophobic_moment arguments for consistency with profile methods.
    • Some Peptide functions now support vectorized code using numpy if available.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Oct 21, 2021)

Owner
Martin Larralde
PhD candidate in Bioinformatics, passionate about programming, Pythonista, Rustacean. I write poems, and sometimes they are executable.
Martin Larralde
Two phase pipeline + StreamlitTwo phase pipeline + Streamlit

Two phase pipeline + Streamlit This is an example project that demonstrates how to create a pipeline that consists of two phases of execution. In betw

Rick Lamers 1 Nov 17, 2021
Spectacular AI SDK fuses data from cameras and IMU sensors and outputs an accurate 6-degree-of-freedom pose of a device.

Spectacular AI SDK examples Spectacular AI SDK fuses data from cameras and IMU sensors (accelerometer and gyroscope) and outputs an accurate 6-degree-

Spectacular AI 94 Jan 04, 2023
Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences

Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences. Copula and functional Principle Component Analysis (fPCA) are st

32 Dec 20, 2022
AWS Glue ETL Code Samples

AWS Glue ETL Code Samples This repository has samples that demonstrate various aspects of the new AWS Glue service, as well as various AWS Glue utilit

AWS Samples 1.2k Jan 03, 2023
Accurately separate the TLD from the registered domain and subdomains of a URL, using the Public Suffix List.

tldextract Python Module tldextract accurately separates the gTLD or ccTLD (generic or country code top-level domain) from the registered domain and s

John Kurkowski 1.6k Jan 03, 2023
Uses MIT/MEDSL, New York Times, and US Census datasources to analyze per-county COVID-19 deaths.

Covid County Executive summary Setup Install miniconda, then in the command line, run conda create -n covid-county conda activate covid-county conda i

Ahmed Fasih 1 Dec 22, 2021
The micro-framework to create dataframes from functions.

The micro-framework to create dataframes from functions.

Stitch Fix Technology 762 Jan 07, 2023
Convert monolithic Jupyter notebooks into Ploomber pipelines.

Soorgeon Join our community | Newsletter | Contact us | Blog | Website | YouTube Convert monolithic Jupyter notebooks into Ploomber pipelines. soorgeo

Ploomber 65 Dec 16, 2022
Used for data processing in machine learning, and help us to construct ML model more easily from scratch

Used for data processing in machine learning, and help us to construct ML model more easily from scratch. Can be used in linear model, logistic regression model, and decision tree.

ShawnWang 0 Jul 05, 2022
This is a tool for speculation of ancestral allel, calculation of sfs and drawing its bar plot.

superSFS This is a tool for speculation of ancestral allel, calculation of sfs and drawing its bar plot. It is easy-to-use and runing fast. What you s

3 Dec 16, 2022
Single machine, multiple cards training; mix-precision training; DALI data loader.

Template Script Category Description Category script comparison script train.py, loader.py for single-machine-multiple-cards training train_DP.py, tra

2 Jun 27, 2022
Dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

dbt Labs 6.3k Jan 08, 2023
Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format.

Brady Law 2 Dec 01, 2021
SNV calling pipeline developed explicitly to process individual or trio vcf files obtained from Illumina based pipeline (grch37/grch38).

SNV Pipeline SNV calling pipeline developed explicitly to process individual or trio vcf files obtained from Illumina based pipeline (grch37/grch38).

East Genomics 1 Nov 02, 2021
Approximate Nearest Neighbor Search for Sparse Data in Python!

Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high dimensional spaces (like text documents).

Meta Research 906 Jan 01, 2023
Reading streams of Twitter data, save them to Kafka, then process with Kafka Stream API and Spark Streaming

Using Streaming Twitter Data with Kafka and Spark Reading streams of Twitter data, publishing them to Kafka topic, process message using Kafka Stream

Rustam Zokirov 1 Dec 06, 2021
An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks

qgrid Qgrid is a Jupyter notebook widget which uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your D

Quantopian, Inc. 2.9k Jan 08, 2023
Validation and inference over LinkML instance data using souffle

Translates LinkML schemas into Datalog programs and executes them using Souffle, enabling advanced validation and inference over instance data

Linked data Modeling Language 7 Aug 07, 2022
Python Practicum - prepare for your Data Science interview or get a refresher.

Python-Practicum Python Practicum - prepare for your Data Science interview or get a refresher. Data Data visualization using data on births from the

Jovan Trajceski 1 Jul 27, 2021
Statistical Rethinking course winter 2022

Statistical Rethinking (2022 Edition) Instructor: Richard McElreath Lectures: Uploaded Playlist and pre-recorded, two per week Discussion: Online, F

Richard McElreath 3.9k Dec 31, 2022