peptides.py is a pure-Python package to compute common descriptors for protein sequences

Last update: Dec 31, 2022

`peptides.py`

Physicochemical properties and indices for amino-acid sequences.

🗺️ Overview

peptides.py is a pure-Python package to compute common descriptors for protein sequences. It is a port of Peptides, the R package written by Daniel Osorio for the same purpose. This library has no external dependency and is available for all modern Python versions (3.6+).

🔧 Installing

Install the peptides package directly from PyPI, which hosts universal wheels that can be installed with pip:

$ pip install peptides

💡 Example

Start by creating a Peptide object from a protein sequence:

>>> import peptides
>>> peptide = peptides.Peptide("MLKKRFLGALAVATLLTLSFGTPVMAQSGSAVFTNEGVTPFAISYPGGGT")

Then use the appropriate methods to compute the descriptors you want:

>>> peptide.aliphatic_index()
89.8...
>>> peptide.boman()
-0.2097...
>>> peptide.charge(pH=7.4)
1.99199...
>>> peptide.isoelectric_point()
10.2436...
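
For intuition about what one of these descriptors measures, the aliphatic index can be sketched in a few lines of plain Python. The sketch assumes the classic Ikai (1980) definition, in which the mole fractions of Ala, Val, Ile and Leu are weighted by 1, 2.9, 3.9 and 3.9. It is not the library's implementation, and the function name `aliphatic_index_sketch` is hypothetical.

```python
def aliphatic_index_sketch(sequence):
    """Aliphatic index following the Ikai (1980) formula (illustrative only)."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    # Mole fractions of the aliphatic residues.
    ala = sequence.count("A") / n
    val = sequence.count("V") / n
    ile = sequence.count("I") / n
    leu = sequence.count("L") / n
    # Weighted sum, expressed as a percentage.
    return 100.0 * (ala + 2.9 * val + 3.9 * (ile + leu))


seq = "MLKKRFLGALAVATLLTLSFGTPVMAQSGSAVFTNEGVTPFAISYPGGGT"
print(round(aliphatic_index_sketch(seq), 1))
```

For the example sequence (6 Ala, 4 Val, 1 Ile and 6 Leu out of 50 residues) this gives about 89.8, in line with the value reported by `peptide.aliphatic_index()` above.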

Methods that return more than one scalar value (for instance, Peptide.blosum_indices) will return a dedicated named tuple:

>>> peptide.ms_whim_scores()
MSWHIMScores(mswhim1=-0.436399..., mswhim2=0.4916..., mswhim3=-0.49200...)
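
Named tuples can be read by field name, unpacked, or converted to a dictionary. The sketch below uses a stand-in type built with `collections.namedtuple` so that it runs without the library installed. The type name `ScoresSketch` is hypothetical and the values are rounded placeholders, not library output.

```python
from collections import namedtuple

# Hypothetical stand-in for a multi-value result type; values are placeholders.
ScoresSketch = namedtuple("ScoresSketch", ["mswhim1", "mswhim2", "mswhim3"])
scores = ScoresSketch(-0.436, 0.492, -0.492)

first = scores.mswhim1        # access by field name
a, b, c = scores              # positional unpacking
as_dict = scores._asdict()    # mapping of field name to value

print(first, (a, b, c), as_dict)
```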

Use the Peptide.descriptors method to get a dictionary with every available descriptor. This makes it very easy to create a pandas.DataFrame with descriptors for several protein sequences:

>>> import pandas
>>> seqs = ["SDKEVDEVDAALSDLEITLE", "ARQQNLFINFCLILIFLLLI", "EGVNDNECEGFFSAR"]
>>> df = pandas.DataFrame([ peptides.Peptide(s).descriptors() for s in seqs ])
>>> df
    BLOSUM1   BLOSUM2  BLOSUM3   BLOSUM4  ...        Z2        Z3        Z4        Z5
0  0.367000 -0.436000   -0.239  0.014500  ... -0.711000 -0.104500 -1.486500  0.429500
1 -0.697500 -0.372500   -0.493  0.157000  ... -0.307500 -0.627500 -0.450500  0.362000
2  0.479333 -0.001333    0.138  0.228667  ... -0.299333  0.465333 -0.976667  0.023333

[3 rows x 66 columns]

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the GNU General Public License v3.0. The original R Peptides package was written by Daniel Osorio, Paola Rondón-Villarreal and Rodrigo Torres, and is licensed under the terms of the GPLv2.

This project is in no way affiliated with, sponsored, or otherwise endorsed by the original Peptides authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

Comments

Per-residue data

It seems that the API can only output single statistics for the entire peptide chain, but I'm interested in statistics for each residue individually. I'm wondering if it might be possible to output an array/list from some of these functions instead of always averaging them as is done now.
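
As a rough illustration of the kind of per-residue output being requested, here is a minimal pure-Python sketch that maps each residue (or each sliding window) to a value from a lookup table. It is not the library's implementation: the function name `residue_profile` is hypothetical, and the Kyte-Doolittle hydropathy scale is used only as a convenient example table. The release notes further down mention that a `Peptide.profile` method was added in v0.3.0 for this purpose.

```python
# Kyte-Doolittle hydropathy values, used here only as an example table.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_profile(sequence, table, window=1):
    """Return the mean table value over each sliding window of the sequence."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Residues missing from the table contribute 0.0 (an assumption of this sketch).
    values = [table.get(aa, 0.0) for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


print(residue_profile("AIL", KYTE_DOOLITTLE))            # one value per residue
print(residue_profile("AIL", KYTE_DOOLITTLE, window=2))  # one value per window
```

With `window=1` the result has one entry per residue. Larger windows smooth the profile and shorten it by `window - 1` entries.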
enhancement

opened by multimeric 1

Hydrophobic moment is inconsistent with R version

Computed hydrophobic moment is not the same as the one computed by R. More specifically, it seems that peptides.py always outputs 0 for the hydrophobic moment when peptide length is shorter than the set window. The returned value matches the value from R when peptide length is equal to or greater than the set window length.

Example in python:

>>> import peptides
>>> peptides.Peptide("MLK").hydrophobic_moment(window=5, angle=100)
0.0
>>> peptides.Peptide("AACQ").hydrophobic_moment(window=5, angle=100)
0.0
>>> peptides.Peptide("FGGIQ").hydrophobic_moment(window=5, angle=100)
0.31847187610377536

Example in R:

> library(Peptides)
> hmoment(seq="MLK", window=5, angle=100)
[1] 0.8099386
> hmoment(seq="AACQ", window=5, angle=100)
[1] 0.3152961
> hmoment(seq="FGGIQ", window=5, angle=100)
[1] 0.3184719

I think that it can be easily fixed by internally setting the window length to the length of the peptide if the latter is shorter. What I propose:

--- a/peptides/__init__.py
+++ b/peptides/__init__.py
@@ -657,6 +657,7 @@ class Peptide(typing.Sequence[str]):
               :doi:`10.1073/pnas.81.1.140`. :pmid:`6582470`.

         """
+        window = min(window, len(self))
         scale = tables.HYDROPHOBICITY["Eisenberg"]
         lut = [scale.get(aa, 0.0) for aa in self._CODE1]
         angles = [(angle * i) % 360 for i in range(window)]
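
To make the effect of the proposed clamp concrete, here is a self-contained sketch of a windowed hydrophobic moment. It assumes the Eisenberg consensus hydrophobicity scale and takes the maximum moment over all windows. It is written independently of both the Python and R code bases, so the function name and structure are illustrative, not the library's.

```python
import math

# Eisenberg consensus hydrophobicity scale.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment_sketch(sequence, window, angle):
    """Maximum hydrophobic moment over all windows (illustrative only)."""
    # The clamp proposed above: never use a window longer than the peptide.
    window = min(window, len(sequence))
    if window == 0:
        return 0.0
    best = 0.0
    for start in range(len(sequence) - window + 1):
        sum_cos = 0.0
        sum_sin = 0.0
        for i, aa in enumerate(sequence[start:start + window]):
            h = EISENBERG.get(aa, 0.0)  # unknown residues count as 0.0
            theta = math.radians(angle * i)
            sum_cos += h * math.cos(theta)
            sum_sin += h * math.sin(theta)
        best = max(best, math.hypot(sum_cos, sum_sin) / window)
    return best


for seq in ("MLK", "AACQ", "FGGIQ"):
    print(seq, hydrophobic_moment_sketch(seq, window=5, angle=100))
```

With the clamp in place, the three peptides from this report come out at roughly 0.8099, 0.3153 and 0.3185, matching the R values quoted above. Without the clamp, the window loop has no iterations for the two shorter peptides, which would explain a result of 0.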

bug

opened by eotovic 1

RuntimeWarning in auto_correlation() function

Hi, thank you for creating peptides.py.

Some hydrophobicity tables, combined with certain proteins, cause a runtime warning in the function auto_correlation():

import peptides
for hydro in peptides.tables.HYDROPHOBICITY.keys():
    print(hydro)
    table = peptides.tables.HYDROPHOBICITY[hydro]
    peptides.Peptide('MANTQNISIWWWAR').auto_correlation(table)

Warning (s2 == 0):

RuntimeWarning: invalid value encountered in double_scalars
  return s1 / s2

The tables concerned are: octanolScale_pH2, interfaceScale_pH2, oiScale_pH2.

Some other proteins causing the same warning: ['MSYGGSCAGFGGGFALLIVLFILLIIIGCSCWGGGGYGY', 'MFILLIIIGASCFGGGGGCGYGGYGGYAGGYGGYCC', 'MSFGGSCAGFGGGFALLIVLFILLIIIGCSCWGGGGGF']
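
One possible way to guard against this is sketched below. It assumes that the statistic is a ratio of a lagged cross-product sum `s1` to a sum of squares `s2`, as the `return s1 / s2` line in the warning suggests. The function name and signature are hypothetical, it works on a plain list of per-residue values, and it is not the library's code.

```python
import math


def safe_auto_correlation(values, lag=1):
    """Lagged auto-correlation ratio that returns NaN instead of dividing by zero."""
    head = values[:-lag]   # values at positions i
    tail = values[lag:]    # values at positions i + lag
    s1 = sum(a * b for a, b in zip(head, tail))
    s2 = sum(a * a for a in head)
    if s2 == 0:
        # Every contributing value is zero (or the sequence is too short):
        # the ratio is undefined, so report NaN explicitly and without a warning.
        return math.nan
    return s1 / s2


print(safe_auto_correlation([1.0, 2.0, 3.0]))  # well-defined ratio
print(safe_auto_correlation([0.0, 0.0, 0.0]))  # undefined, reported as NaN
```

Whether NaN, 0.0 or an exception is the right result for the undefined case is a design decision for the maintainers. The sketch only shows where the check would go.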
opened by jhahnfeld 0

Releases(v0.3.1)

v0.3.1(Sep 1, 2022)
Fixed

peptides.datasets data files missing from the source distribution.

v0.3.0(Sep 1, 2022)
Added

Peptide.linker_preference_profile to build a profile like the one used in the DomCut method from Suyama & Ohara (2002).

Peptide.profile to build a generic per-residue profile from a data table (#3).

v0.2.0(Oct 25, 2021)
Added

Peptide.counts method to get the number of occurrences of each amino acid in the peptide.

Peptide.frequencies to get the frequencies of each amino acid in the peptide.

Peptide.pcp_descriptors to compute the PCP descriptors from Mathura & Braun (2001).

Peptide.sneath_vectors to compute the descriptors from Sneath (1966).

Hydrophilicity descriptors from Barley (2018).

Peptide.structural_class to predict the structural class of a protein using one of three reference datasets and one of four distance metrics.

Changed

Peptide.aliphatic_index now supports unknown Leu/Ile residue (code J).

Swap order of Peptide.hydrophobic_moment arguments for consistency with profile methods.

Some Peptide functions now support vectorized code using numpy if available.

v0.1.0(Oct 21, 2021)

Initial release.

Owner

Martin Larralde

PhD candidate in Bioinformatics, passionate about programming, Pythonista, Rustacean. I write poems, and sometimes they are executable.

