Benchmarking Pipeline for Prediction of Protein-Protein Interactions

Last update: Jun 27, 2022

B4PPI

Benchmarking Pipeline for the Prediction of Protein-Protein Interactions

The construction and use of this benchmarking pipeline are detailed in our preprint (reference under "How to cite this work"); please cite it if you find this work useful.

A minimal example and the list of requirements are available in the repository.

How to use the gold standard

All data files are in the data directory. Most are available both as csv (sep='|') and as pickled pandas DataFrames; a csv file may occasionally be missing because of file size constraints on GitHub.
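Because a csv may be absent for some tables, a small loader that falls back to the pickle can be convenient. The sketch below is only an illustration: `load_table` and its `stem` / `data_dir` parameters are hypothetical names, not part of the repository, and it assumes the csv and pkl versions share the same base filename.

```python
import os

import pandas as pd


def load_table(stem, data_dir="data"):
    """Load `<stem>.csv` (sep='|') if it exists, otherwise `<stem>.pkl`.

    Hypothetical helper for illustration; not part of the repository.
    """
    csv_path = os.path.join(data_dir, stem + ".csv")
    pkl_path = os.path.join(data_dir, stem + ".pkl")
    if os.path.exists(csv_path):
        # The csv files use '|' as the field separator.
        return pd.read_csv(csv_path, sep="|")
    if os.path.exists(pkl_path):
        # Only unpickle files that come from a source you trust.
        return pd.read_pickle(pkl_path)
    raise FileNotFoundError(
        f"No csv or pkl file found for {stem!r} in {data_dir!r}"
    )
```

For example, `load_table('benchmarkingGS_v1-0')` would read the csv when it is present and try the pickle otherwise.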

The gold standard, without pre-processed features, can be loaded using:

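import os
import pandas as pd

# Load the gold standard (pipe-separated csv, no pre-processed features)
goldStandard = pd.read_csv(
    os.path.join('data', 'benchmarkingGS_v1-0.csv'),
    sep='|'
)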

Or with the pre-processed features:

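# Load the pickled DataFrame that also contains the pre-processed features
# (requires os and pandas imported as pd)
goldStandard_with_featuresSeq = pd.read_pickle(
    os.path.join('data', 'benchmarkingGS_v1-0_similarityMeasure_sequence_v3-1.pkl')
)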

UniProtIDs are used for both proteins A and B.
isInteraction is the ground truth from the IntAct database (1 = interacting proteins, 0 = non-interacting proteins).
trainTest is the split between training set (train), first testing set T1 (test1) and second testing set T2 (test2).
Pre-processed features are explained in the manuscript.
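The columns described above can be used to separate the three sets. The following is a minimal sketch on a toy DataFrame: only `isInteraction` and `trainTest` are column names taken from the description, while the ID column names (`uniprotID_A`, `uniprotID_B`), the placeholder IDs and the helper `split_gold_standard` are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the gold standard. The ID column names and values are
# hypothetical placeholders; `isInteraction` and `trainTest` follow the
# column description in this README.
toy = pd.DataFrame({
    "uniprotID_A": ["ID_A1", "ID_A2", "ID_A3", "ID_A4", "ID_A5", "ID_A6"],
    "uniprotID_B": ["ID_B1", "ID_B2", "ID_B3", "ID_B4", "ID_B5", "ID_B6"],
    "isInteraction": [1, 1, 0, 1, 0, 0],
    "trainTest": ["train", "train", "train", "test1", "test1", "test2"],
})


def split_gold_standard(df):
    """Return a dict mapping each value of `trainTest` to its rows."""
    return {
        name: part.reset_index(drop=True)
        for name, part in df.groupby("trainTest")
    }


splits = split_gold_standard(toy)
for name, part in splits.items():
    # Report the size and the fraction of interacting pairs in each set.
    print(name, len(part), "pairs,", part["isInteraction"].mean(), "positive fraction")
```

Applied to the real DataFrame, the same call would return the `train`, `test1` and `test2` subsets.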

Training and evaluation can then be carried out as for any supervised binary classification task. The code used in the preprint is in the Training section.
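As a rough illustration of what training on `train` and evaluating on `test1` and `test2` might look like, here is a sketch using scikit-learn on synthetic data. It is not the model or the evaluation protocol from the preprint: the logistic regression, the feature names (`feat_0`, `feat_1`), the synthetic values and the choice of metrics are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n = 600
labels = rng.integers(0, 2, size=n)

# Two synthetic features, shifted by the label so they carry some signal.
features = rng.normal(size=(n, 2)) + labels[:, None]
feature_cols = ["feat_0", "feat_1"]  # hypothetical feature names

toy = pd.DataFrame(features, columns=feature_cols)
toy["isInteraction"] = labels
toy["trainTest"] = rng.choice(
    ["train", "test1", "test2"], size=n, p=[0.6, 0.2, 0.2]
)

# Fit on the training set only.
train = toy[toy["trainTest"] == "train"]
model = LogisticRegression().fit(train[feature_cols], train["isInteraction"])

# Score each test set separately.
scores = {}
for split in ["test1", "test2"]:
    part = toy[toy["trainTest"] == split]
    proba = model.predict_proba(part[feature_cols])[:, 1]
    scores[split] = {
        "roc_auc": roc_auc_score(part["isInteraction"], proba),
        "average_precision": average_precision_score(part["isInteraction"], proba),
    }
print(scores)
```

Keeping the two test sets separate in `scores` mirrors the `test1` / `test2` split of the gold standard, so results on each can be reported independently.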

How to cite this work

Lannelongue L., Inouye M., Construction of in silico protein-protein interaction networks across different topologies using machine learning, 2022, bioRxiv

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License.

Credits

The code was written in Python 3.7.
Many libraries were used, in particular pandas, NumPy, scikit-learn and PyTorch Lightning (the full list is in the code and in the requirements file).
Plots were drawn using Matplotlib, Seaborn and the MetBrewer colour palettes.
Logs were saved using Weights & Biases.

Owner

Loïc Lannelongue
