B4PPI

Benchmarking Pipeline for the Prediction of Protein-Protein Interactions

How this benchmarking pipeline was built, and how to use it, is detailed in our preprint (see How to cite this work below; please cite it if you find this work useful!).

A minimal example is available in minimal_example.ipynb (also as minimal_example.py), and the list of requirements is in requirements.txt.

How to use the gold standard

All the data files are in the data folder. Most of them are available both as csv (sep='|') and as pickled pandas DataFrames; a csv file may occasionally be missing because of file size constraints on GitHub.

The gold standard, without pre-processed features, can be loaded using:

import os
import pandas as pd

# The csv files use '|' as the column separator
goldStandard = pd.read_csv(
    os.path.join('data', 'benchmarkingGS_v1-0.csv'),
    sep='|'
)

Or with the pre-processed features:

goldStandard_with_featuresSeq = pd.read_pickle(
    os.path.join('data', 'benchmarkingGS_v1-0_similarityMeasure_sequence_v3-1.pkl')
)
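
Since a csv file may be missing for some tables, a small helper that prefers the pickle and falls back to the csv can be convenient. The sketch below is not part of the repository: load_table and the toy_table file are hypothetical names, and the demonstration writes to a temporary directory rather than reading the real data folder.

```python
import os
import tempfile

import pandas as pd


def load_table(base_path):
    """Load base_path + '.pkl' if it exists, else base_path + '.csv' (sep='|').

    Hypothetical helper, not part of the B4PPI code.
    """
    pkl_path = base_path + ".pkl"
    csv_path = base_path + ".csv"
    if os.path.exists(pkl_path):
        return pd.read_pickle(pkl_path)
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path, sep="|")
    raise FileNotFoundError(f"Neither {pkl_path} nor {csv_path} exists")


# Demonstration on a temporary directory with a made-up table.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "toy_table")
    pd.DataFrame({"isInteraction": [1, 0]}).to_csv(
        base + ".csv", sep="|", index=False
    )
    loaded = load_table(base)  # only the csv exists, so the fallback is used
    print(loaded)
```

With the real files, the call would presumably look like load_table(os.path.join('data', 'benchmarkingGS_v1-0')).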

UniProtIDs are used for both proteins A and B.
isInteraction is the ground truth from the IntAct database (1 = interacting proteins, 0 = non-interacting proteins).
trainTest is the split between training set (train), first testing set T1 (test1) and second testing set T2 (test2).
Pre-processed features are explained in the manuscript.
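
As an illustration of how the trainTest column can be used, the sketch below splits a table into its three subsets. It runs on a made-up toy DataFrame so that it does not depend on the data files. The isInteraction and trainTest columns and their values follow the description above; the protein column names (uniprotID_A, uniprotID_B), the placeholder IDs and the split_gold_standard helper are assumptions made for this example.

```python
import pandas as pd

# Toy stand-in for the gold standard. The protein IDs are placeholders and
# the protein column names are assumed; check the real file for the actual
# names.
toy_gold_standard = pd.DataFrame({
    "uniprotID_A": ["ID_01", "ID_02", "ID_01", "ID_03", "ID_02", "ID_04"],
    "uniprotID_B": ["ID_03", "ID_05", "ID_04", "ID_05", "ID_04", "ID_03"],
    "isInteraction": [1, 1, 0, 0, 0, 1],
    "trainTest": ["train", "train", "train", "test1", "test2", "test1"],
})


def split_gold_standard(df):
    """Return a dict mapping each trainTest value to its subset of rows."""
    return {
        name: part.reset_index(drop=True)
        for name, part in df.groupby("trainTest")
    }


splits = split_gold_standard(toy_gold_standard)
for name in ("train", "test1", "test2"):
    part = splits[name]
    n_positive = int(part["isInteraction"].sum())
    print(f"{name}: {len(part)} pairs, {n_positive} interacting")
```

The same function should work unchanged on the goldStandard DataFrame loaded above, provided the column is named trainTest as described.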

Training and evaluation can then be done as with any labelled dataset: fit a model on the training set and evaluate it separately on T1 and T2. The code from the preprint is in the 3. Training folder.
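
A minimal sketch of such a training and evaluation loop is given below, using scikit-learn on synthetic data. It is not the model or the code from the preprint: the logistic regression, the feature names (feature_1, feature_2) and the random data are assumptions chosen only to show the pattern of fitting on train and scoring test1 and test2 separately.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in for the gold standard with pre-processed features.
# feature_1 and feature_2 are hypothetical names, not the real feature columns.
labels = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "feature_1": labels + rng.normal(0, 1, size=n),  # weakly informative
    "feature_2": rng.normal(0, 1, size=n),           # pure noise
    "isInteraction": labels,
    "trainTest": rng.choice(
        ["train", "test1", "test2"], size=n, p=[0.6, 0.2, 0.2]
    ),
})
feature_cols = ["feature_1", "feature_2"]

# Fit on the training set only.
train = toy[toy["trainTest"] == "train"]
model = LogisticRegression().fit(train[feature_cols], train["isInteraction"])

# Evaluate on each testing set separately.
scores = {}
for name in ("test1", "test2"):
    test = toy[toy["trainTest"] == name]
    predicted = model.predict_proba(test[feature_cols])[:, 1]
    scores[name] = {
        "roc_auc": roc_auc_score(test["isInteraction"], predicted),
        "average_precision": average_precision_score(
            test["isInteraction"], predicted
        ),
    }
    print(name, scores[name])
```

Keeping the two testing sets apart, rather than pooling them, lets the scores on T1 and T2 be reported and compared independently.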

How to cite this work

Lannelongue L., Inouye M., Construction of in silico protein-protein interaction networks across different topologies using machine learning, 2022, bioRxiv

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License.

Credits

The code was written in Python 3.7.
Many libraries were used, in particular pandas, NumPy, scikit-learn and PyTorch Lightning (full list in the code and in the requirements file).
Plots were drawn using Matplotlib, Seaborn and the MetBrewer colour palettes.
Logs were saved using Weights & Biases.
