EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

Overview

EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

Paper on arXiv

EquiBind, is a SE(3)-equivariant geometric deep learning model performing direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the ligand’s bound pose and orientation. EquiBind achieves significant speed-ups and better quality compared to traditional and recent baselines. If you have questions, don't hesitate to open an issue or ask me via [email protected] or social media or Octavian Ganea via [email protected]. We are happy to hear from you!

Dataset

Our preprocessed data (see dataset section in the paper Appendix) is available from zenodo.
The files in data contain the names for the time-based data split.

If you want to train one of our models with the data then:

  1. download it from zenodo
  2. unzip the directory and place it into data such that you have the path data/PDBBind

Use provided model weights to predict binding structure of your own protein-ligand pairs:

Step 1: What you need as input

Ligand files of the formats .mol2 or .sdf or .pdbqt or .pdb.
Receptor files of the format .pdb
For each complex you want to predict you need a directory containing the ligand and receptor file. Like this:

my_data_folder
└───name1
    │   name1_protein.pdb
    │   name1_ligand.sdf
└───name2
    │   name2_protein.pdb
    │   name2_ligand.sdf
...

Step 2: Setup Environment

We will set up the environment using Anaconda. Clone the current repo

git clone https://github.com/HannesStark/EquiBind

Create a new environment with all required packages using environment.yml (this can take a while). While in the project directory run:

conda env create

Activate the environment

conda activate equibind

Here are the requirements themselves if you want to install them manually instead of using the environment.yml:

python=3.7
pytorch 1.10
torchvision
cudatoolkit=10.2
torchaudio
dgl-cuda10.2
rdkit
openbabel
biopython
rdkit
biopandas
pot
dgllife
joblib
pyaml
icecream
matplotlib
tensorboard

Step 3: Predict Binding Structures!

In the config file configs_clean/inference.yml set the path to your input data folder inference_path: path_to/my_data_folder.
Then run:

python inference.py --config=configs_clean/inference.yml

Done! 🎉
Your results are saved as .sdf files in the directory specified in the config file under output_directory: 'data/results/output' and as tensors at runs/flexible_self_docking/predictions_RDKitFalse.pt!

Reproducing paper numbers

Download the data and place it as described in the "Dataset" section above.

Using the provided model weights

To predict binding structures using the provided model weights run:

python inference.py --config=configs_clean/inference_file_for_reproduce.yml

This will give you the results of EquiBind-U and then those of EquiBind after running the fast ligand point cloud fitting corrections.
The numbers are a bit better than what is reported in the paper. We will put the improved numbers into the next update of the paper.

Training a model yourself and using those weights

To train the model yourself, run:

python train.py --config=configs_clean/RDKitCoords_flexible_self_docking.yml

The model weights are saved in the runs directory.
You can also start a tensorboard server tensorboard --logdir=runs and watch the model train.
To evaluate the model on the test set, change the run_dirs: entry of the config file inference_file_for_reproduce.yml to point to the directory produced in runs. Then you can runpython inference.py --config=configs_clean/inference_file_for_reproduce.yml as above!

Reference

📃 Paper on arXiv

@misc{stark2022equibind,
      title={EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction}, 
      author={Hannes Stärk and Octavian-Eugen Ganea and Lagnajit Pattanaik and Regina Barzilay and Tommi Jaakkola},
      year={2022}
}
Owner
Hannes Stärk
MIT Research Intern • Geometric DL + Graphs :heart: • M. Sc. Informatics from TU Munich
Hannes Stärk
Python port of R's Comprehensive Dynamic Time Warp algorithm package

Welcome to the dtw-python package Comprehensive implementation of Dynamic Time Warping algorithms. DTW is a family of algorithms which compute the loc

Dynamic Time Warping algorithms 154 Dec 26, 2022
ROS support for Velodyne 3D LIDARs

Overview Velodyne1 is a collection of ROS2 packages supporting Velodyne high definition 3D LIDARs3. Warning: The master branch normally contains code

ROS device drivers 543 Dec 30, 2022
Prometheus Exporter for data scraped from datenplattform.darmstadt.de

darmstadt-opendata-exporter Scrapes data from https://datenplattform.darmstadt.de and presents it in the Prometheus Exposition format. Pull requests w

Martin Weinelt 2 Apr 12, 2022
Simple reimplemetation experiments about FcaNet

FcaNet-CIFAR An implementation of the paper FcaNet: Frequency Channel Attention Networks on CIFAR10/CIFAR100 dataset. how to run Code: python Cifar.py

76 Feb 04, 2021
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

DSEE Codes for [Preprint] DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Ch

VITA 4 Dec 27, 2021
Jiminy Cricket Environment (NeurIPS 2021)

Jiminy Cricket This is the repository for "What Would Jiminy Cricket Do? Towards Agents That Behave Morally" by Dan Hendrycks*, Mantas Mazeika*, Andy

Dan Hendrycks 15 Aug 29, 2022
[WACV21] Code for our paper: Samuel, Atzmon and Chechik, "From Generalized zero-shot learning to long-tail with class descriptors"

DRAGON: From Generalized zero-shot learning to long-tail with class descriptors Paper Project Website Video Overview DRAGON learns to correct the bias

Dvir Samuel 25 Dec 06, 2022
PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)

Score-Based Generative Modeling through Stochastic Differential Equations This repo contains a PyTorch implementation for the paper Score-Based Genera

Yang Song 757 Jan 04, 2023
This repository is an implementation of our NeurIPS 2021 paper (Stylized Dialogue Generation with Multi-Pass Dual Learning) in PyTorch.

MPDL---TODO This repository is an implementation of our NeurIPS 2021 paper (Stylized Dialogue Generation with Multi-Pass Dual Learning) in PyTorch. Ci

CodebaseLi 3 Nov 27, 2022
A Simple and Versatile Framework for Object Detection and Instance Recognition

SimpleDet - A Simple and Versatile Framework for Object Detection and Instance Recognition Major Features FP16 training for memory saving and up to 2.

TuSimple 3k Dec 12, 2022
Towards Debiasing NLU Models from Unknown Biases

Towards Debiasing NLU Models from Unknown Biases Abstract: NLU models often exploit biased features to achieve high dataset-specific performance witho

Ubiquitous Knowledge Processing Lab 22 Jun 14, 2022
VD-BERT: A Unified Vision and Dialog Transformer with BERT

VD-BERT: A Unified Vision and Dialog Transformer with BERT PyTorch Code for the following paper at EMNLP2020: Title: VD-BERT: A Unified Vision and Dia

Salesforce 44 Nov 01, 2022
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [2021]

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations This repo contains the Pytorch implementation of our paper: Revisit

Wouter Van Gansbeke 80 Nov 20, 2022
EMNLP'2021: SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE: Simple Contrastive Learning of Sentence Embeddings This repository contains the code and pre-trained models for our paper SimCSE: Simple Contr

Princeton Natural Language Processing 2.5k Dec 29, 2022
Automatic Data-Regularized Actor-Critic (Auto-DrAC)

Auto-DrAC: Automatic Data-Regularized Actor-Critic This is a PyTorch implementation of the methods proposed in Automatic Data Augmentation for General

89 Dec 13, 2022
Implementation of average- and worst-case robust flatness measures for adversarial training.

Relating Adversarially Robust Generalization to Flat Minima This repository contains code corresponding to the MLSys'21 paper: D. Stutz, M. Hein, B. S

David Stutz 13 Nov 27, 2022
[SIGGRAPH Asia 2021] Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN

Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN [Paper] [Project Website] [Output resutls] Official Pytorch i

Badour AlBahar 215 Dec 17, 2022
Smart edu-autobooking - Johnson @ DMI-UNICT study room self-booking system

smart_edu-autobooking Sistema di autoprenotazione per l'aula studio [email protected]

Davide Carnemolla 17 Jun 20, 2022
A whale detector design for the Kaggle whale-detector challenge!

CNN (InceptionV1) + STFT based Whale Detection Algorithm So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The obje

Tarin Ziyaee 92 Sep 28, 2021
Neural Message Passing for Computer Vision

Neural Message Passing for Quantum Chemistry Implementation of different models of Neural Networks on graphs as explained in the article proposed by G

Pau Riba 310 Nov 07, 2022