Systemic Evolutionary Chemical Space Exploration for Drug Discovery

Last update: Dec 16, 2022

Overview

SECSE: Systemic Evolutionary Chemical Space Explorer

Chemical space exploration is a major task of the hit-finding process in the pursuit of novel chemical entities. Compared with other screening technologies, computational de novo design has become a popular approach to overcoming the limitations of current chemical libraries. Here, we report a de novo design platform named systemic evolutionary chemical space explorer (SECSE). The platform is conceptually inspired by fragment-based drug design and miniaturizes a “lego-building” process within the pocket of a given target. The key to generating virtual hits thereby becomes a computational search problem. To enhance search and optimization, human intelligence and deep learning are integrated. SECSE has the potential to find novel and diverse small molecules that are attractive starting points for further validation.

Tutorials and Usage

Set Environment Variables
export SECSE=path/to/SECSE
If you use AutoDock Vina for docking:
export VINA=path/to/AutoDockVINA
If you use Glide for docking (additional installation and license required):
export SCHRODINGER=path/to/SCHRODINGER
Give execution permissions to the SECSE directory:
chmod -R +x path/to/SECSE
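Before launching a run, it can help to confirm that the variables above are set and point to real paths. The following is a minimal sketch, not part of SECSE; the function name `check_secse_env` is hypothetical, and the only layout it assumes is that `run_secse.py` sits directly under `$SECSE`, as the run command in this README suggests.

```python
from pathlib import Path
from typing import List, Mapping


def check_secse_env(env: Mapping[str, str], docking_program: str = "vina") -> List[str]:
    """Return a list of problems found; an empty list means the setup looks usable."""
    problems = []
    # SECSE is always needed; the second variable depends on the docking program.
    docking_var = "VINA" if docking_program == "vina" else "SCHRODINGER"
    for name in ("SECSE", docking_var):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).exists():
            problems.append(f"{name} points to a missing path: {value}")
    # Assumption: run_secse.py lives directly under $SECSE.
    secse = env.get("SECSE")
    if secse and Path(secse).is_dir() and not (Path(secse) / "run_secse.py").is_file():
        problems.append("run_secse.py not found under $SECSE")
    return problems
```

Calling `check_secse_env(os.environ, "vina")` after `import os` returns the list of problems for the current shell environment.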
Input fragments: a tab-separated .smi file without a header (the demo case uses demo/demo_1020.smi).
Parameters in config file:
[DEFAULT]
- workdir, working directory; created if it does not exist, otherwise overwritten, type=str
- fragments, file path to seed fragments, smi format, type=str
- num_gen, number of generations, type=int
- num_per_gen, number of molecules generated each generation, type=int
- seed_per_gen, number of selected seed molecules per generation, default=1000, type=int
- start_gen, number of the starting generation, default=0, type=int
- docking_program, name of the docking program, AutoDock Vina (input vina) or Glide (input glide), default=vina, type=str
[docking]
- target, protein PDBQT file if using AutoDock Vina; grid file if using Glide, type=str
- RMSD, docking pose RMSD cutoff between children and parent, default=2, type=float
- delta_score, decreased docking score cutoff between children and parent, default=-1.0, type=float
- score_cutoff, default=-9, type=float
Parameters when docking by AutoDock Vina:
- x, Docking box x, type=float
- y, Docking box y, type=float
- z, Docking box z, type=float
- box_size_x, Docking box size x, default=20, type=float
- box_size_y, Docking box size y, default=20, type=float
- box_size_z, Docking box size z, default=20, type=float
[deep learning]
- mode, mode of deep learning modeling, 0: not used, 1: modeling per generation, 2: modeling overall after all generations, default=0, type=int
- dl_per_gen, top N predicted molecules for docking, default=100, type=int
- dl_score_cutoff, default=-9, type=float
[properties]
- MW, molecular weight cutoff, default=450, type=int
- logP_lower, minimum logP, default=0.5, type=float
- logP_upper, maximum logP, default=7, type=float
- chiral_center, maximum number of chiral centers, default=3, type=int
- heteroatom_ratio, maximum heteroatom ratio, default=0.35, type=float
- rotatable_bound_num, maximum number of rotatable bonds, default=5, type=int
- rigid_body_num, default=2, type=int
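To make the [properties] thresholds concrete, here is a rough sketch of how such a filter could be applied to descriptors that have already been computed. It is not SECSE's implementation: the `Descriptors` container and `passes_properties` function are hypothetical, bounds are assumed to be inclusive, and `rigid_body_num` is left out because its meaning is not described above.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed descriptors for one molecule (hypothetical container)."""
    mw: float
    logp: float
    chiral_centers: int
    heteroatom_ratio: float
    rotatable_bonds: int


# Defaults copied from the [properties] list above.
DEFAULT_LIMITS = {
    "MW": 450,
    "logP_lower": 0.5,
    "logP_upper": 7.0,
    "chiral_center": 3,
    "heteroatom_ratio": 0.35,
    "rotatable_bound_num": 5,
}


def passes_properties(d: Descriptors, limits: dict = DEFAULT_LIMITS) -> bool:
    """True if every descriptor is within its limit (inclusive bounds are an assumption)."""
    return (
        d.mw <= limits["MW"]
        and limits["logP_lower"] <= d.logp <= limits["logP_upper"]
        and d.chiral_centers <= limits["chiral_center"]
        and d.heteroatom_ratio <= limits["heteroatom_ratio"]
        and d.rotatable_bonds <= limits["rotatable_bound_num"]
    )
```

For example, `passes_properties(Descriptors(320.4, 2.1, 1, 0.25, 4))` passes every threshold, while raising the molecular weight to 500 fails the MW cutoff.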
Config file of a demo case: demo/phgdh_demo_vina.ini
Run SECSE
python $SECSE/run_secse.py --config path/to/config
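A config file for `--config` can also be generated programmatically. The sketch below uses Python's standard `configparser` with the section and key names listed in this README; the helper name `build_config` is hypothetical, and the `num_gen` and `num_per_gen` values are purely illustrative, not recommendations. The remaining values are the documented defaults.

```python
import configparser


def build_config(workdir, fragments, target, center, num_gen=3, num_per_gen=200):
    """Assemble a SECSE-style config for AutoDock Vina docking (illustrative sketch)."""
    config = configparser.ConfigParser()
    # configparser lowercases keys by default; keep RMSD, MW, logP_lower as written.
    config.optionxform = str
    config["DEFAULT"] = {
        "workdir": workdir,
        "fragments": fragments,
        "num_gen": str(num_gen),          # illustrative value
        "num_per_gen": str(num_per_gen),  # illustrative value
        "seed_per_gen": "1000",
        "start_gen": "0",
        "docking_program": "vina",
    }
    x, y, z = center
    config["docking"] = {
        "target": target,
        "RMSD": "2",
        "delta_score": "-1.0",
        "score_cutoff": "-9",
        "x": str(x), "y": str(y), "z": str(z),
        "box_size_x": "20", "box_size_y": "20", "box_size_z": "20",
    }
    config["deep learning"] = {
        "mode": "0", "dl_per_gen": "100", "dl_score_cutoff": "-9",
    }
    config["properties"] = {
        "MW": "450", "logP_lower": "0.5", "logP_upper": "7",
        "chiral_center": "3", "heteroatom_ratio": "0.35",
        "rotatable_bound_num": "5", "rigid_body_num": "2",
    }
    return config


def write_config(config, path):
    """Write the assembled config to an .ini file."""
    with open(path, "w") as handle:
        config.write(handle)
```

Note that `configparser` treats `[DEFAULT]` specially: its keys are visible from every other section, so `config["docking"]["workdir"]` also resolves.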
Output files
- merged_docked_best_timestamp_with_grow_path.csv: selected molecules and growing path
- selected.sdf: 3D conformers of all selected molecules
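The result table can be located and read with the standard library. This sketch assumes that `timestamp` in the file name above is a placeholder that changes per run, that the file sits directly in the working directory, and that it starts with a header row; none of this is confirmed here, and the function names are hypothetical.

```python
import csv
from pathlib import Path


def find_result_tables(workdir):
    """Return result CSVs found directly in workdir, sorted by file name."""
    return sorted(Path(workdir).glob("merged_docked_best_*_with_grow_path.csv"))


def read_rows(csv_path):
    """Read one result table into a list of dicts keyed by the file's own header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Because the column names come from the file itself, the sketch makes no assumption about which columns SECSE writes.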

Dependencies

GNU Parallel installation

CentOS / RHEL
yum install parallel
Ubuntu / Debian
sudo apt install parallel
From source: https://www.gnu.org/software/parallel/

numpy~=1.20.3, pandas~=1.3.3, pandarallel~=1.5.2, tqdm~=4.62.2, biopandas~=0.2.9, openbabel~=3.1.1, rdkit~=2021.03.5, chemprop~=1.3.1, torch~=1.9.0+cu111

Citation

Lu, C.; Liu, S.; Shi, W.; Yu, J.; Zhou, Z.; Zhang, X.; Lu, X.; Cai, F.; Xia, N.; Wang, Y. Systemic Evolutionary Chemical Space Exploration For Drug Discovery. ChemRxiv 2021. This content is a preprint and has not been peer-reviewed.

License

SECSE is released under Apache License, Version 2.0.


Comments

Problem running demo

Hi!

When I try to run the demo with the command below:

python $SECSE/run_secse.py --config demo/phgdh_demo_vina.ini

It raises pandas.errors.EmptyDataError: No columns to parse from file. What should I do to solve it? Thank you!

Here is the output

**************************************************************************************** 
      ____    _____    ____   ____    _____ 
     / ___|  | ____|  / ___| / ___|  | ____|
     \___ \  |  _|   | |     \___ \  |  _|  
      ___) | | |___  | |___   ___) | | |___ 
     |____/  |_____|  \____| |____/  |_____|
/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/core/generic.py:2882: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
 method=method,
Table 'G-001' already exists.

******************************************************************
Input fragment file: /home/bruce/Work/CADD/SECSE/code/demo/demo_1020.smi
Target grid file: /home/bruce/Work/CADD/SECSE/code/demo/PHGDH_6RJ3_for_vina.pdbqt
Workdir: /home/bruce/Work/CADD/SECSE/code/res/


************************************************** 
Generation  0 ...
Step 1: Docking with Autodock Vina ...
/home/bruce/Work/CADD/SECSE/code/secse/evaluate/ligprep_vina_parallel.sh /home/bruce/Work/CADD/SECSE/code/res/generation_0 /home/bruce/Work/CADD/SECSE/code/demo/demo_1020.smi /home/bruce/Work/CADD/SECSE/code/demo/PHGDH_6RJ3_for_vina.pdbqt 20.9 -10.4 3.0 20.0 20.0 25.0 10
find /home/bruce/Work/CADD/SECSE/code/res/generation_0/sdf_files -name "*sdf" | xargs -n 100 cat > /home/bruce/Work/CADD/SECSE/code/res/generation_0/docking_outputs_with_score.sdf
Docking time cost: 0.12 min.
Step 2: Ranking docked molecules...
9 cmpds after evaluate
The evaluate score cutoff is: -9.0
9 final seeds.

************************************************** 
Generation  1 ...
Step 1: Mutation
No rule class:  B-001
No rule class:  G-003
No rule class:  G-004
No rule class:  G-005
No rule class:  G-006
No rule class:  G-007
No rule class:  M-001
No rule class:  M-002
No rule class:  M-003
No rule class:  M-004
No rule class:  M-005
No rule class:  M-006
No rule class:  M-007
No rule class:  M-008
No rule class:  M-009
No rule class:  M-010
No rule class: G-002
Step 2: Filtering all mutated mols
sh /home/bruce/Work/CADD/SECSE/code/secse/growing/filter_parallel.sh /home/bruce/Work/CADD/SECSE/code/res/generation_1 1 demo/phgdh_demo_vina.ini 10
Filter runtime: 0.00 min.
Traceback (most recent call last):
 File "/home/bruce/Work/CADD/SECSE/code/secse/run_secse.py", line 80, in <module>
   main()
 File "/home/bruce/Work/CADD/SECSE/code/secse/run_secse.py", line 65, in main
   workflow.grow()
 File "/home/bruce/Work/CADD/SECSE/code/secse/grow_processes.py", line 208, in grow
   self._filter_df = pd.read_csv(os.path.join(self.workdir_now, "filter.csv"), header=None)
 File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
   return func(*args, **kwargs)
 File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
   return _read(filepath_or_buffer, kwds)
 File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
   parser = TextFileReader(filepath_or_buffer, **kwds)
 File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
   self._engine = self._make_engine(self.engine)
 File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
   return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
 File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
   self._reader = parsers.TextReader(self.handles.handle, **kwds)
 File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

opened by BW15061999 17
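The traceback above ends in `pd.read_csv` on `filter.csv`, and the log reports every rule class as missing, so the file is presumably empty because no molecules were generated in the mutation step. A small guard like the following can confirm that before the file is parsed. It is a diagnostic sketch with a hypothetical function name, not a fix in the SECSE code base.

```python
import os


def has_rows(path):
    """True if the file exists and contains at least one non-blank line."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path) as handle:
        return any(line.strip() for line in handle)
```

If `has_rows` returns False for `generation_1/filter.csv` in the working directory, the problem lies upstream of the filter step, for example in the rules database used for mutation.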

Question about running the demo code

Hi authors,

I have tried to run your demo code in README.md, but got some errors.

Command

python /home/xxx/workspace/off-SECSE/secse/run_secse.py --config ./config.ini

Output

 **************************************************************************************** 
       ____    _____    ____   ____    _____ 
      / ___|  | ____|  / ___| / ___|  | ____|
      \___ \  |  _|   | |     \___ \  |  _|  
       ___) | | |___  | |___   ___) | | |___ 
      |____/  |_____|  \____| |____/  |_____|

******************************************************************
Input fragment file: /home/xxx/workspace/off-SECSE/fy-run/demo001/ligand.smi
Target grid file: /home/xxx/workspace/off-SECSE/fy-run/demo001/receptor.pdbqt
Workdir: /home/xxx/workspace/off-SECSE/fy-run/demo001/

Step 1: Docking with Autodock Vina ...
/home/xxx/workspace/off-SECSE/secse/evaluate/ligprep_vina_parallel.sh /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0 /home/xxx/workspace/off-SECSE/fy-run/demo001/ligand.smi /home/t-yafan/workspace/off-SECSE/fy-run/demo001/receptor.pdbqt 20.9 -10.4 3.0 20.0 20.0 25.0 10
find /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0/sdf_files -name "*sdf" | xargs -n 100 cat > /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0/docking_outputs_with_score.sdf
Docking time cost: 0.11 min.
Step 2: Ranking docked molecules...
9 cmpds after evaluate
The evaluate score cutoff is: -9.0
9 final seeds.

 ************************************************** 
Generation  1 ...
Step 1: Mutation
Traceback (most recent call last):
  File "/home/xxx/workspace/off-SECSE/secse/run_secse.py", line 70, in <module>
    main()
  File "/home/xxx/workspace/off-SECSE/secse/run_secse.py", line 55, in main
    workflow.grow()
  File "/home/xxx/workspace/off-SECSE/secse/grow_processes.py", line 159, in grow
    header = mutation_df(self.winner_df, self.workdir, self.cpu_num, self.gen)
  File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 166, in mutation_df
    mutation = Mutation(5000, workdir)
  File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 29, in __init__
    self.load_common_rules()
  File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 50, in load_common_rules
    c.execute(sql)
sqlite3.OperationalError: no such table: B-001

It seems that the file secse/growing/mutation/rules_demo.db is missing in the repo. How can I fix it?

Thanks!

opened by fyabc 5
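One way to check whether a rules database actually contains tables such as `B-001` is to list its tables with Python's built-in `sqlite3` module. This is a diagnostic sketch only; the function name is hypothetical, and the path to pass in (for example the `rules_demo.db` file mentioned above) depends on the local checkout.

```python
import os
import sqlite3
from contextlib import closing


def list_rule_tables(db_path):
    """Return the table names stored in an SQLite file, sorted by name."""
    # sqlite3.connect would silently create a missing file, so check first.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

An empty list, or a `FileNotFoundError`, would be consistent with the `no such table: B-001` error shown in the traceback.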

All dockings do not work because there's no gridding process.

Hi, I was trying out the repo when I realised that neither AutoDock Vina nor Glide can run because there is no gridding process, so no grid files are produced. >.<

opened by yipy0005 3

Releases (v1.1.0)

v1.1.0 (Jul 15, 2022)

v1.0.0 (Feb 24, 2022)
