Write reproducible code for getting and processing ChEMBL

Overview

chembl_downloader

PyPI PyPI - Python Version PyPI - License DOI Code style: black

Don't worry about downloading/extracting ChEMBL or versioning - just use chembl_downloader to write code that knows how to download it and use it automatically.

Installation

$ pip install chembl-downloader

Usage

Download A Specific Version

import chembl_downloader

path = chembl_downloader.download(version='28')

After it's been downloaded and extracted once, it's smart and does not need to download again. It gets stored using pystow automatically in the ~/.data/chembl directory.

We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is a paid feature.

Download the Latest Version

First, you'll have to install bioversions with pip install bioversions, whose job it is to look up the latest version of many databases. Then, you can modify the previous code slightly by omitting the version keyword argument:

import chembl_downloader

path = chembl_downloader.download()

The version keyword argument is available for all functions in this package (e.g., including connect(), cursor(), and query()), but will be omitted below for brevity.

Automate Connection

Inside the archive is a single SQLite database file. Normally, people manually untar this folder then do something with the resulting file. Don't do this, it's not reproducible! Instead, the file can be downloaded and a connection can be opened automatically with:

import chembl_downloader

with chembl_downloader.connect() as conn:
    with conn.cursor() as cursor:
        cursor.execute(...)  # run your query string
        rows = cursor.fetchall()  # get your results

The cursor() function provides a convenient wrapper around this operation:

import chembl_downloader

with chembl_downloader.cursor() as cursor:
    cursor.execute(...)  # run your query string
    rows = cursor.fetchall()  # get your results

Run a query and get a pandas DataFrame

The most powerful function is query() which builds on the previous connect() function in combination with pandas.read_sql to make a query and load the results into a pandas DataFrame for any downstream use.

import chembl_downloader

sql = """
SELECT
    MOLECULE_DICTIONARY.chembl_id,
    MOLECULE_DICTIONARY.pref_name
FROM MOLECULE_DICTIONARY
JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno == COMPOUND_STRUCTURES.molregno
WHERE molecule_dictionary.pref_name IS NOT NULL
LIMIT 5
"""

df = chembl_downloader.query(sql)
df.to_csv(..., sep='\t', index=False)

Suggestion 1: use pystow to make a reproducible file path that's portable to other people's machines (e.g., it doesn't have your username in the path).

Suggestion 2: RDKit is now pip-installable with pip install rdkit-pypi, which means most users don't have to muck around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is the rdkit.Chem.PandasTools module.

Store in a Different Place

If you want to store the data elsewhere using pystow (e.g., in pyobo I also keep a copy of this file), you can use the prefix argument.

import chembl_downloader

# It gets downloaded/extracted to 
# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
path = chembl_downloader.download(prefix=['pyobo', 'raw', 'chembl'])

See the pystow documentation on configuring the storage location further.

The prefix keyword argument is available for all functions in this package (e.g., including connect(), cursor(), and query()).

Download via CLI

After installing, run the following CLI command to ensure it and send the path to stdout

$ chembl_downloader

Use --test to show two example queries

$ chembl_downloader --test

Contributing

If you'd like to contribute, there's a submodule called chembl_downloader.queries where you can add an SQL query along with a description of what it does for easy importing.

Comments
  • Repo status

    Repo status

    Dear @cthoyt,

    I know that you have multiple responsibilities, but I was wondering if the current repo is in working condition or if is it a legacy repo which worked with a specific version of ChEMBL? It would be great if you could add a batch on the repo for the same.

    Thank You.

    opened by YojanaGadiya 4
  • Add SQL for getting activities by target

    Add SQL for getting activities by target

    This PR adds some functionality for generating target-based datasets, motivated by https://github.com/PatWalters/yamc/issues/14.

    See the notebook here (note that this is pinned with a permalink to the state after merging this PR).

    opened by cthoyt 1
  • Improve ChEBI mapping notebook

    Improve ChEBI mapping notebook

    This filters out about 10% of the possible ChEMBL - ChEBI curations since ChEBI externally already took care of that

    -> move this into biomappings repo

    opened by cthoyt 0
  • Call for additional functionality

    Call for additional functionality

    • What other operations do people commonly want to do with the entire ChEMBL database/SDF file that would be good to wrap (including loading other files released by ChEMBL)?
    • What other operations like the RDKit supplier exist in other libraries that might be worth wrapping?

    @iwatobipen do you have any suggestions?

    opened by cthoyt 0
  • Add functionality for bacting

    Add functionality for bacting

    @egonw are there any bulk SMILES, InChI, or SDF loading operations in bacting that are exposed by pybacting that would be nice to wrap inside this library for full loading of ChEMBL? On the readme, you can see I made a specific function for RDKit's "supplier" that reads an SDF file

    opened by cthoyt 3
Releases(v0.4.1)
  • v0.4.1(Nov 19, 2022)

    What's Changed

    • Add SQL for getting activities by target by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/8
    • Improve ChEBI mapping notebook by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/10
    • Add UniProt target mapping functions by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/11

    Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.4.0...v0.4.1

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Oct 28, 2022)

    This PR does several things:

    1. Removes dependency on bioversions and just implements the code locally
    2. Adds a CLI for generating a statistics table for all versions of ChEMBL
    3. Add proper project skeleton (documentation, unit tests, code quality assurance, CI)
    4. Improve SQLite loading in case you delete the compressed data

    Notebooks

    1. Adds notebook about drug indications
    2. Adds notebook about mapping to ChEBI
    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Mar 19, 2022)

    This release adds two new functions:

    1. chembl_downloader.download_monomer_library which gets this file https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_30_monomer_library.xml for whatever version you specify
    2. chembl_downloader.get_monomer_library_root which does the same as the downloader but also parses the XML for you

    Thanks to @iwatobipen and his recent blog post for inspiring this.

    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Jan 14, 2022)

    New Functions

    • chembl_downloader.download_fps downloads the pre-computed Morgan fingerprint file
    • chembl_downloader.download_chemreps downloads the chembl-smiles-inchi-inchikey map
    • chembl_downloader.get_chemreps_df builds on chembl_downloader.download_chemreps and loads them in a pandas dataframe

    Misc

    • Add isort to code quality checking
    • Enable many functions with return_version to make a tuple with the version, which is useful if you're having it infer the latest version.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.3(Dec 20, 2021)

    This release adds the get_substructure_library() for automating the generation of an RDKit substructure library as described in Greg Landrum's RDKit blog post, Some new features in the SubstructLibrary. The following example shows how it can be used to accomplish some of the first tasks presented in the post:

    from rdkit import Chem
    
    import chembl_downloader
    
    library = chembl_downloader.get_substructure_library()
    query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
    matches = library.GetMatches(query)
    

    Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.1.2...v0.1.3

    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Dec 20, 2021)

  • v0.1.1(Aug 5, 2021)

  • v0.1.0(Aug 4, 2021)

    • rename download() to download_extract_sqlite() to make room for other download functions
    • added supplier() function for loading the SDF dump through RDKit
    Source code(tar.gz)
    Source code(zip)
  • v0.0.4(Jul 28, 2021)

  • v0.0.3(Jul 27, 2021)

  • v0.0.2(Jul 27, 2021)

  • v0.0.1(Jul 27, 2021)

Owner
Charles Tapley Hoyt
Bio/cheminformatician, open scientist, maintainer of @pybel and @pykeen, part of @indralab (he/him)
Charles Tapley Hoyt
Gogoanime-dl - Gogoanime downloader for downloading anime.

gogoanime-dl With this script, you can download episodes of your favorite anime from Gogoanime. The current site that's developed against is https://w

1 Jan 06, 2022
Download minecraft head or skin, allows TLauncher accounts

Download minecraft head or skin, allows TLauncher accounts

1 Dec 30, 2021
Music and video downloader, Made with love by Bryan Herrera

Python-Mp3Mp4-Downloader Music and video downloader, Made with love by Bryan Herrera Requirements CHOCOLATELY windows command If your system does not

ርᚱ1ናተᛰ ᚻህᚥተპᚱ 104 Dec 27, 2022
😷 Dowload dos documentos da CPI da Pandemia

A CPI da Pandemia recebeu milhares de documentos públicos, todos disponibilizados no site do Senado Federal.

Eduardo Cuducos 98 Sep 23, 2022
Tool To download - Amazon - Netflix- Disney+ - VideoLand - Boomerang - RTE.ie

vinetrimmer Widevine Decryption Script for Python Modules Amazon Netflix (with [email protected]

9 Dec 31, 2021
YouTube-Downloader - YouTube Video Downloader made using python

YouTube-Downloader YouTube Videos Downloder made using python.

Shivam 1 Jan 16, 2022
Automatically download and crop key information from the arxiv daily paper. (cpu version)

Automatically download and crop key information from the arxiv daily paper. (cpu version)

HeoLis 4 Jul 30, 2022
This is a simple Python Script to download Imgur Pictures with the short url!

Imgur Downloader This is a simple Python Script that runs a process with progress bar that downloads an Imgur Picture! Code Example Features Progress

OGMatrix 1 Nov 18, 2021
Python script designed to search and fetch direct download links from nxbrew.com

SwitchGamesDownloader Only for windows nxbrew.com is a website, accessible only using a proxy, where the majority of games for the Nintendo Switch are

Backend 91 Dec 28, 2022
Download and save Bing wallpapers and set as background for GNOME desktop

Save Bing wallpapers and set as background for GNOME desktop This script downloads the Bing wallpaper and sets it in the background of your gnome desk

manikamran 2 Nov 06, 2021
Source code of paper: "HRegNet: A Hierarchical Network for Efficient and Accurate Outdoor LiDAR Point Cloud Registration".

HRegNet: A Hierarchical Network for Efficient and Accurate Outdoor LiDAR Point Cloud Registration Environments The code mainly requires the following

Intelligent Sensing, Perception and Computing Group 3 Oct 06, 2022
Animoo - Python scraper made with BeautifulSoup4 that scrapes images from /c/.

Animoo - Python scraper made with BeautifulSoup4 that scrapes images from /c/. Features Scrapes 10 pages Scrapes each thread Downloads all the images

aether 1 Dec 29, 2021
This is a python based web scraping bot for windows to download all ACCEPTED submissions of any user on Codeforces

CODEFORCES DOWNLOADER This is a python based web scraping bot for windows to download all ACCEPTED submissions of any user on Codeforces Requirements

Mohak 6 Dec 29, 2022
GTK4 + Python tutorial with code examples

Taiko's GTK4 Python tutorial Wanna make apps for Linux but not sure how to start with GTK? This guide will hopefully help! The intent is to show you h

190 Jan 08, 2023
A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.

A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.

pytube 7.9k Jan 02, 2023
Python software to download videos from Tiktok without rights

download-video-tiktok Python software to download videos from Tiktok without rights to install pip install requests Follow us telegram : https://t.me

muntazir halim 1 Oct 28, 2021
ASF Sentinel-1 Metadata Download tool

ASF Sentinel-1 Metadata Download tool Copyright: 2021-2022 Antonio Valentino Small Python tool (asfsmd) that allows to download XML files containing S

Antonio Valentino 9 Dec 04, 2022
Download all posts and comments in a subreddit

subreddit downloader This subreddit downloader downloads all posts and comments in a subreddit For a tutorial to use this program please follow this m

Guneet 6 Dec 16, 2022
A web app for downloading Facebook comments as a csv file

Facebook Comment Downloader A small web app for downloading comments from a public facebook page post. Comment downloading from https://github.com/min

WSDOT 23 Jan 04, 2023