The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments

Overview

GeneDisco ICLR-22 Challenge Starter Repository

Python version Library version

The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments.

GeneDisco (to be published at ICLR-22) is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration.

Install

pip install -r requirements.txt

Use

Setup

  • Create a cache directory. This will hold any preprocessed and downloaded datasets for faster future invocation.
    • $ mkdir /path/to/genedisco_cache
    • Replace the above with your desired cache directory location.
  • Create an output directory. This will hold all program outputs and results.
    • $ mkdir /path/to/genedisco_output
    • Replace the above with your desired output directory location.

How to Run the Full Benchmark Suite?

Experiments (all baselines, acquisition functions, input and target datasets, multiple seeds) included in GeneDisco can be executed sequentially for e.g. acquired batch size 64, 8 cycles and a bayesian_mlp model using:

run_experiments \
  --cache_directory=/path/to/genedisco_cache  \
  --output_directory=/path/to/genedisco_output  \
  --acquisition_batch_size=64  \
  --num_active_learning_cycles=8  \
  --max_num_jobs=1

Results are written to the folder at /path/to/genedisco_cache, and processed datasets will be cached at /path/to/genedisco_cache (please replace both with your desired paths) for faster startup in future invocations.

Note that due to the number of experiments being run by the above command, we recommend execution on a compute cluster.
The GeneDisco codebase also supports execution on slurm compute clusters (the slurm command must be available on the executing node) using the following command and using dependencies in a Python virtualenv available at /path/to/your/virtualenv (please replace with your own virtualenv path):

run_experiments \
  --cache_directory=/path/to/genedisco_cache  \
  --output_directory=/path/to/genedisco_output  \
  --acquisition_batch_size=64  \
  --num_active_learning_cycles=8  \
  --schedule_on_slurm \
  --schedule_children_on_slurm \
  --remote_execution_virtualenv_path=/path/to/your/virtualenv

Other scheduling systems are currently not supported by default.

How to Run A Single Isolated Experiment (One Learning Cycle)?

To run one active learning loop cycle, for example, with the "topuncertain" acquisition function, the "achilles" feature set and the "schmidt_2021_ifng" task, execute the following command:

active_learning_loop  \
    --cache_directory=/path/to/genedisco/genedisco_cache \
    --output_directory=/path/to/genedisco/genedisco_output \
    --model_name="bayesian_mlp" \
    --acquisition_function_name="topuncertain" \
    --acquisition_batch_size=64 \
    --num_active_learning_cycles=8 \
    --feature_set_name="achilles" \
    --dataset_name="schmidt_2021_ifng" 

How to Evaluate a Custom Acquisition Function?

To run a custom acquisition function, set --acquisition_function_name="custom" and --acquisition_function_path to the file path that contains your custom acquisition function (e.g. main.py in this repo).

active_learning_loop  \
    --cache_directory=/path/to/genedisco/genedisco_cache \
    --output_directory=/path/to/genedisco/genedisco_output \
    --model_name="bayesian_mlp" \
    --acquisition_function_name="custom" \
    --acquisition_function_path=/path/to/src/main.py \
    --acquisition_batch_size=64 \
    --num_active_learning_cycles=8 \
    --feature_set_name="achilles" \
    --dataset_name="schmidt_2021_ifng" 

...where "/path/to/custom_acquisition_function.py" contains code for your custom acquisition function corresponding to the BaseBatchAcquisitionFunction interface, e.g.:

import numpy as np
from typing import AnyStr, List
from slingpy import AbstractDataSource
from slingpy.models.abstract_base_model import AbstractBaseModel
from genedisco.active_learning_methods.acquisition_functions.base_acquisition_function import \
    BaseBatchAcquisitionFunction

class RandomBatchAcquisitionFunction(BaseBatchAcquisitionFunction):
    def __call__(self,
                 dataset_x: AbstractDataSource,
                 batch_size: int,
                 available_indices: List[AnyStr], 
                 last_selected_indices: List[AnyStr] = None, 
                 model: AbstractBaseModel = None,
                 temperature: float = 0.9,
                 ) -> List:
        selected = np.random.choice(available_indices, size=batch_size, replace=False)
        return selected

Note that the last class implementing BaseBatchAcquisitionFunction is loaded by GeneDisco if there are multiple valid acquisition functions present in the loaded file.

Submission instructions

For submission, you will need two things:

Please note that all your submitted code must either be loaded via a dependency in requirements.txt or be present in the src/ directory in this starter repository for the submission to succeed.

Once you have set up your submission environment, you will need to create a lightweight container image that contains your acquisition function.

Submission steps

  • Navigate to the directory to which you have cloned this repo to.
    • $ cd /path/to/genedisco-starter
  • Ensure you have ONE acquisition function (inheriting from BaseBatchAcquisitionFunction) in main.py
    • This is your pre-defined program entry point.
  • Build your container image
    • $ docker build -t submission:latest .
  • Save your image name to a shell variable
    • $ IMAGE="submission:latest"
  • Use the EvalAI-CLI command to submit your image
    • Run the following command to submit your container image:
      • $ evalai push $IMAGE --phase gsk-genedisco-challenge-1528
      • Please note that you have a maximum number of submissions that any submission will be counted against.

That’s it! Our pipeline will take your image and test your function.

If you have any questions or concerns, please reach out to us at [email protected]

Citation

Please consider citing, if you reference or use our methodology, code or results in your work:

@inproceedings{mehrjou2022genedisco,
    title={{GeneDisco: A Benchmark for Experimental Design in Drug Discovery}},
    author={Mehrjou, Arash and Soleymani, Ashkan and Jesson, Andrew and Notin, Pascal and Gal, Yarin and Bauer, Stefan and Schwab, Patrick},
    booktitle={{International Conference on Learning Representations (ICLR)}},
    year={2022}
}

License

License

Authors

Arash Mehrjou, GlaxoSmithKline plc
Jacob A. Sackett-Sanders, GlaxoSmithKline plc
Patrick Schwab, GlaxoSmithKline plc

Acknowledgements

PS, JSS and AM are employees and shareholders of GlaxoSmithKline plc.

A template repository implementing HTML5 Boilerplate 8.0 in Sanic using the Domonic framework.

sanic-domonic-h5bp A template repository implementing HTML5 Boilerplate 8.0 in Sanic using the Domonic framework. If you need frontend interactivity,

PyXY 3 Dec 12, 2022
Open-source full-stack seed project that uses a React UI powered by a simple Flask API Server

React Flask Authentication Open-source full-stack seed project that uses a React UI powered by a simple Flask API Server.

App Generator 37 Dec 24, 2022
A command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, VueJS projects.

Cookiecutter A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python

18.7k Jan 08, 2023
A Project Template With Python

File Structure . ├── LICENSE ├── Makefile # commands ├── README.md ├──

Annotation AI 61 Jan 02, 2023
Django Webpack starter template for using Webpack 5 with Django 3.1 & Bootstrap 4. Yes, it can hot-reload.

Django Webpack Starter Hello fellow human. The repo uses Python 3.9.* Django 3.1.* Webpack 5.4.* Bootstrap 4.5.* Pipenv If you have any questions twe

Ganesh Khade 56 Nov 28, 2022
Bleeding edge django template focused on code quality and security.

wemake-django-template Bleeding edge django2.2 template focused on code quality and security. Purpose This project is used to scaffold a django projec

wemake.services 1.6k Jan 08, 2023
Code Kata Python Template

Code Kata Python Template This is the code kata template for python created by Aula de Software Libre de la Universidad de Córdoba Step 1. Use this re

Sergio Gómez 2 Nov 30, 2021
A Django starter template with a sound foundation.

SOS Django Template SOS Django Tempalate is a Django starter template that has opinionated and good solutions while starting your new Django project.

Eray Erdin 19 Oct 30, 2022
A platform for developers 👩‍💻 who wants to share their programs and projects.

Hacktoberfest-2021 A platform for developers 👩‍💻 who wants to share their projects and programs. Hacktoberfest has updated their rules and now this

Mayank Choudhary 40 Nov 07, 2022
This is a FastAPI, React, MongoDB stack Boilerplate. It's as glorious as a highland coo.

Coo - F.A.R.M stack BoilerPlate F.A.R.M - FastAPI, React, MongoDB This boilerplate utilizes FastAPI to build a REST API, MongoDB for data storage, and

Peter Waters 2 Feb 06, 2022
Generic python project template

generic-python-project-template generic-python-project-template STEPS - STEP 01- Create a repository by using template repository STEP 02- Clone the n

SUNNY BHAVEEN CHANDRA 3 Oct 03, 2022
Backend Boilerplate using Django,celery,Redis

Backend Boilerplate using Django,celery,Redis

Daniel Mawioo 2 Sep 14, 2022
Generic template for python service

Cookie cutter template example Technology stack Flask Gevent UWSGI Poetry Docker Docker-compose Installation pip install cookiecutter cookiecutter git

Churkin Oleg 11 Oct 22, 2022
Um template para quem quiser usar o Docker + PGSQL + Django.

Um template para quem quiser usar o Docker + PGSQL + Django.

Drack 2 Mar 11, 2022
The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments

GeneDisco ICLR-22 Challenge Starter Repository The starter repository for submissions to the GeneDisco challenge for optimized experimental design in

22 Dec 06, 2022
Creating Templates and components so those can be reusable some time and makes workflow a lot easier!

TEMPLATES AND COMPONENTS IN ANY LANG! This is an Open Repository For Students to Contribute code in Hackoctoberfest in different Languages and Tech me

SriSravyaN 9 Feb 19, 2022
Boilerplate code for a Python Flask API

MrMat :: Python :: API :: Flask Boilerplate code for a Python Flask API This variant of a Python Flask API is code-first and using native Flask Featur

0 Dec 26, 2021
The Django Base Site is a Django site that is built using the best Django practices and comes with all the common Django packages that you need to jumpstart your next project.

Django Base Site The Django Base Site is a Django site that is built using the best Django practices and comes with all the common Django packages tha

Brent O'Connor 167 Jan 03, 2023
A test Django application with production-level docker setup

DockerApp A test Django application with production-level docker setup. Blog: https://medium.com/@siddharth.sahu/the-near-perfect-dockerfile-for-djang

Siddharth Sahu 44 Nov 18, 2022
Basic Docker Compose template application with Flask, Celery, Redis, MySQL, SocketIO, Nginx and Gunicorn.

Nginx / Gunicorn / Flask 🐍 / Celery / SocketIO / MySQL / Redis / Docker 🐳 sample application Basic Docker Compose template application for orchestat

Alex Oarga 8 Aug 06, 2022