Clairvoyance: a Unified, End-to-End AutoML Pipeline for Medical Time Series

Overview

Clairvoyance: A Pipeline Toolkit for Medical Time Series


Authors: van der Schaar Lab

This repository contains the implementation of Clairvoyance: A Pipeline Toolkit for Medical Time Series, covering the following applications:

  • Time-series prediction (one-shot and online)
  • Transfer learning
  • Individualized time-series treatment effects (ITE) estimation
  • Active sensing on time-series data
  • AutoML

All API files for these applications can be found in the /api folder, and all tutorials in the /tutorial folder.

[Figure: Block diagram of Clairvoyance]

Installation

There are currently two ways of installing the required dependencies: using Docker or using Conda.

Note on Requirements

  • Clairvoyance has been tested on Ubuntu 20.04, but should be broadly compatible with common Linux systems.
  • The Docker installation method is additionally compatible with Mac and Windows systems that support Docker.
  • Hardware requirements depend on the underlying ML models used; a machine capable of typical ML research workloads is recommended.
  • For faster computation, a CUDA-capable Nvidia GPU is recommended (follow the CUDA-enabled installation steps below).

Docker installation

  1. Install Docker on your system: https://docs.docker.com/get-docker/.
  2. [Required for CUDA-enabled installation only] Install Nvidia container runtime: https://github.com/NVIDIA/nvidia-container-runtime/.
    • Assumes Nvidia drivers are correctly installed on your system.
  3. Get the latest Clairvoyance Docker image:
    $ docker pull clairvoyancedocker/clv:latest
  4. To start an interactive terminal session in the Docker container, execute the following from the Clairvoyance repository root:
    $ docker run -i -t --gpus all --network host -v $(pwd)/datasets/data:/home/clvusr/clairvoyance/datasets/data clairvoyancedocker/clv
    • Explanation of the docker run arguments:
      • -i -t: Run a terminal session.
      • --gpus all: [Required for CUDA-enabled installation only] Passes your GPU(s) to the Docker container; omit this option otherwise.
      • --network host: Uses your machine's network, so ports opened inside the container are reachable from the host. Alternatively, publish ports explicitly, e.g. -p 8888:8888.
      • -v $(pwd)/datasets/data:/home/clvusr/clairvoyance/datasets/data: Mounts a host directory (here, the data directory) into the Docker container as a volume.
      • clairvoyancedocker/clv: Specifies Clairvoyance Docker image.
    • If using Windows:
      • Use PowerShell and first run the command $pwdwin = $(pwd).Path. Then use $pwdwin instead of $(pwd) in the docker run command.
    • If using Windows or Mac:
      • Due to how Docker networking works, replace --network host with -p 8888:8888.
  5. Run all subsequent Clairvoyance API commands, Jupyter notebooks, etc. from within this Docker container.

Conda installation

Conda installation has been tested on Ubuntu 20.04 only.

  1. From the Clairvoyance repo root, execute:
    $ conda env create --name clvenv -f ./environment.yml
    $ conda activate clvenv
  2. Run all subsequent Clairvoyance API commands, Jupyter notebooks, etc. in the clvenv environment.

Data

Clairvoyance expects your dataset files to be defined as follows:

  • Four CSV files (may be compressed), as illustrated below:
    static_test_data.csv
    static_train_data.csv
    temporal_test_data.csv
    temporal_train_data.csv
    
  • Static data file content format:
    id,my_feature,my_other_feature,my_third_feature_etc
    3wOSm2,11.00,4,-1.0
    82HJss,3.40,2,2.1
    iX3fiP,7.01,3,-0.4
    ...
    
  • Temporal data file content format:
    id,time,variable,value
    3wOSm2,0.0,my_first_temporal_feature,0.45
    3wOSm2,0.5,my_first_temporal_feature,0.47
    3wOSm2,1.2,my_first_temporal_feature,0.49
    3wOSm2,0.0,my_second_temporal_feature,10.0
    3wOSm2,0.1,my_second_temporal_feature,12.4
    3wOSm2,0.3,my_second_temporal_feature,9.3
    82HJss,0.0,my_first_temporal_feature,0.22
    82HJss,1.0,my_first_temporal_feature,0.44
    ...
    
  • The id column is required in the static data files. The id,time,variable,value columns are required in the temporal data files. Sample IDs must match between the static and temporal files.
  • Your data files are expected to be under:
    <clairvoyance_repo_root>/datasets/data/<your_dataset_name>/
    
  • See tutorials for how to define your dataset(s) in code.
  • Clairvoyance examples refer to some existing datasets, e.g. mimic and ward. These datasets are confidential (or, in the case of MIMIC-III, require completing a training course and submitting an access request) and are not provided here. Contact [email protected] for more details.
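The file layout above can be checked before any loading happens. The sketch below is a hypothetical helper, not part of the Clairvoyance API (`check_split` and `_read_ids` are illustrative names), that uses only Python's standard `csv` module to confirm that the required columns exist and that the static and temporal IDs match for one train/test split. For real files, pass `open(path, newline="")`, or `gzip.open(path, "rt", newline="")` if your compressed CSVs happen to be gzip files (an assumption; other compression formats need a different opener).

```python
import csv
import io
from typing import IO, Set

# Hypothetical helper (not part of the Clairvoyance API).
# Required columns as listed in the data format description.
STATIC_REQUIRED = {"id"}
TEMPORAL_REQUIRED = {"id", "time", "variable", "value"}


def _read_ids(f: IO[str], required: Set[str], name: str) -> Set[str]:
    """Check that a CSV stream has the required columns and return its IDs."""
    reader = csv.DictReader(f)
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"{name}: missing required column(s) {sorted(missing)}")
    return {row["id"] for row in reader}


def check_split(static_f: IO[str], temporal_f: IO[str], split: str) -> Set[str]:
    """Validate one split ("train" or "test"): columns present and IDs matching."""
    static_ids = _read_ids(static_f, STATIC_REQUIRED, f"static_{split}_data")
    temporal_ids = _read_ids(temporal_f, TEMPORAL_REQUIRED, f"temporal_{split}_data")
    if static_ids != temporal_ids:
        raise ValueError(
            f"{split}: ID mismatch; static only: {sorted(static_ids - temporal_ids)}, "
            f"temporal only: {sorted(temporal_ids - static_ids)}"
        )
    return static_ids


if __name__ == "__main__":
    # In-memory stand-ins for static_train_data.csv and temporal_train_data.csv.
    static_csv = "id,my_feature,my_other_feature\n3wOSm2,11.00,4\n82HJss,3.40,2\n"
    temporal_csv = (
        "id,time,variable,value\n"
        "3wOSm2,0.0,my_first_temporal_feature,0.45\n"
        "82HJss,0.0,my_first_temporal_feature,0.22\n"
    )
    ids = check_split(io.StringIO(static_csv), io.StringIO(temporal_csv), "train")
    print(sorted(ids))  # ['3wOSm2', '82HJss']
```

Run the check once per split ("train" and "test") so that mismatched IDs surface before any model training starts.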

Extract data from MIMIC-III

To use MIMIC-III with Clairvoyance, first obtain access to MIMIC-III and install it in a Postgres database by following the instructions at https://mimic.physionet.org/tutorials/install-mimic-locally-ubuntu/

Then run the extraction script from the repository root:

$ cd datasets/mimic_data_extraction && python extract_antibiotics_dataset.py

Usage

  • To run tutorials:
    • Launch jupyter lab: $ jupyter-lab.
      • If using Windows or Mac and following the Docker installation method, run jupyter-lab --ip="0.0.0.0".
    • Open JupyterLab in your browser using the URL (including the token) printed in the terminal.
    • Navigate to tutorial/ and run a tutorial of your choice.
  • To run the Clairvoyance API from the command line, execute the appropriate command from within the Docker container or the clvenv Conda environment (see the example command below).

Example: Time-series prediction

To train and evaluate a time-series prediction pipeline, run $ cd api && python main_api_prediction.py (see the example command below for a full set of arguments), or work through the Jupyter notebook tutorial/tutorial_prediction.ipynb.

Any model architecture, such as an RNN, a temporal convolutional network, or a transformer, can be used as the predictor model, provided that it implements fit and predict methods.
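The fit/predict requirement is a duck-typing contract. The sketch below illustrates it with a hypothetical `Predictor` protocol and a toy `NearestCentroidPredictor`. Both are illustrative names and are not part of Clairvoyance. The input format assumed here is a simplification: a list of samples, each a list of time steps of feature values. The exact signatures and array types Clairvoyance passes to its predictors are not stated above and may differ.

```python
from typing import Dict, List, Protocol, Sequence

# A sample is a sequence of time steps; each step is a sequence of feature values.
Sample = Sequence[Sequence[float]]


class Predictor(Protocol):
    """Hypothetical structural type for the fit/predict contract (illustrative only)."""

    def fit(self, x: Sequence[Sample], y: Sequence[int]) -> "Predictor": ...

    def predict(self, x: Sequence[Sample]) -> List[int]: ...


class NearestCentroidPredictor:
    """Toy one-shot classifier (hypothetical, not a Clairvoyance model):
    averages each sample over time, then predicts the label of the nearest
    class centroid by squared Euclidean distance."""

    def fit(self, x: Sequence[Sample], y: Sequence[int]) -> "NearestCentroidPredictor":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for sample, label in zip(x, y):
            mean = self._time_mean(sample)
            acc = sums.setdefault(label, [0.0] * len(mean))
            for j, v in enumerate(mean):
                acc[j] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x: Sequence[Sample]) -> List[int]:
        return [self._closest(self._time_mean(sample)) for sample in x]

    @staticmethod
    def _time_mean(sample: Sample) -> List[float]:
        n = len(sample)
        return [sum(step[j] for step in sample) / n for j in range(len(sample[0]))]

    def _closest(self, point: List[float]) -> int:
        return min(
            self.centroids_,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, self.centroids_[c])),
        )


if __name__ == "__main__":
    x_train = [[[0.0, 1.0], [0.2, 1.2]], [[5.0, 6.0], [5.2, 6.4]]]
    y_train = [0, 1]
    model: Predictor = NearestCentroidPredictor().fit(x_train, y_train)
    print(model.predict([[[0.1, 0.9]], [[4.9, 6.1], [5.1, 6.0]]]))  # [0, 1]
```

A real predictor would replace the time-averaging and centroid logic with, for example, a recurrent network, but its outward interface would stay the same two methods.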

  • Stages of the time-series prediction:

    • Import dataset
    • Preprocess data
    • Define the problem (feature, label, etc.)
    • Impute missing components
    • Select the relevant features
    • Train time-series predictive model
    • Estimate the uncertainty of the predictions
    • Interpret the predictions
    • Evaluate the time-series prediction performance on the testing set
    • Visualize the outputs (performance, predictions, uncertainties, and interpretations)
  • Command inputs:

    • data_name: mimic, ward, cf
    • normalization: minmax, standard, None
    • one_hot_encoding: input features that need to be one-hot encoded
    • problem: one-shot or online
    • max_seq_len: maximum sequence length after padding
    • label_name: the column name for the label(s)
    • treatment: the column name for treatments
    • static_imputation_model: mean, median, mice, missforest, knn, gain
    • temporal_imputation_model: mean, median, linear, quadratic, cubic, spline, mrnn, tgain
    • feature_selection_model: greedy-addition, greedy-deletion, recursive-addition, recursive-deletion, None
    • feature_number: number of features to select
    • model_name: rnn, gru, lstm, attention, tcn, transformer
    • h_dim: hidden state dimension
    • n_layer: number of layers
    • n_head: number of attention heads (transformer model only)
    • batch_size: number of samples in mini-batch
    • epochs: number of epochs
    • learning_rate: learning rate
    • static_mode: how to utilize static features (concatenate or None)
    • time_mode: how to utilize time information (concatenate or None)
    • task: classification or regression
    • uncertainty_model_name: uncertainty estimation model name (ensemble)
    • interpretation_model_name: interpretation model name (tinvase)
    • metric_name: auc, apr, mae, mse
  • Example command:

    $ cd api
    $ python main_api_prediction.py \
        --data_name cf --normalization minmax --one_hot_encoding admission_type \
        --problem one-shot --max_seq_len 24 --label_name death \
        --static_imputation_model median --temporal_imputation_model median \
        --model_name lstm --h_dim 100 --n_layer 2 --n_head 2 --batch_size 400 \
        --epochs 20 --learning_rate 0.001 \
        --static_mode concatenate --time_mode concatenate \
        --task classification --uncertainty_model_name ensemble \
        --interpretation_model_name tinvase --metric_name auc
  • Outputs:

    • Model prediction
    • Model performance
    • Prediction uncertainty
    • Prediction interpretation
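Before a sequence model can consume the long temporal format, each sample's irregularly timed observations have to become a fixed-length array, which is what max_seq_len controls. The sketch below is a hypothetical, standard-library-only illustration of that step and is not Clairvoyance's own preprocessing. It makes three assumptions. First, every distinct observation time becomes one time step. Second, sequences longer than `max_seq_len` keep only their first `max_seq_len` steps. Third, both unobserved values and padding are filled with NaN, with the returned `lengths` list telling them apart. Clairvoyance may use a different padding value or truncation rule.

```python
import csv
import io
import math
from collections import defaultdict
from typing import IO, Dict, List, Tuple

# Hypothetical helper (not part of the Clairvoyance API): pivots long-format
# temporal rows into a padded nested list of shape
# [n_samples][max_seq_len][n_variables].


def to_padded_sequences(
    temporal_f: IO[str], max_seq_len: int
) -> Tuple[List[str], List[str], List[int], List[List[List[float]]]]:
    obs: Dict[str, Dict[Tuple[float, str], float]] = defaultdict(dict)
    variable_set = set()
    for row in csv.DictReader(temporal_f):
        obs[row["id"]][(float(row["time"]), row["variable"])] = float(row["value"])
        variable_set.add(row["variable"])

    ids = sorted(obs)
    variables = sorted(variable_set)
    lengths: List[int] = []
    tensor: List[List[List[float]]] = []
    for sid in ids:
        # Assumption: each distinct observation time is one time step, and
        # only the first max_seq_len steps are kept.
        times = sorted({t for t, _ in obs[sid]})[:max_seq_len]
        lengths.append(len(times))
        # Variables not observed at a given time become NaN (left for imputation).
        steps = [[obs[sid].get((t, v), math.nan) for v in variables] for t in times]
        # Pad with all-NaN steps; `lengths` separates padding from real steps.
        steps += [[math.nan] * len(variables) for _ in range(max_seq_len - len(times))]
        tensor.append(steps)
    return ids, variables, lengths, tensor


if __name__ == "__main__":
    rows = (
        "id,time,variable,value\n"
        "3wOSm2,0.0,my_first_temporal_feature,0.45\n"
        "3wOSm2,0.5,my_first_temporal_feature,0.47\n"
        "3wOSm2,0.0,my_second_temporal_feature,10.0\n"
        "3wOSm2,0.1,my_second_temporal_feature,12.4\n"
        "82HJss,0.0,my_first_temporal_feature,0.22\n"
    )
    ids, variables, lengths, x = to_padded_sequences(io.StringIO(rows), max_seq_len=4)
    print(lengths)  # [3, 1]
    print(x[0][1])  # [nan, 12.4]
```

The NaN gaps inside observed time steps are the missing components that the temporal imputation stage (temporal_imputation_model) is meant to fill. Trailing padding steps, by contrast, are normally left alone or masked out.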

Citation

To cite Clairvoyance in your publications, please use the following reference.

Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, and Mihaela van der Schaar (2021). Clairvoyance: A Pipeline Toolkit for Medical Time Series. In International Conference on Learning Representations. Available at: https://openreview.net/forum?id=xnC8YwKUE3k.

You can also use the following Bibtex entry.

@inproceedings{
  jarrett2021clairvoyance,
  title={Clairvoyance: A Pipeline Toolkit for Medical Time Series},
  author={Daniel Jarrett and Jinsung Yoon and Ioana Bica and Zhaozhi Qian and Ari Ercole and Mihaela van der Schaar},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=xnC8YwKUE3k}
}

To cite the Clairvoyance alpha blog post, please use:

van Der Schaar, M., Yoon, J., Qian, Z., Jarrett, D., & Bica, I. (2020). clairvoyance alpha: the first pipeline toolkit for medical time series. [Webpages]. https://doi.org/10.17863/CAM.70020

@misc{https://doi.org/10.17863/cam.70020,
  doi = {10.17863/CAM.70020},
  url = {https://www.repository.cam.ac.uk/handle/1810/322563},
  author = {Van Der Schaar,  Mihaela and Yoon,  Jinsung and Qian,  Zhaozhi and Jarrett,  Dan and Bica,  Ioana},
  title = {clairvoyance alpha: the first pipeline toolkit for medical time series},
  publisher = {Apollo - University of Cambridge Repository},
  year = {2020}
}