This repository contains the implementation of the HealthGen model, a generative model to synthesize realistic EHR time series data with missingness.

Last update: Jan 20, 2022

Overview

HealthGen: Conditional EHR Time Series Generation

This repository contains the implementation of the HealthGen model, a generative model to synthesize realistic EHR time series data with missingness.

Installation

Clone the repo with: git clone --recurse-submodules git@github.com:simonbing/HealthGen.git
Navigate to the /healthgen directory and install the dependencies by running: pip install -r requirements.txt
Add the HealthGen module to your PYTHONPATH by running export PYTHONPATH=$PYTHONPATH:/path/to/HealthGen/healthgen.
Optionally, set up wandb, an experiment-tracking tool that is integrated into our pipeline. After creating a free account, substitute your credentials and the desired project name for the placeholders wandb_user and wandb_project in the code.
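
To confirm that the PYTHONPATH step above took effect, a small check along the following lines may help. This is only a sketch: the helper name on_pythonpath is hypothetical and not part of HealthGen, and /path/to/HealthGen/healthgen is the same placeholder path used in the instructions.

```python
import os


def on_pythonpath(directory, env=None):
    """Return True if `directory` is listed in the PYTHONPATH variable.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    target = os.path.normpath(directory)
    # normpath removes trailing separators, so ".../healthgen/" also matches.
    return any(entry and os.path.normpath(entry) == target for entry in entries)


if __name__ == "__main__":
    # Placeholder path from the installation instructions; replace with your own.
    print(on_pythonpath("/path/to/HealthGen/healthgen"))
```

If this prints False in the shell from which you plan to run the pipeline, the export command has most likely not been run in that session.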

Data Access

We utilize the MIMIC-III data set for the training and evaluation of our generative model, which is publicly available to credentialed users.

To extract an intermediate representation of the EHR time series data, we use a slightly modified version of MIMIC-Extract, which is cloned automatically if you followed the installation instructions. To extract the intermediate tables required for our pipeline, follow steps 1-4 of the MIMIC-Extract instructions. In addition to the standard flags, you can set the sampling frequency (e.g. to 15 minutes) by calling: python mimic_direct_extract.py --time_step 15 ...

After the extraction has finished (extracting all patients can take several hours on a machine with around 50 GB of memory), you should obtain four tables with the extracted patient data. This is the input data for our experimental pipeline.

Use

The main components of the pipeline can be run independently: data querying and processing from the database, training a generative model, and evaluation.

To run the entire experimental pipeline, i.e. extract the time series from the intermediate tables, train a generative model and evaluate the result, run:

python main.py \
    --input_vitals /path/to/vitals/table \
    --input_outcomes /path/to/outcomes/table \
    --input_static /path/to/static/table \
    --gen_model healthgen \
    --evaluation grud \
    --out_path /path/to/save/results

For more information on all available flags, run python main.py --helpfull and see the comments in the code.
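
When launching several runs from a script, it can be convenient to assemble the command above programmatically. The following is a minimal sketch, not part of the repository: the helper build_pipeline_command is hypothetical, it uses only the flags shown above, and it assumes main.py is in the current working directory.

```python
import shlex
import sys


def build_pipeline_command(vitals, outcomes, static, out_path,
                           gen_model="healthgen", evaluation="grud",
                           script="main.py"):
    """Return the argument list for one full pipeline run.

    Only the flags documented in this README are used; the default values
    mirror the example command.
    """
    return [
        sys.executable, script,
        "--input_vitals", str(vitals),
        "--input_outcomes", str(outcomes),
        "--input_static", str(static),
        "--gen_model", gen_model,
        "--evaluation", evaluation,
        "--out_path", str(out_path),
    ]


if __name__ == "__main__":
    cmd = build_pipeline_command(
        "/path/to/vitals/table",
        "/path/to/outcomes/table",
        "/path/to/static/table",
        "/path/to/save/results",
    )
    # Print the command for inspection; to launch it, pass `cmd` to
    # subprocess.run(cmd, check=True) from the directory containing main.py.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.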

License

MIT License

Authors

Simon Bing, Andrea Dittadi, Stefan Bauer, Patrick Schwab
