MedViLL

This repository provides the code for MedViLL (Medical Vision Language Learner).


Our proposed architecture, MedViLL, is a single BERT-based model that learns a unified contextualized vision-language (VL) representation for both Vision Language Understanding (VLU) and Vision Language Generation (VLG). MedViLL performs pre-training with a CNN-based visual encoder and a cross-modal Transformer for joint VL representation learning. After pre-training, the model can easily be applied to VLU and VLG tasks with task-specific fine-tuning. Please refer to our paper "Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training" for more details.
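Below is a minimal, self-contained sketch of this kind of joint encoder, assuming a ResNet-50 visual backbone, BERT-base dimensions, and a BERT-style vocabulary of 30,522 tokens. The module names, backbone choice, and vocabulary size are illustrative assumptions, not the repository's exact implementation, and it assumes a recent PyTorch/torchvision.

import torch
import torch.nn as nn
import torchvision.models as models

class JointEncoderSketch(nn.Module):
    """CNN visual encoder + cross-modal Transformer over concatenated image/text tokens."""
    def __init__(self, hidden_size=768, num_layers=12, num_heads=12, vocab_size=30522):
        super().__init__()
        # Visual encoder: keep the ResNet-50 spatial feature map, drop pooling/classifier.
        resnet = models.resnet50(weights=None)
        self.visual_backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.visual_proj = nn.Linear(2048, hidden_size)  # project CNN features to BERT size
        # Text embedding with BERT-base dimensions (12 layers, 12 heads, 768 hidden).
        self.token_emb = nn.Embedding(vocab_size, hidden_size)
        layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=num_heads,
                                           batch_first=True)  # batch_first needs PyTorch >= 1.9
        self.cross_modal_encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, image, input_ids):
        feat = self.visual_backbone(image)                               # (B, 2048, h, w)
        vis_tokens = self.visual_proj(feat.flatten(2).transpose(1, 2))   # (B, h*w, 768)
        txt_tokens = self.token_emb(input_ids)                           # (B, T, 768)
        joint = torch.cat([vis_tokens, txt_tokens], dim=1)               # single joint sequence
        return self.cross_modal_encoder(joint)                           # unified VL representation

model = JointEncoderSketch()
out = model(torch.randn(1, 3, 224, 224), torch.randint(0, 30522, (1, 16)))
print(out.shape)  # torch.Size([1, 65, 768]) -> 49 visual + 16 text tokens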

1) Downloads.

Pre-trained weights.

We provide five versions of BERT-based pre-trained weights with different types of self-attention masks. Pre-training for the joint embedding was built on the BERT-base architecture (12 hidden layers, 12 attention heads, 768 hidden size), and training details are described in our paper. The currently available versions of pre-trained weights are as follows (a toy construction of these mask types is sketched after the list):

  • MedViLL - BERT-Base model with Bidirectional Auto-regressive attention mask.

  • Bi & Seq2Seq - BERT-Base model with Seq2Seq attention mask (75%) and Bidirectional attention mask (25%) in every mini-batch.

  • Bidirectional - BERT-Base model with Bidirectional attention mask.

  • Seq2Seq - BERT-Base model with Seq2Seq attention mask.

  • Non-cross - BERT-Base model with Non-cross modality attention mask.
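
As a rough illustration of how these mask families differ, here is a toy construction over a joint sequence of V visual tokens followed by T text tokens. This is an assumption-based sketch; the repository's actual mask construction, and in particular the bidirectional auto-regressive mask used by MedViLL itself, is not reproduced here.

import torch

def build_attention_mask(num_vis, num_txt, mode="bidirectional"):
    """Return a (V+T, V+T) boolean mask over [visual tokens | text tokens]; True = may attend."""
    n = num_vis + num_txt
    if mode == "bidirectional":
        # All tokens attend to all tokens.
        mask = torch.ones(n, n, dtype=torch.bool)
    elif mode == "seq2seq":
        # Visual tokens attend only within the image; text tokens attend to the whole
        # image and to earlier text tokens (causal), which enables generation.
        mask = torch.zeros(n, n, dtype=torch.bool)
        mask[:num_vis, :num_vis] = True
        mask[num_vis:, :num_vis] = True
        mask[num_vis:, num_vis:] = torch.tril(torch.ones(num_txt, num_txt, dtype=torch.bool))
    elif mode == "non_cross":
        # No cross-modality attention: each modality attends only to itself.
        mask = torch.zeros(n, n, dtype=torch.bool)
        mask[:num_vis, :num_vis] = True
        mask[num_vis:, num_vis:] = True
    else:
        raise ValueError(mode)
    return mask

print(build_attention_mask(2, 3, "seq2seq").int())
# For the "Bi & Seq2Seq" weights, roughly 75% of mini-batches would use mode="seq2seq"
# and 25% mode="bidirectional".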

Datasets.

We provide pre-processed versions of multiple datasets for each task as follows:

Download each dataset to the path /data/[dataset].

  • MIMIC-CXR (2.27 GB): 91,685 unique studies, each an AP-view chest X-ray image paired with its associated report.
  • OPEN-I (74.1 MB): 3,547 unique AP- and PA-view image-report pairs from the official Open-I dataset.
  • VQA-RAD (402 MB): 3,515 question-answer pairs on 315 images (104 head CTs or MRIs, 107 chest X-rays, and 104 abdominal CTs).

We also provide JSON files with the file paths used for validation in the retrieval task; download each file to the path /data/[dataset]. (A minimal loading example is sketched after the lists below.)

Image-to-report retrieval

  1) MIMIC valid, 2) MIMIC test, 3) OpenI test

Report-to-image retrieval

  1) MIMIC valid, 2) MIMIC test, 3) OpenI test
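
As a quick sanity check that a downloaded split loads correctly (the file name below is a placeholder and the exact JSON schema is not documented here):

import json

with open("data/mimic/mimic_test.json") as f:  # placeholder path; use the file you downloaded
    split = json.load(f)
print(type(split), len(split))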

2) Reproduce.

Section A. Installation

The sections below describe the virtual environment installation and the fine-tuning process of MedViLL, based on PyTorch 1.7 and Python 3.8. To fine-tune MedViLL, you need to download the pre-trained weights of MedViLL. After downloading the pre-trained weights, use medvill.yaml to create a conda-based virtual environment as follows:

$ git clone https://github.com/SuperSupermoon/MedViLL.git
$ cd MedViLL; conda env create --file medvill.yaml
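
After the environment is created, activate it before running any of the commands below. The environment name is defined in medvill.yaml; "medvill" is an assumption here, so check the name: field in that file.

$ conda activate medvill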

Note that all fine-tuning experiments were conducted on a machine with 8 GeForce RTX 3090 GPUs, each with 24 GB of VRAM.

Section B. Prepare pre-processed dataset

Unzip the mimic, openi, and VQA-RAD tar.gz files.

$ cd MedViLL; tar -zxvf [file_name.tar.gz]

Section C. Pre-training model

Example:

$ cd MedViLL
$ python main.py

Section D. Downstream model

  • Diagnosis Classification Example:
$ cd MedViLL/downstream_task/classification
$ python cls.py
  • Image-Report Retrieval Example:
$ cd MedViLL/downstream_task/retrieval
$ python retrieval.py
  • Medical Visual Question Answering Example:
$ cd MedViLL/downstream_task/report_generation_and_vqa
$ python finetune.py --tasks vqa --s2s_prob 0 --bi_prob 1 --mask_prob 0
  • Report Generation Example:
$ cd MedViLL/downstream_task/report_generation_and_vqa
$ python finetune.py --tasks report_generation --mask_prob 0.15 --s2s_prob 1 --bi_prob 0
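
The mask-related flags above are not documented in this section. Based on their names and the mask mixing described earlier (an interpretation, not the repository's documented behavior), --s2s_prob and --bi_prob plausibly set the per-mini-batch probability of using the Seq2Seq or Bidirectional attention mask, and --mask_prob the token-masking rate. A toy version of that sampling:

import random

def sample_mask_mode(s2s_prob: float, bi_prob: float) -> str:
    # Assumes the two probabilities cover all cases, as in the examples above.
    assert abs(s2s_prob + bi_prob - 1.0) < 1e-6
    return "seq2seq" if random.random() < s2s_prob else "bidirectional"

# Report generation: --s2s_prob 1 --bi_prob 0 -> always the Seq2Seq mask (with 15% token masking).
# VQA:               --s2s_prob 0 --bi_prob 1 -> always the Bidirectional mask (no token masking).
print(sample_mask_mode(1.0, 0.0), sample_mask_mode(0.0, 1.0))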