Joint learning of images and text via maximization of mutual information

Last update: Dec 22, 2022

mutual_info_img_txt

Joint learning of images and text via maximization of mutual information.

This repository implements the algorithms presented in:
Ruizhi Liao, Daniel Moyer, Miriam Cha, Keegan Quigley, Seth Berkowitz, Steven Horng, Polina Golland, William M Wells. Multimodal Representation Learning via Maximization of Local Mutual Information. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2021.

This repo is a work-in-progress. As of now, we have released the code for joint representation learning of images and text by maximizing the mutual information between the feature embeddings of the two modalities. We demonstrate its application in learning from chest radiographs and radiology reports.
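To make "maximizing the mutual information between the feature embeddings" concrete, here is a minimal sketch of one common mutual information lower bound, the InfoNCE estimator, applied to a batch of paired image and text embeddings. It is an illustration only: the objective this repository trains is the one defined in the paper, and the function names, the dot-product critic, and the toy embeddings below are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def infonce_lower_bound(img_emb, txt_emb):
    """InfoNCE lower bound on mutual information for N paired embeddings.

    img_emb[i] and txt_emb[i] are assumed to come from the same study
    (a positive pair); every other combination is treated as a negative.
    The critic here is a plain dot product, which is an assumption.
    """
    n = len(img_emb)
    total = 0.0
    for i in range(n):
        scores = [dot(img_emb[i], txt_emb[j]) for j in range(n)]
        # Log-probability of picking the matching text among all N texts.
        total += scores[i] - logsumexp(scores)
    # The bound can never exceed log(N).
    return math.log(n) + total / n


# Toy data: matched pairs point in the same direction.
images = [[10.0, 0.0], [0.0, 10.0]]
matched_texts = [[10.0, 0.0], [0.0, 10.0]]
mismatched_texts = [[0.0, 10.0], [10.0, 0.0]]

print("matched:   ", infonce_lower_bound(images, matched_texts))
print("mismatched:", infonce_lower_bound(images, mismatched_texts))
```

The bound is higher when each image embedding scores its own report above the other reports in the batch, which is the behavior a training loop would push toward by gradient ascent on this quantity.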

Instructions

Conda environment

Set up the conda environment using conda_environment.yml:

conda env create -f conda_environment.yml

BERT

Download the pre-trained BERT model, tokenizer, etc. from Dropbox. Download the whole folder bert_pretrain_all_notes_150000, which contains seven files, and pass the path to that folder to --bert_pretrained_dir.
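Before starting a long training run, it can help to confirm that the download is complete. The sketch below only checks that the folder exists and holds the seven files mentioned above; it does not know the expected file names, and the helper name check_bert_dir is hypothetical rather than part of this repository.

```python
from pathlib import Path


def check_bert_dir(bert_dir, expected_count=7):
    """Return the sorted file names in bert_dir, or raise if it looks incomplete.

    expected_count defaults to 7 because the README states that the
    bert_pretrain_all_notes_150000 folder contains seven files.
    """
    path = Path(bert_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"{path} is not a directory")
    files = sorted(p.name for p in path.iterdir() if p.is_file())
    if len(files) != expected_count:
        raise ValueError(
            f"expected {expected_count} files in {path}, found {len(files)}"
        )
    return files


# Example call (the path is a placeholder for wherever you saved the folder):
# check_bert_dir("/path/to/bert_pretrain_all_notes_150000")
```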

Model training

Train the model in an unsupervised fashion, i.e., by optimizing Eq. (2) of the paper:

python train_img_txt.py
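The command above can also be launched from Python with the BERT folder passed explicitly. This is a sketch under assumptions: only the --bert_pretrained_dir flag is taken from this README, the script may need further arguments that are not listed here, and the helper names and the placeholder path are hypothetical.

```python
import subprocess
import sys


def build_train_command(bert_dir, script="train_img_txt.py"):
    """Build the argument list for a training run.

    Uses the current interpreter so the run stays inside the active
    conda environment.
    """
    return [sys.executable, script, "--bert_pretrained_dir", str(bert_dir)]


def run_training(bert_dir):
    """Run training and raise CalledProcessError if the script exits non-zero."""
    subprocess.run(build_train_command(bert_dir), check=True)


# Example call from the repository root (placeholder path):
# run_training("/path/to/bert_pretrain_all_notes_150000")
```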

The first time you run model training, tokenizing the text may take a while. The tokenized data is then saved, so later runs reuse it instead of tokenizing again.
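The tokenize-once behavior follows a common cache-on-first-use pattern, sketched below. This is not the repository's code: the JSON cache format, the function name load_or_tokenize, and the whitespace tokenizer standing in for the BERT tokenizer are all assumptions chosen to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def load_or_tokenize(texts, cache_path, tokenize):
    """Return tokenized texts, reading them from cache_path when it exists.

    On the first call the texts are tokenized and written to cache_path;
    later calls skip tokenization and load the saved result.
    """
    cache = Path(cache_path)
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    tokens = [tokenize(text) for text in texts]
    cache.write_text(json.dumps(tokens), encoding="utf-8")
    return tokens


# Demo with a stand-in tokenizer and a temporary cache file.
reports = ["no acute cardiopulmonary process", "mild pulmonary edema"]
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "tokens.json"
    first = load_or_tokenize(reports, cache_file, str.split)   # tokenizes and saves
    second = load_or_tokenize(reports, cache_file, str.split)  # loads from cache
    print(first == second)
```

One consequence of this pattern is that a stale cache is reused silently, so if the input text changes, the cached file has to be deleted to force re-tokenization.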

Notes on Data

MIMIC-CXR

We have evaluated this algorithm on MIMIC-CXR, a large publicly available dataset of chest X-ray images with free-text radiology reports. The dataset contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA.

Example data

We provide 16 example image-text pairs to test the code, listed in training_chexpert_mini.csv.
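A quick way to inspect the example file before training is to read its header and count its rows. The sketch below makes no assumption about the column names in training_chexpert_mini.csv; it only assumes the file is a standard comma-separated file with a header row, and the helper name summarize_csv is hypothetical.

```python
import csv


def summarize_csv(csv_path):
    """Return (column_names, number_of_data_rows) for a CSV file with a header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return reader.fieldnames, len(rows)


# Example call from the repository root:
# columns, n_rows = summarize_csv("training_chexpert_mini.csv")
# print(columns, n_rows)
```

If the file stores one image-text pair per row, the row count should match the 16 pairs mentioned above.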

Contact

Ruizhi (Ray) Liao: ruizhi [at] mit.edu
