This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports"

Overview

Introduction: X-Ray Report Generation

This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports". Our work adopts x-ray (also including some history data for patients if there are any) as input, a CNN is used to learn the embedding features for x-ray, as a result, disease-state-style information (Previously, almost all work used detected disease embedding for input of text generation network which could possibly exclude the false negative diseases) is extracted and fed into the text generation network (transformer). To make sure the consistency of detected diseases and generated x-ray reports, we also create a interpreter to enforce the accuracy of the x-ray reports. For details, please refer to here.

Data we used for experiments

We use two datasets for experiments to validate our method:

Performance on two datasets

Datasets Methods BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE-L
Open-I Single-view 0.463 0.310 0.215 0.151 0.186 0.377
Multi-view 0.476 0.324 0.228 0.164 0.192 0.379
Multi-view w/ Clinical History 0.485 0.355 0.273 0.217 0.205 0.422
Full Model (w/ Interpreter) 0.515 0.378 0.293 0.235 0.219 0.436
MIMIC Single-view 0.447 0.290 0.200 0.144 0.186 0.317
Multi-view 0.451 0.292 0.201 0.144 0.185 0.320
Multi-view w/ Clinical History 0.491 0.357 0.276 0.223 0.213 0.389
Full Model (w/ Interpreter) 0.495 0.360 0.278 0.224 0.222 0.390

Environments for running codes

  • Operating System: Ubuntu 18.04

  • Hardware: tested with RTX 2080 TI (11G)

  • Software: tested with PyTorch 1.5.1, Python3.7, CUDA 10.0, tensorboardX, tqdm

  • Anaconda is strongly recommended

  • Other Libraries: Spacy, SentencePiece, nlg-eval

How to use our code for train/test

Step 0: Build your vocabulary model with SentencePiece (tools/vocab_builder.py)

  • Please make sure that you have preprocess the medical reports accurately.
  • We use the top 900 high-frequency words
  • We use 100 unigram tokens extracted from SentencePiece to avoid the out-of-vocabulary situation.
  • In total we have 1000 words and tokens. Update: You can skip step 0 and use the vocabulary files in Vocabulary/*.model

Step 1: Train the LSTM and/or Transformer models, which are just text classifiers, to obtain 14 common disease labels.

  • Use the train_text.py to train the models on your working datasets. For example, the MIMIC-CXR comes with CheXpert labels; you can use these labels as ground-truth to train a differentiable text classifier model. Here the text classifier is a binary predictor (postive/uncertain) = 1 and (negative/unmentioned) = 0.
  • Assume the trained text classifier is perfect and exactly reflects the medical reports. Although this is not the case, in practice, it gives us a good approximation of how good the generated reports are. Human evaluation is also needed to evalutate the generated reports.
  • The goals here are:
  1. Evaluate the performance of the generated reports by comparing the predicted labels and the ground-truth labels.
  2. Use the trained models to fine-tune medical reports' output.

Step 2: Test the text classifier models using the train_text.py with:

  • PHASE = 'TEST'
  • RELOAD = True --> Load the trained models for testing

Step 3: Transfer the trained model to obtain 14 common disease labels for the Open-I datasets and any dataset that doesn't have ground-truth labels.

  • Transfer the learned model to the new dataset by predicting 14 disease labels for the entire dataset by running extract_label.py on the target dataset. The output file is file2label.json
  • Split them into train, validation, and test sets (we have already done that for you, just put the file2label.json in a place where the NLMCXR dataset can see).
  • Build your own text classifier (train_text.py) based on the extracted disease labels (treat them as ground-truth labels).
  • In the end, we want the text classifiers (LSTM/Transformer) to best describe your model's output on the working dataset.

Step 4: Get additional labels using (tools/count_nounphrases.py)

  • Note that 14 disease labels are not enough to generate accurate reports. This is because for the same disease, we might have different ways to express it. For this reason, additional labels are needed to enhance the quality of medical reports.
  • The output of the coun_nounphrases.py is a json file, you can use it as input to the exising datasets such as MIMIC or NLMCXR.
  • Therefore, in total we have 14 disease labels + 100 noun-phrases = 114 disease-related topics/labels. Please check the appendix in our paper.

Step 5: Train the ClsGen model (Classifier-Generator) with train_full.py

  • PHASE = 'TRAIN'
  • RELOAD = False --> We trained our model from scratch

Step 6: Train the ClsGenInt model (Classifier-Generator-Interpreter) with train_full.py

  • PHASE = 'TRAIN'
  • RELOAD = True --> Load the ClsGen trained from the step 4, load the Interpreter model from Step 1 or 3
  • Reduce the learning rate --> Since the ClsGen has already converged, we need to reduce the learning rate to fine-tune the word representation such that it minimize the interpreter error.

Step 7: Generate the outputs

  • Use the infer function in the train_full.py to generate the outputs. This infer function ensures that no ground-truth labels and medical reports are being used in the inference phase (we used teacher forcing / ground-truth labels during training phase).
  • Also specify the threshold parameter, see the appendix of our paper on which threshold to choose from.
  • Final specify your the name of your output files.

Step 8: Evaluate the generated reports.

  • Use the trained text classifier model in step 1 to evaluate the clinical accuracy
  • Use the nlg-eval library to compute BLEU-1 to BLEU-4 scores and other metrics.

Our pretrained models

Our model is uploaded in google drive, please download the model from

Model Name Download Link
Our Model for MIMIC Google Drive
Our Model for NLMCXR Google Drive

Citation

If it is helpful to you, please cite our work:

@inproceedings{nguyen-etal-2021-automated,
    title = "Automated Generation of Accurate {\&} Fluent Medical {X}-ray Reports",
    author = "Nguyen, Hoang  and
      Nie, Dong  and
      Badamdorj, Taivanbat  and
      Liu, Yujie  and
      Zhu, Yingying  and
      Truong, Jason  and
      Cheng, Li",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.288",
    doi = "10.18653/v1/2021.emnlp-main.288",
    pages = "3552--3569",
}

Owner
no name
no name
Contrastive Learning Inverts the Data Generating Process

Official code to reproduce the results and data presented in the paper Contrastive Learning Inverts the Data Generating Process.

71 Nov 25, 2022
Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)

Training GANs with Stronger Augmentations via Contrastive Discriminator (ICLR 2021) This repository contains the code for reproducing the paper: Train

Jongheon Jeong 174 Dec 29, 2022
TensorFlow Implementation of "Show, Attend and Tell"

Show, Attend and Tell Update (December 2, 2016) TensorFlow implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attent

Yunjey Choi 902 Nov 29, 2022
Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation.

One model to speak them all 🌎 Audio Language Text ▷ Chinese 人人生而自由,在尊严和权利上一律平等。 ▷ English All human beings are born free and equal in dignity and rig

Mutian He 60 Nov 14, 2022
Using fully convolutional networks for semantic segmentation with caffe for the cityscapes dataset

Using fully convolutional networks for semantic segmentation (Shelhamer et al.) with caffe for the cityscapes dataset How to get started Download the

Simon Guist 27 Jun 06, 2022
Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication"

NFFT4ANOVA Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication" This package uses th

Theresa Wagner 1 Aug 10, 2022
WarpRNNT loss ported in Numba CPU/CUDA for Pytorch

RNNT loss in Pytorch - Numba JIT compiled (warprnnt_numba) Warp RNN Transducer Loss for ASR in Pytorch, ported from HawkAaron/warp-transducer and a re

Somshubra Majumdar 15 Oct 22, 2022
Project to create an open-source 6 DoF input device

6DInputs A Project to create open-source 3D printed 6 DoF input devices Note the plural ('6DInputs' and 'devices') in the headings. We would like seve

RepRap Ltd 47 Jul 28, 2022
deep_image_prior_extension

Code for "Is Deep Image Prior in Need of a Good Education?" Project page: https://jleuschn.github.io/docs.educated_deep_image_prior/. Supplementary Ma

riccardo barbano 7 Jan 09, 2022
Complete the code of prefix-tuning in low data setting

Prefix Tuning Note: 作者在论文中提到使用真实的word去初始化prefix的操作(Initializing the prefix with activations of real words,significantly improves generation)。我在使用作者提供的

Andrew Zeng 4 Jul 11, 2022
PyTorch Lightning implementation of Automatic Speech Recognition

lasr Lightening Automatic Speech Recognition An MIT License ASR research library, built on PyTorch-Lightning, for developing end-to-end ASR models. In

Soohwan Kim 40 Sep 19, 2022
1st Solution For NeurIPS 2021 Competition on ML4CO Dual Task

KIDA: Knowledge Inheritance in Data Aggregation This project releases our 1st place solution on NeurIPS2021 ML4CO Dual Task. Slide and model weights a

MEGVII Research 24 Sep 08, 2022
Pytorch implementation for RelTransformer

RelTransformer Our Architecture This is a Pytorch implementation for RelTransformer The implementation for Evaluating on VG200 can be found here Requi

Vision CAIR Research Group, KAUST 21 Nov 22, 2022
Utilities to bridge Canvas-generated course rosters with GitLab's API.

gitlab-canvas-utils A collection of scripts originally written for CSE 13S. Oversees everything from GitLab course group creation, student repository

Eugene Chou 5 Jun 08, 2022
Prototype python implementation of the ome-ngff table spec

Prototype python implementation of the ome-ngff table spec

Kevin Yamauchi 8 Nov 20, 2022
[ICRA 2022] CaTGrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation

This is the official implementation of our paper: Bowen Wen, Wenzhao Lian, Kostas Bekris, and Stefan Schaal. "CaTGrasp: Learning Category-Level Task-R

Bowen Wen 199 Jan 04, 2023
Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Text Summarization WCN — Weighted Contextual N-gram method for evaluation of Text Summarization In this project, I fine tune T5 model on Extreme Summa

Aditya Shah 1 Jan 03, 2022
Efficient Sparse Attacks on Videos using Reinforcement Learning

EARL This repository provides a simple implementation of the work "Efficient Sparse Attacks on Videos using Reinforcement Learning" Example: Demo: Her

12 Dec 05, 2021
Code for the paper: "On the Bottleneck of Graph Neural Networks and Its Practical Implications"

On the Bottleneck of Graph Neural Networks and its Practical Implications This is the official implementation of the paper: On the Bottleneck of Graph

75 Dec 22, 2022
A Python library for adversarial machine learning focusing on benchmarking adversarial robustness.

ARES This repository contains the code for ARES (Adversarial Robustness Evaluation for Safety), a Python library for adversarial machine learning rese

Tsinghua Machine Learning Group 377 Dec 20, 2022