Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

Introduction

This is an implementation of the model used for breast cancer classification as described in our paper Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening. The implementation allows users to get breast cancer predictions by applying one of our pretrained models: a model which takes images as input (image-only) and a model which takes images and heatmaps as input (image-and-heatmaps).

  • Input images: 2 CC view mammography images of size 2677x1942 and 2 MLO view mammography images of size 2974x1748. Each image is saved as a 16-bit png file and is standardized separately before being fed to the models.
  • Input heatmaps: output of the patch classifier, constructed to be the same size as its corresponding mammogram. Two heatmaps are generated for each mammogram, one for the benign and one for the malignant category. The value of each pixel in both heatmaps is between 0 and 1.
  • Output: 2 predictions for each breast, the probabilities of benign and malignant findings: left_benign, right_benign, left_malignant, and right_malignant.
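
The per-image standardization mentioned in the list above can be sketched as follows. This is only an illustration: it assumes that "standardized" means subtracting the image's own mean and dividing by its own standard deviation, and the function name standardize_single_image is hypothetical rather than taken from this repository.

```python
import numpy as np


def standardize_single_image(image):
    """Return a float32 copy with zero mean and unit std (assumed definition)."""
    image = image.astype(np.float32)
    # Guard against constant images; the epsilon is an assumption of this sketch.
    std = np.maximum(image.std(), 1e-5)
    return (image - image.mean()) / std


# Toy 16-bit array standing in for a mammogram.
toy = np.array([[0, 1000], [2000, 65535]], dtype=np.uint16)
standardized = standardize_single_image(toy)
```

Each image is processed independently, so statistics from one view never influence another.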

Both models act on screening mammography exams with four standard views (L-CC, R-CC, L-MLO, R-MLO). As a part of this repository, we provide 4 sample exams (images in the sample_data/images directory and the exam list stored in sample_data/exam_list_before_cropping.pkl). The heatmap generation model and the cancer classification models are implemented in PyTorch.

Update (2019/10/26): Our paper will be published in the IEEE Transactions on Medical Imaging!

Update (2019/08/26): We have added a TensorFlow implementation of our image-wise model.

Update (2019/06/21): We have included the image-wise model as described in the paper that generates predictions based on a single mammogram image. This model slightly under-performs the view-wise model used above, but can be used on single mammogram images as opposed to full exams.

Update (2019/05/15): Fixed a minor bug that caused the output DataFrame columns (left_malignant, right_benign) to be swapped. Note that this does not affect the operation of the model.

Prerequisites

  • Python (3.6)
  • PyTorch (0.4.1)
  • torchvision (0.2.0)
  • NumPy (1.14.3)
  • SciPy (1.0.0)
  • H5py (2.7.1)
  • imageio (2.4.1)
  • pandas (0.22.0)
  • tqdm (4.19.8)
  • opencv-python (3.4.2)

License

This repository is licensed under the terms of the GNU AGPLv3 license.

How to run the code

Exam-level

Here we describe how to get predictions from the view-wise model, which is our best-performing model. This model takes 4 images, one from each view, as input and outputs predictions for each exam.

bash run.sh

will automatically run the entire pipeline and save the prediction results as csv files.

We recommend running the code with a GPU (set by default). To run the code with CPU only, please change DEVICE_TYPE in run.sh to 'cpu'.

If running the individual Python scripts, please include the path to this repository in your PYTHONPATH.

You should obtain the following outputs for the sample exams provided in the repository.

Predictions using image-only model (found in sample_output/image_predictions.csv by default):

index  left_benign  right_benign  left_malignant  right_malignant
0      0.0580       0.0754        0.0091          0.0179
1      0.0646       0.9536        0.0012          0.7258
2      0.4388       0.3526        0.2325          0.1061
3      0.3765       0.6483        0.0909          0.2579

Predictions using image-and-heatmaps model (found in sample_output/imageheatmap_predictions.csv by default):

index  left_benign  right_benign  left_malignant  right_malignant
0      0.0612       0.0555        0.0099          0.0063
1      0.0507       0.8025        0.0009          0.9000
2      0.2877       0.2286        0.2524          0.0461
3      0.4181       0.3172        0.3174          0.0485
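
As a rough illustration of how such a prediction file might be consumed, the sketch below reads the image-and-heatmaps values shown above and reports, for each exam, the breast with the higher malignant probability. The comma-separated layout with an "index" header column is an assumption about the csv file, and max_malignant_per_exam is a hypothetical helper, not part of this repository.

```python
import csv
import io

# Values copied from the image-and-heatmaps table above; the exact csv layout
# (comma-separated, "index" header) is an assumption of this sketch.
SAMPLE_CSV = """index,left_benign,right_benign,left_malignant,right_malignant
0,0.0612,0.0555,0.0099,0.0063
1,0.0507,0.8025,0.0009,0.9000
2,0.2877,0.2286,0.2524,0.0461
3,0.4181,0.3172,0.3174,0.0485
"""


def max_malignant_per_exam(csv_text):
    """Map exam index -> (side, probability) for the higher malignant score."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        left = float(row["left_malignant"])
        right = float(row["right_malignant"])
        side = "left" if left >= right else "right"
        result[int(row["index"])] = (side, max(left, right))
    return result


summary = max_malignant_per_exam(SAMPLE_CSV)
```

To read the real file, the same function could be given the text of sample_output/imageheatmap_predictions.csv, provided its layout matches the assumption above.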

Single Image

Here we also provide the image-wise model, which is different from, and performs worse than, the view-wise model described above. The csv output of the view-wise model will differ from the output of the image-wise model in this section. Because the image-wise model creates predictions for each image separately, we make it public to facilitate transfer learning.

To use the image-wise model, run a command such as the following:

bash run_single.sh "sample_data/images/0_L_CC.png" "L-CC"

where the first argument is the path to a mammogram image, and the second argument is the view corresponding to that image.

You should obtain the following output based on the above example command:

Stage 1: Crop Mammograms
Stage 2: Extract Centers
Stage 3: Generate Heatmaps
Stage 4a: Run Classifier (Image)
{"benign": 0.040191903710365295, "malignant": 0.008045293390750885}
Stage 4b: Run Classifier (Image+Heatmaps)
{"benign": 0.052365876734256744, "malignant": 0.005510155577212572}
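
If the predictions need to be collected programmatically, the output above can be parsed with a few lines of Python. This is a sketch that assumes the script prints exactly the stage headers and JSON lines shown above; parse_single_image_output is a hypothetical helper name.

```python
import json

# Output copied from the example above.
SAMPLE_STDOUT = """Stage 1: Crop Mammograms
Stage 2: Extract Centers
Stage 3: Generate Heatmaps
Stage 4a: Run Classifier (Image)
{"benign": 0.040191903710365295, "malignant": 0.008045293390750885}
Stage 4b: Run Classifier (Image+Heatmaps)
{"benign": 0.052365876734256744, "malignant": 0.005510155577212572}
"""


def parse_single_image_output(stdout_text):
    """Pair each stage header with the JSON prediction line that follows it."""
    predictions = {}
    current_stage = None
    for line in stdout_text.splitlines():
        line = line.strip()
        if line.startswith("Stage"):
            current_stage = line
        elif line.startswith("{") and current_stage is not None:
            predictions[current_stage] = json.loads(line)
    return predictions


parsed = parse_single_image_output(SAMPLE_STDOUT)
```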

Image-level Notebook

We have included a sample notebook that contains code for running the classifiers with and without heatmaps (excludes preprocessing).

Data

To use one of the pretrained models, the input must consist of at least four images, with at least one for each view (L-CC, L-MLO, R-CC, R-MLO).

The original 12-bit mammograms are saved as rescaled 16-bit images to preserve the granularity of the pixel intensities, while still being correctly displayed in image viewers.
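
To make the rescaling idea concrete, here is a small sketch of one common linear mapping from the 12-bit range to the 16-bit range. It is an illustration only and is not necessarily the exact arithmetic used by the png writer; rescale_12bit_to_16bit is a hypothetical name.

```python
def rescale_12bit_to_16bit(value):
    """Linearly map a 12-bit intensity (0..4095) onto the 16-bit range (0..65535)."""
    if not 0 <= value <= 4095:
        raise ValueError("expected a 12-bit value")
    # Integer arithmetic for round(value * 65535 / 4095).
    return (value * 65535 + 4095 // 2) // 4095


rescaled = [rescale_12bit_to_16bit(v) for v in range(4096)]
```

Because the mapping is strictly increasing, all 4096 input levels remain distinct after rescaling, which is what preserves the granularity of the pixel intensities.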

sample_data/exam_list_before_cropping.pkl contains a list of exam information before preprocessing. Each exam is represented as a dictionary with the following format:

{
  'horizontal_flip': 'NO',
  'L-CC': ['0_L_CC'],
  'R-CC': ['0_R_CC'],
  'L-MLO': ['0_L_MLO'],
  'R-MLO': ['0_R_MLO'],
}

We expect images from the L-CC and L-MLO views to face right, and images from the R-CC and R-MLO views to face left. horizontal_flip indicates whether all images in the exam are flipped horizontally from the expected orientation. The values for L-CC, R-CC, L-MLO, and R-MLO are lists of image filenames without extension and directory name.
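
A minimal sketch of building an exam list in this format and saving it with pickle is shown below. The filename pattern follows the sample exam above; make_exam and the temporary output location are hypothetical and only serve the illustration.

```python
import os
import pickle
import tempfile

VIEWS = ("L-CC", "R-CC", "L-MLO", "R-MLO")


def make_exam(prefix, horizontal_flip="NO"):
    """Build one exam entry; filenames omit extension and directory."""
    exam = {"horizontal_flip": horizontal_flip}
    for view in VIEWS:
        # e.g. prefix "0" and view "L-CC" give "0_L_CC".
        exam[view] = ["{}_{}".format(prefix, view.replace("-", "_"))]
    return exam


exam_list = [make_exam(str(i)) for i in range(2)]

# Write to a temporary directory so the sketch does not touch sample_data/.
out_path = os.path.join(tempfile.mkdtemp(), "exam_list_before_cropping.pkl")
with open(out_path, "wb") as f:
    pickle.dump(exam_list, f)

with open(out_path, "rb") as f:
    reloaded = pickle.load(f)
```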

Additional information for each image is included as a dictionary. Each such dictionary has all 4 views as keys, and its values hold the additional information for the corresponding view. For example, window_location, which indicates the top, bottom, left and right edges of the cropping window, is a dictionary with 4 keys whose values are 4 lists containing the corresponding information for the images. Additionally, rightmost_points, bottommost_points, distance_from_starting_side and best_center are added after preprocessing. Descriptions of these attributes can be found in the Preprocessing section. The following is an example of exam information after cropping and extracting optimal centers:

{
  'horizontal_flip': 'NO',
  'L-CC': ['0_L_CC'],
  'R-CC': ['0_R_CC'],
  'L-MLO': ['0_L_MLO'],
  'R-MLO': ['0_R_MLO'],
  'window_location': {
    'L-CC': [(353, 4009, 0, 2440)],
    'R-CC': [(71, 3771, 952, 3328)],
    'L-MLO': [(0, 3818, 0, 2607)],
    'R-MLO': [(0, 3724, 848, 3328)]
   },
  'rightmost_points': {
    'L-CC': [((1879, 1958), 2389)],
    'R-CC': [((2207, 2287), 2326)],
    'L-MLO': [((2493, 2548), 2556)],
    'R-MLO': [((2492, 2523), 2430)]
   },
  'bottommost_points': {
    'L-CC': [(3605, (100, 100))],
    'R-CC': [(3649, (101, 106))],
    'L-MLO': [(3767, (1456, 1524))],
    'R-MLO': [(3673, (1164, 1184))]
   },
  'distance_from_starting_side': {
    'L-CC': [0],
    'R-CC': [0],
    'L-MLO': [0],
    'R-MLO': [0]
   },
  'best_center': {
    'L-CC': [(1850, 1417)],
    'R-CC': [(2173, 1354)],
    'L-MLO': [(2279, 1681)],
    'R-MLO': [(2185, 1555)]
   }
}
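
Since every metadata entry is keyed by view and holds one list item per image, the information for a single image can be gathered by indexing each list at the image's position. The sketch below shows this on a subset of the example above; iter_image_metadata is a hypothetical helper, not a function from this repository.

```python
VIEWS = ("L-CC", "R-CC", "L-MLO", "R-MLO")

# Subset of the example exam shown above.
exam = {
    "horizontal_flip": "NO",
    "L-CC": ["0_L_CC"],
    "R-CC": ["0_R_CC"],
    "L-MLO": ["0_L_MLO"],
    "R-MLO": ["0_R_MLO"],
    "window_location": {
        "L-CC": [(353, 4009, 0, 2440)],
        "R-CC": [(71, 3771, 952, 3328)],
        "L-MLO": [(0, 3818, 0, 2607)],
        "R-MLO": [(0, 3724, 848, 3328)],
    },
    "best_center": {
        "L-CC": [(1850, 1417)],
        "R-CC": [(2173, 1354)],
        "L-MLO": [(2279, 1681)],
        "R-MLO": [(2185, 1555)],
    },
}


def iter_image_metadata(exam, keys=("window_location", "best_center")):
    """Yield (view, filename, {key: value}) for every image in an exam."""
    for view in VIEWS:
        for i, filename in enumerate(exam[view]):
            yield view, filename, {key: exam[key][view][i] for key in keys}


records = list(iter_image_metadata(exam))
```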

The labels for the included exams are as follows:

index  left_benign  right_benign  left_malignant  right_malignant
0      0            0             0               0
1      0            0             0               1
2      1            0             0               0
3      1            1             1               1

Pipeline

The pipeline consists of four stages.

  1. Crop mammograms
  2. Calculate optimal centers
  3. Generate Heatmaps
  4. Run classifiers

The following variables defined in run.sh can be modified as needed:

  • NUM_PROCESSES: The number of processes to be used in preprocessing (src/cropping/crop_mammogram.py and src/optimal_centers/get_optimal_centers.py). Default: 10.

  • DEVICE_TYPE: Device type to use in heatmap generation and classifiers, either 'cpu' or 'gpu'. Default: 'gpu'.

  • NUM_EPOCHS: The number of epochs to be averaged in the output of the classifiers. Default: 10.

  • HEATMAP_BATCH_SIZE: The batch size to use in heatmap generation. Default: 100.

  • GPU_NUMBER: Specify which one of the GPUs to use when multiple GPUs are available. Default: 0.

  • DATA_FOLDER: The directory where the mammograms are stored.

  • INITIAL_EXAM_LIST_PATH: The path where the initial exam list without any metadata is stored.

  • PATCH_MODEL_PATH: The path where the saved weights for the patch classifier are stored.

  • IMAGE_MODEL_PATH: The path where the saved weights for the image-only model are stored.

  • IMAGEHEATMAPS_MODEL_PATH: The path where the saved weights for the image-and-heatmaps model are stored.

  • CROPPED_IMAGE_PATH: The directory to save cropped mammograms.

  • CROPPED_EXAM_LIST_PATH: The path to save the new exam list with cropping metadata.

  • EXAM_LIST_PATH: The path to save the new exam list with best center metadata.

  • HEATMAPS_PATH: The directory to save heatmaps.

  • IMAGE_PREDICTIONS_PATH: The path to save predictions of image-only model.

  • IMAGEHEATMAPS_PREDICTIONS_PATH: The path to save predictions of image-and-heatmaps model.

Preprocessing

Run the following commands to crop mammograms and calculate information about augmentation windows.

Crop mammograms

python3 src/cropping/crop_mammogram.py \
    --input-data-folder $DATA_FOLDER \
    --output-data-folder $CROPPED_IMAGE_PATH \
    --exam-list-path $INITIAL_EXAM_LIST_PATH  \
    --cropped-exam-list-path $CROPPED_EXAM_LIST_PATH  \
    --num-processes $NUM_PROCESSES

src/cropping/crop_mammogram.py crops each mammogram around the breast and discards the background in order to improve image loading time and the time to run the segmentation algorithm. It saves each cropped image to $CROPPED_IMAGE_PATH/short_file_path.png using h5py. In addition, it adds additional information for each image and writes a new exam list to $CROPPED_EXAM_LIST_PATH, discarding images which it fails to crop. The optional --verbose argument prints out information about each image. The additional information includes the following:

  • window_location: location of the cropping window w.r.t. the original dicom image, so that the segmentation map can be cropped in the same way for training.
  • rightmost_points: rightmost nonzero pixels after the image has been correctly flipped.
  • bottommost_points: bottommost nonzero pixels after the image has been correctly flipped.
  • distance_from_starting_side: records whether a zero-value gap is found between the edge of the image and the breast on the side where the breast starts to appear, where there should be no gap. Depending on the dataset, this value can be used to detect an incorrect value of horizontal_flip.
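
The idea of cropping around the breast can be illustrated with a much simplified sketch that finds the window bounding all nonzero pixels. The actual script does more than this (for example, it handles flipped images and records the additional attributes listed above). The exclusive bottom and right edges and the name nonzero_window are assumptions of this sketch.

```python
import numpy as np


def nonzero_window(image):
    """Return (top, bottom, left, right) bounding all nonzero pixels.

    bottom and right are exclusive, so image[top:bottom, left:right] is the crop.
    """
    rows = np.flatnonzero(image.any(axis=1))
    cols = np.flatnonzero(image.any(axis=0))
    if rows.size == 0:
        raise ValueError("image has no nonzero pixels")
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


# Toy image: a block of nonzero "tissue" touching the left edge.
toy = np.zeros((6, 8), dtype=np.uint16)
toy[1:5, 0:3] = 7

top, bottom, left, right = nonzero_window(toy)
cropped = toy[top:bottom, left:right]
```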

Calculate optimal centers

python3 src/optimal_centers/get_optimal_centers.py \
    --cropped-exam-list-path $CROPPED_EXAM_LIST_PATH \
    --data-prefix $CROPPED_IMAGE_PATH \
    --output-exam-list-path $EXAM_LIST_PATH \
    --num-processes $NUM_PROCESSES

src/optimal_centers/get_optimal_centers.py outputs a new exam list with additional metadata to $EXAM_LIST_PATH. The additional information includes the following:

  • best_center: optimal center point of the window for each image. The augmentation windows drawn with best_center as exact center point could go outside the boundary of the image. This usually happens when the cropped image is smaller than the window size. In this case, we pad the image and shift the window to be inside the padded image in augmentation. Refer to the data report for more details.
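
The pad-and-shift behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's augmentation code: it assumes best_center is given as (row, column), that padding is added at the bottom and right so existing coordinates stay valid, and that the window is clamped to the padded image. place_window is a hypothetical name.

```python
def place_window(image_shape, window_shape, center):
    """Return (pad_h, pad_w, top, left) for a window placed around center.

    pad_h and pad_w are the rows and columns of padding needed so the window
    fits; (top, left) is the window's corner inside the padded image.
    """
    img_h, img_w = image_shape
    win_h, win_w = window_shape
    # Pad only along axes where the image is smaller than the window.
    pad_h = max(0, win_h - img_h)
    pad_w = max(0, win_w - img_w)
    padded_h, padded_w = img_h + pad_h, img_w + pad_w
    # Center the window on `center`, then shift it back inside the padded image.
    top = min(max(center[0] - win_h // 2, 0), padded_h - win_h)
    left = min(max(center[1] - win_w // 2, 0), padded_w - win_w)
    return pad_h, pad_w, top, left


# Cropped L-CC size and best_center taken from the example exam; CC window 2677x1942.
fits = place_window((3656, 2440), (2677, 1942), (1850, 1417))
# A hypothetical cropped image that is smaller than the window.
needs_padding = place_window((2000, 1500), (2677, 1942), (1000, 750))
```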

Heatmap Generation

python3 src/heatmaps/run_producer.py \
    --model-path $PATCH_MODEL_PATH \
    --data-path $EXAM_LIST_PATH \
    --image-path $CROPPED_IMAGE_PATH \
    --batch-size $HEATMAP_BATCH_SIZE \
    --output-heatmap-path $HEATMAPS_PATH \
    --device-type $DEVICE_TYPE \
    --gpu-number $GPU_NUMBER

src/heatmaps/run_producer.py generates heatmaps by combining predictions for patches of images and saves them in hdf5 format in $HEATMAPS_PATH using the $DEVICE_TYPE device. $DEVICE_TYPE can be either 'gpu' or 'cpu'. $HEATMAP_BATCH_SIZE should be adjusted depending on the available memory. The optional argument --gpu-number can be used to specify which GPU to use.
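
The general idea of combining patch predictions into a heatmap can be sketched as below. This is a simplified illustration, not the script's actual procedure: the patch size, the stride, the averaging of overlapping patches, and the stand-in scoring function toy_patch_score are all assumptions.

```python
import numpy as np


def build_heatmap(image, patch_size, stride, patch_score):
    """Average per-patch scores over every pixel that each patch covers."""
    height, width = image.shape
    total = np.zeros((height, width), dtype=np.float64)
    count = np.zeros((height, width), dtype=np.float64)
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patch = image[top:top + patch_size, left:left + patch_size]
            score = patch_score(patch)
            total[top:top + patch_size, left:left + patch_size] += score
            count[top:top + patch_size, left:left + patch_size] += 1
    # Pixels not covered by any patch keep a score of 0.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)


def toy_patch_score(patch):
    """Hypothetical stand-in for the patch classifier: fraction of bright pixels."""
    return float((patch > 0).mean())


toy = np.zeros((8, 8))
toy[0:4, 0:4] = 1.0
heatmap = build_heatmap(toy, patch_size=4, stride=2, patch_score=toy_patch_score)
```

The result has the same size as the input image and, as long as the per-patch scores lie between 0 and 1, so does every heatmap pixel.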

Running the models

src/modeling/run_model.py can provide predictions using cropped images either with or without heatmaps. When using heatmaps, please use the --use-heatmaps flag and provide the appropriate --model-path and --heatmaps-path arguments. Depending on the available memory, the optional argument --batch-size can be provided. Another optional argument, --gpu-number, can be used to specify which GPU to use.

Run image-only model

python3 src/modeling/run_model.py \
    --model-path $IMAGE_MODEL_PATH \
    --data-path $EXAM_LIST_PATH \
    --image-path $CROPPED_IMAGE_PATH \
    --output-path $IMAGE_PREDICTIONS_PATH \
    --use-augmentation \
    --num-epochs $NUM_EPOCHS \
    --device-type $DEVICE_TYPE \
    --gpu-number $GPU_NUMBER

This command makes predictions only using images for $NUM_EPOCHS epochs with random augmentation and outputs averaged predictions per exam to $IMAGE_PREDICTIONS_PATH.
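
Averaging over several randomly augmented passes can be sketched as follows. The sketch does not call the real model: toy_predict_once is a hypothetical stand-in that returns a fixed score plus noise, and average_predictions is likewise a hypothetical helper used only to show the averaging step.

```python
import random


def average_predictions(predict_once, num_epochs, seed=0):
    """Average the predictions of num_epochs randomly augmented passes."""
    rng = random.Random(seed)
    sums = {}
    for _ in range(num_epochs):
        for name, prob in predict_once(rng).items():
            sums[name] = sums.get(name, 0.0) + prob
    return {name: total / num_epochs for name, total in sums.items()}


def toy_predict_once(rng):
    """Hypothetical model call: a fixed score perturbed by augmentation noise."""
    noise = rng.uniform(-0.01, 0.01)
    return {"left_malignant": 0.20 + noise, "right_malignant": 0.05 + noise}


averaged = average_predictions(toy_predict_once, num_epochs=10)
```

Averaging reduces the effect of any single random augmentation on the reported probability.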

Run image+heatmaps model

python3 src/modeling/run_model.py \
    --model-path $IMAGEHEATMAPS_MODEL_PATH \
    --data-path $EXAM_LIST_PATH \
    --image-path $CROPPED_IMAGE_PATH \
    --output-path $IMAGEHEATMAPS_PREDICTIONS_PATH \
    --use-heatmaps \
    --heatmaps-path $HEATMAPS_PATH \
    --use-augmentation \
    --num-epochs $NUM_EPOCHS \
    --device-type $DEVICE_TYPE \
    --gpu-number $GPU_NUMBER

This command makes predictions using images and heatmaps for $NUM_EPOCHS epochs with random augmentation and outputs averaged predictions per exam to $IMAGEHEATMAPS_PREDICTIONS_PATH.

Getting image from dicom files and saving as 16-bit png files

Dicom files can be converted into png files with the following function, which then can be used by the code in our repository (pypng 0.0.19 and pydicom 1.2.2 libraries are required).

import png
import pydicom

def save_dicom_image_as_png(dicom_filename, png_filename, bitdepth=12):
    """
    Save 12-bit mammogram from dicom as rescaled 16-bit png file.
    :param dicom_filename: path to input dicom file.
    :param png_filename: path to output png file.
    :param bitdepth: bit depth of the input image. Set it to 12 for 12-bit mammograms.
    """
    image = pydicom.read_file(dicom_filename).pixel_array
    with open(png_filename, 'wb') as f:
        writer = png.Writer(height=image.shape[0], width=image.shape[1], bitdepth=bitdepth, greyscale=True)
        writer.write(f, image.tolist())

Reference

If you found this code useful, please cite our paper:

Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin, Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh, Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao, Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema, Stephanie Chung, Esther Hwang, Naziya Samreen, S. Gene Kim, Laura Heacock, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras
IEEE Transactions on Medical Imaging
2019

@article{wu2019breastcancer, 
    title = {Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening},
    author = {Nan Wu and Jason Phang and Jungkyu Park and Yiqiu Shen and Zhe Huang and Masha Zorin and Stanis\l{}aw Jastrz\k{e}bski and Thibault F\'{e}vry and Joe Katsnelson and Eric Kim and Stacey Wolfson and Ujas Parikh and Sushma Gaddam and Leng Leng Young Lin and Kara Ho and Joshua D. Weinstein and Beatriu Reig and Yiming Gao and Hildegard Toth and Kristine Pysarenko and Alana Lewin and Jiyon Lee and Krystal Airola and Eralda Mema and Stephanie Chung and Esther Hwang and Naziya Samreen and S. Gene Kim and Laura Heacock and Linda Moy and Kyunghyun Cho and Krzysztof J. Geras}, 
    journal = {IEEE Transactions on Medical Imaging},
    year = {2019}
}