Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology (LMRL Workshop, NeurIPS 2021)

Overview

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology, LMRL Workshop, NeurIPS 2021. [Workshop] [arXiv]
Richard J. Chen, Rahul G. Krishnan
@article{chen2022self,
  title={Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology},
  author={Chen, Richard J and Krishnan, Rahul G},
  journal={Learning Meaningful Representations of Life, NeurIPS 2021},
  year={2021}
}
DINO illustration

Summary / Main Findings:

  1. In a head-to-head comparison of SimCLR versus DINO, DINO learns more effective pretrained representations for histopathology, likely because it 1) does not require negative samples (histopathology data can be heavily class-imbalanced), and 2) captures better inductive biases about the part-whole hierarchies of how cells are spatially organized in tissue.
  2. ImageNet features do lag behind SSL methods (in terms of data efficiency), but are better than you might expect on patch- and slide-level tasks. Transfer learning with ImageNet features (from a ResNet-50 truncated after the 3rd residual block) gives very decent performance using the CLAM package.
  3. SSL may help mitigate domain shift from site-specific H&E staining protocols. With vanilla data augmentations, the global structure of morphological subtypes (within each class) is better preserved than with ImageNet features, as seen in 2D UMAP scatter plots.
  4. Self-supervised ViTs are able to localize cells quite well without any supervision. Our results show that ViTs can localize visual concepts in histopathology when introspecting the attention heads.

Updates

Stay tuned for more updates :).

  • TBA: Pretrained SimCLR and DINO models on TCGA-Lung (Larger working paper, in submission).
  • TBA: Pretrained SimCLR and DINO models on TCGA-PanCancer (Larger working paper, in submission).
  • TBA: PEP8-compliance (cleaning and organizing code).
  • 03/04/2022: Reproducible and largely-working codebase that I'm satisfied with and have heavily tested.

Pre-Reqs

We use Git LFS to version-control large files in this repository (e.g., images, embeddings, checkpoints). After installing Git LFS, pull these large files by running:

git lfs pull

Pretrained Models

SimCLR and DINO models were each trained for 100 epochs using the vanilla training recipes from their respective papers. These models were developed on 2,055,742 patches (256 x 256 resolution at 20X magnification) extracted from diagnostic slides in the TCGA-BRCA dataset, and evaluated via K-NN on patch-level datasets in histopathology.

Note: Results should be interpreted with respect to the size of the pretraining dataset and the number of training epochs. Ideally, longer training with larger batch sizes would demonstrate larger gains in SSL performance.

Arch        SSL Method   Dataset      Epochs   Dim    K-NN    Download
ResNet-50   Transfer     ImageNet     N/A      1024   0.935   N/A
ResNet-50   SimCLR       TCGA-BRCA    100      2048   0.938   Backbone
ViT-S/16    DINO         TCGA-BRCA    100      384    0.941   Backbone
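
A hedged sketch of how a downloaded backbone might be loaded for feature extraction is shown below; it builds a ViT-S/16 with timm and loads a checkpoint, but the checkpoint filename and state-dict keys are assumptions and may differ from the released files.

```python
# Minimal sketch (checkpoint path and key names are assumptions), not the repository's exact loading code.
import torch
import timm

def load_vit_s16_backbone(ckpt_path="vits16_dino_tcga_brca.pt", device="cuda"):
    # ViT-S/16 with the classification head removed -> 384-dim embeddings (matches the table above).
    model = timm.create_model("vit_small_patch16_224", pretrained=False, num_classes=0, img_size=256)
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # DINO checkpoints often nest weights under a "teacher" key and prefix them with "backbone.";
    # adjust these keys to whatever the released file actually contains.
    if isinstance(state_dict, dict) and "teacher" in state_dict:
        state_dict = state_dict["teacher"]
    state_dict = {k.replace("module.", "").replace("backbone.", ""): v for k, v in state_dict.items()}
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"Missing keys: {len(missing)} | Unexpected keys: {len(unexpected)}")
    return model.eval().to(device)
```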

Data Download + Data Preprocessing

For CRC-100K and BreastPathQ, pre-extracted embeddings are already available and processed in ./embeddings_patch_library. See patch_extraction_utils.py for how these patch datasets were processed.

Additional Datasets + Custom Implementation: This codebase is flexible for feature extraction on a variety of different patch datasets. To extend this work, simply modify patch_extraction_utils.py with a custom dataset loader for your dataset (a minimal sketch is included after the list below). As an example, we include BCSS (results not yet updated in this work).

  • BCSS (v1): You can download the BCSS dataset from the official Grand Challenge link. For this dataset, we manually created the train and test splits and labels using majority voting. Reproducibility for the raw BCSS dataset may not be exact, but we include the pre-extracted embeddings of this dataset in ./embeddings_patch_library (denoted as version 1).
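
The following is a minimal sketch of the kind of dataset loader one might add to patch_extraction_utils.py for a new patch dataset; the folder layout, class handling, and file extensions are assumptions for illustration, not the repository's actual API.

```python
# Hypothetical custom patch dataset; directory layout and names are assumptions.
import os
from PIL import Image
from torch.utils.data import Dataset

class CustomPatchDataset(Dataset):
    """Loads 256 x 256 patches arranged as <root>/<class_name>/<patch>.png."""

    def __init__(self, root, transform=None):
        self.transform = transform
        self.classes = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
        self.samples = [
            (os.path.join(root, c, f), label)
            for label, c in enumerate(self.classes)
            for f in sorted(os.listdir(os.path.join(root, c)))
            if f.lower().endswith((".png", ".jpg", ".tif"))
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, label
```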

Evaluation: K-NN Patch-Level Classification on CRC-100K + BreastPathQ

Run the notebook patch_extraction.ipynb, followed by patch_evaluation.ipynb. The evaluation notebook should run "out-of-the-box" with Git LFS.
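
For intuition, a K-NN evaluation over pre-extracted embeddings looks roughly like the sketch below; the choice of k, the distance metric, and the reported metric (macro one-vs-rest AUC here) are assumptions and may differ from the notebooks.

```python
# Minimal sketch of K-NN patch-level evaluation on pre-extracted embeddings.
# train_feats / test_feats are (N, D) numpy arrays; train_labels / test_labels are (N,) arrays.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

def knn_eval(train_feats, train_labels, test_feats, test_labels, k=20):
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(train_feats, train_labels)
    probs = knn.predict_proba(test_feats)
    # Macro one-vs-rest AUC for multi-class patch datasets such as CRC-100K.
    return roc_auc_score(test_labels, probs, multi_class="ovr", average="macro")
```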

(Table 2: K-NN patch-level classification results on CRC-100K and BreastPathQ)

Evaluation: Slide-Level Classification on TCGA-BRCA (IDC versus ILC)

Install the CLAM package, then use the 10-fold cross-validation splits made available in ./slide_evaluation/10foldcv_subtype/tcga_brca. TensorBoard train + validation logs can be visualized via:

tensorboard --logdir ./slide_evaluation/results/
(Table 1: slide-level classification results on TCGA-BRCA, IDC versus ILC)

Visualization: Creating UMAPs

Install umap-learn (which can be tricky to install if you have incompatible dependencies), then use the following code snippet in patch_extraction_utils.py; it is used in patch_extraction.ipynb to create Figure 4.

UMAP
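
As a rough guide, a 2D UMAP of patch embeddings can be produced along these lines; this is a hedged sketch with placeholder variable names, not the exact snippet referenced above.

```python
# Minimal sketch: 2D UMAP scatter plot of patch embeddings, colored by class label.
import matplotlib.pyplot as plt
import umap

def plot_umap(embeddings, labels, n_neighbors=15, min_dist=0.1, seed=7):
    # embeddings: (N, D) numpy array; labels: (N,) integer array.
    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist, random_state=seed)
    coords = reducer.fit_transform(embeddings)
    plt.figure(figsize=(6, 6))
    scatter = plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=3, cmap="tab10", alpha=0.7)
    plt.legend(*scatter.legend_elements(), title="Class", loc="best", fontsize=8)
    plt.xlabel("UMAP-1")
    plt.ylabel("UMAP-2")
    plt.tight_layout()
    return coords
```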

Visualization: Attention Maps

Attention visualizations (reproducing Figure 3) can be generated by walking through the notebook attention_visualization_256.ipynb.

Attention Visualization
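
For intuition, DINO-style [CLS] attention maps over a 256 x 256 patch can be extracted roughly as follows. The sketch loads the public ImageNet-pretrained DINO ViT-S/16 from torch.hub as a stand-in for the TCGA-pretrained backbone, and the preprocessing constants are assumptions, so this is not the notebook's exact code.

```python
# Minimal sketch of visualizing the [CLS] attention heads of a DINO ViT-S/16 on one patch.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

# Public DINO weights used as a stand-in for the TCGA-BRCA checkpoint (assumption).
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),  # ImageNet stats (assumption)
])

img = preprocess(Image.open("patch.png").convert("RGB")).unsqueeze(0)  # (1, 3, 256, 256)

with torch.no_grad():
    attn = model.get_last_selfattention(img)  # (1, heads, tokens, tokens)

n_heads = attn.shape[1]
grid = 256 // 16  # 16 x 16 token grid for ViT-S/16 on a 256 x 256 patch
cls_attn = attn[0, :, 0, 1:].reshape(n_heads, grid, grid)  # [CLS] -> patch-token attention
heatmaps = F.interpolate(cls_attn.unsqueeze(0), size=(256, 256), mode="nearest")[0]
# heatmaps[h] is the attention map of head h, ready to overlay on the input patch.
```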

Issues

  • Please open new threads or report issues directly (for urgent blockers) to [email protected].
  • Immediate response to minor issues may not be available.

Acknowledgements, License & Usage

  • Part of this work was performed while at Microsoft Research. We thank the BioML group at Microsoft Research New England for their insightful feedback.
  • This work is still under submission to a formal proceeding. In the meantime, if you found our work useful in your research, please consider citing our paper:
@article{chen2022self,
  title={Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology},
  author={Chen, Richard J and Krishnan, Rahul G},
  journal={Learning Meaningful Representations of Life, NeurIPS 2021},
  year={2021}
}

© This code is made available under the GPLv3 License for non-commercial academic purposes.
