DABS: A Domain Agnostic Benchmark for Self-Supervised Learning

This repository contains the code for DABS, a benchmark for domain-agnostic self-supervised learning algorithms. The basic components of the benchmark can be found in datasets, encoders, and algorithms. Training is implemented with the PyTorch Lightning framework, logging with Weights and Biases, and configuration management with Hydra.

Usage

We provide support for Python >= 3.7. Install requirements with

python -m pip install -r requirements.txt

For instructions on installing a PyTorch version compatible with your CUDA version, see pytorch.org.

Datasets

We provide a set of dataset implementations (in src/datasets) from the image, text, speech, sensor, medical imaging, and image-text domains. Preprocessing on these datasets is minimal and hard-coded: simple resizing (of images) and truncation (of text and audio). Do not change these operations, so that comparisons across users of the benchmark remain fair.

See conf/datasets/*.yaml for all dataset configs, including the loss, metrics, and batch size used for each dataset.

Almost all datasets will download automatically when the dataset class is instantiated. The exceptions are the CheXpert, ImageNet, and CU Birds datasets, where manual registration or download is required. See the respective dataset files for specific instructions.

Pretraining Dataset (unlabeled)	Transfer Dataset (labeled)
ImageNet	Aircraft, CIFAR10, CU Birds, DTD, Traffic Sign, VGG Flower
PAMAP2	PAMAP2
MSCOCO	MSCOCO (mismatched detection), VQA (Binary classification)
Wikitext-103	GLUE (10 Tasks)
mC4	PAWS-X (7 Tasks)
CheXpert	CheXpert (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), ChestX-ray8 (atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax)
LibriSpeech	Audio MNIST, Fluent Speech (Action, Object, Location), Google Speech Commands, LibriSpeech, VoxCeleb1

Pretraining

During the pretraining phase, self-supervised encoders are trained to learn good representations from unlabeled data. We currently support seven datasets for pretraining, one for each domain: MS COCO, ImageNet, CheXpert, PAMAP2, mC4, WikiText-103, and LibriSpeech. If the pretraining dataset has associated labels, an online linear evaluator is trained jointly with the encoder to provide a rough proxy for transfer performance.

Run pretraining with commands like

python pretrain.py exp.name=<experiment-name> dataset=<dataset> algorithm=<algorithm>

Each dataset and encoder has its own config file, so to train a Transformer on the CheXpert dataset with the e-Mix algorithm, run

python pretrain.py exp.name=emix-chexpert encoder=transformer dataset=chexpert algorithm=emix

See conf/pretrain.yaml for all pretraining configuration fields.

For more information on the datasets, encoders, and algorithms, see the Datasets, Encoders, and Pretraining algorithms sections.

Pretraining Dataset	Modality	Label type (unused)	Input Type
ImageNet	Natural images	Single label	2d
PAMAP2	Sensor	Single label	2d
MSCOCO	Captioned images	Single label	2d + tokens
WikiText-103	English Text	No label	tokens
mC4	Multilingual Text	No label	tokens
CheXpert	Medical images	Multi label	2d
LibriSpeech	Speech	No label	2d

Transfer Learning

After pretraining, a small linear classifier is trained on top of the frozen encoder. Run transfer learning from a randomly initialized encoder with

python transfer.py exp.name=<experiment-name> dataset=<dataset> ckpt=null

To evaluate a pretrained encoder instead, replace null with the path to its checkpoint. See conf/transfer.yaml for all transfer learning configuration fields.
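As a rough illustration of the linear evaluation protocol described above, the sketch below trains a logistic-regression classifier on features from an encoder that is never updated. It uses only the Python standard library rather than the repository's PyTorch Lightning code, and the names frozen_encoder, train_linear_probe, and predict are hypothetical, not functions from this repository.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def frozen_encoder(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a pretrained encoder; it has no trainable state."""
    return [x[0] + x[1], x[0] - x[1]]


def train_linear_probe(
    encoder: Callable[[Sequence[float]], Vector],
    inputs: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 50,
) -> Tuple[Vector, float]:
    """Fit a binary logistic-regression head on fixed encoder features with SGD."""
    # The encoder is only called, never updated: this is what "frozen" means here.
    features = [encoder(x) for x in inputs]
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for feat, y in zip(features, labels):
            logit = sum(w * v for w, v in zip(weights, feat)) + bias
            prob = 1.0 / (1.0 + math.exp(-logit))
            error = prob - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * v for w, v in zip(weights, feat)]
            bias -= lr * error
    return weights, bias


def predict(encoder, weights: Vector, bias: float, x: Sequence[float]) -> int:
    """Classify one input using the frozen encoder and the trained linear head."""
    feat = encoder(x)
    return int(sum(w * v for w, v in zip(weights, feat)) + bias > 0)


# Toy, linearly separable data (illustrative only, not a DABS dataset).
inputs = [(-1.0, -1.0), (-0.8, -1.2), (1.0, 1.0), (1.2, 0.8)]
labels = [0, 0, 1, 1]
weights, bias = train_linear_probe(frozen_encoder, inputs, labels)
predictions = [predict(frozen_encoder, weights, bias, x) for x in inputs]
```

Because only the linear head is trained, differences in transfer accuracy reflect the quality of the encoder's representations rather than the capacity of the classifier.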

Dataset	Modality	Label type	Evaluation metric	Input Type
Aircraft	Natural images	Single label	Accuracy	2d
CU Birds	Natural images	Single label	Accuracy	2d
DTD	Natural images	Single label	Accuracy	2d
Traffic Sign	Natural images	Single label	Accuracy	2d
VGG Flower	Natural images	Single label	Accuracy	2d
PAMAP2	Sensor	Single label	Accuracy	2d
MS COCO	Captioned images	Binary label	Accuracy	2d + tokens
VQA	Captioned images	Binary label	Accuracy	2d + tokens
CheXpert	Medical images	Multi label	AUROC	2d
ChestX-ray8	Medical images	Multi label	AUROC	2d
PAWS-X	Multilingual Text	Binary label	Accuracy	tokens
COLA	English Text	Binary label	Pearson correlation	tokens
MNLI Matched	English Text	Single label	Accuracy	tokens
MNLI Mismatched	English Text	Single label	Accuracy	tokens
MRPC	English Text	Binary label	Accuracy	tokens
QNLI	English Text	Binary label	Accuracy	tokens
QQP	English Text	Binary label	Accuracy	tokens
RTE	English Text	Binary label	Accuracy	tokens
SST2	English Text	Binary label	Accuracy	tokens
STSB	English Text	Regression	Spearman correlation	tokens
WNLI	English Text	Binary label	Accuracy	tokens
Audio MNIST	Speech	Single label	Accuracy	2d
Fluent Speech	Speech	Single label	Accuracy	2d
Google Speech Commands	Speech	Single label	Accuracy	2d
LibriSpeech	Speech	Single label	Accuracy	2d
VoxCeleb1	Speech	Single label	Accuracy	2d

Encoders

A domain-agnostic SSL method should have an encoder which remains as constant as possible across domains. We provide a general transformer encoder baseline (in src/encoders). The transformer operates on a sequence of vectors that are produced by a small set of embedding modules (e.g. patch or token embeddings).
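To make the idea of embedding modules concrete, here is a minimal sketch of how a 2d input and a token input can both be turned into a sequence of vectors for a shared transformer. It uses plain Python lists instead of tensors and omits the learned linear projection a real patch embedding would apply. The functions patchify and embed_tokens are hypothetical illustrations, not the modules in src/encoders.

```python
from typing import List

Vector = List[float]


def patchify(image: List[List[float]], patch: int) -> List[Vector]:
    """Split an H x W grid into non-overlapping patch x patch tiles, each flattened."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            sequence.append(tile)
    return sequence


def embed_tokens(token_ids: List[int], table: List[Vector]) -> List[Vector]:
    """Look up one embedding vector per token id."""
    return [table[i] for i in token_ids]


# A 4 x 4 "image" becomes a sequence of four vectors of length 4.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patches = patchify(image, patch=2)

# Three token ids become a sequence of three vectors of length 4.
table = [[0.0] * 4, [1.0] * 4, [2.0] * 4]
tokens = embed_tokens([2, 0, 1], table)
```

Both outputs have the same shape convention (a list of equal-length vectors), which is what lets a single encoder consume either domain.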

Pretraining algorithms

The pretraining algorithm is the framework and objective used to train the encoder. SimCLR, BYOL, and MoCo are examples of domain-specific algorithms: they depend on vision-specific augmentations and are therefore not domain-agnostic. We provide our own domain-agnostic implementations of recent algorithms, including e-Mix (a generalization of i-Mix) and Shuffled Embedding Detection (ShED; a generalization of ELECTRA), which randomly permutes a subset of the input embeddings and trains the model to identify the permuted embeddings.
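The corruption step of ShED described above can be sketched as follows. This is an assumption-laden illustration, not the repository's implementation: it works on plain Python lists, the function name shuffle_embeddings and the fraction parameter are hypothetical, and it uses a cyclic shift among the chosen positions so that every chosen position receives an embedding from a different position.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_embeddings(
    embeddings: Sequence[Sequence[float]],
    fraction: float,
    rng: random.Random,
) -> Tuple[List[List[float]], List[int]]:
    """Permute a random subset of positions; return the corrupted sequence and 0/1 labels."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two embeddings to permute")
    # At least two positions are required for a permutation to move anything.
    k = min(n, max(2, round(fraction * n)))
    positions = rng.sample(range(n), k)

    corrupted = [list(e) for e in embeddings]
    labels = [0] * n
    # Cyclic shift over the chosen positions: position i takes the embedding
    # that originally sat at the next chosen position.
    for i, pos in enumerate(positions):
        source = positions[(i + 1) % k]
        corrupted[pos] = list(embeddings[source])
        labels[pos] = 1  # target for the detection head: "this slot was permuted"
    return corrupted, labels


# Eight distinct toy embeddings of dimension 2; half of the positions are permuted.
embeddings = [[float(i), float(i) + 0.5] for i in range(8)]
corrupted, labels = shuffle_embeddings(embeddings, fraction=0.5, rng=random.Random(0))
```

The encoder would then be trained with a per-position binary loss to predict labels from corrupted. Note that if two input embeddings happen to be identical, a permuted slot can look unchanged, which this sketch does not special-case.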

Results

Below are results for algorithms trained on each dataset in DABS. The baseline performance is obtained via a randomly initialized encoder.

Pretrain Dataset	Transfer Dataset	Encoder	Baseline Performance	e-Mix Performance	ShED Performance
ImageNet	CIFAR10	Transformer	24.20%	39.43%	39.63%
ImageNet	CU Birds	Transformer	1.62%	3.86%	2.95%
ImageNet	VGG Flowers	Transformer	9.03%	25.96%	13.03%
ImageNet	DTD	Transformer	7.39%	8.83%	18.35%
ImageNet	Traffic Sign	Transformer	14.33%	65.07%	27.51%
ImageNet	Aircraft	Transformer	2.70%	10.15%	5.60%
PAMAP2	PAMAP2	Transformer	69.81%	79.48%	88.69%
MSCOCO	VQA	Transformer	57.50%	48.90%	54.30%
CheXpert	CheXpert	Transformer	68.14%	72.40%	72.40%
CheXpert	ChestX-ray8	Transformer	57.00%	63.00%	63.70%
Wikitext-103	GLUE (average)	Transformer	42.29%	44.08%	48.37%
mC4	PAWS-X (average)	Transformer	58.11%	56.16%	59.91%
LibriSpeech	Audio MNIST	Transformer	33.13%	80.35%	67.33%
LibriSpeech	Fluent Locations	Transformer	62.09%	60.93%	60.24%
LibriSpeech	Fluent Actions	Transformer	26.15%	29.87%	30.53%
LibriSpeech	Fluent Objects	Transformer	30.13%	39.89%	39.36%
LibriSpeech	Google Speech Commands	Transformer	4.87%	19.22%	20.73%
LibriSpeech	LibriSpeech	Transformer	17.12%	60.18%	34.77%
LibriSpeech	VoxCeleb1	Transformer	0.59%	2.43%	2.81%
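One way to read the table is as a gain over the randomly initialized baseline. The short sketch below computes that difference, in percentage points, for a few rows copied from the table above. The helper name gains is hypothetical and only the four listed rows are included.

```python
# (transfer dataset, baseline, e-Mix, ShED), in percent, copied from the results table.
rows = [
    ("Traffic Sign", 14.33, 65.07, 27.51),
    ("PAMAP2", 69.81, 79.48, 88.69),
    ("VQA", 57.50, 48.90, 54.30),
    ("Audio MNIST", 33.13, 80.35, 67.33),
]


def gains(table):
    """Return {dataset: (e-Mix gain, ShED gain)} in percentage points over the baseline."""
    return {
        name: (round(emix - base, 2), round(shed - base, 2))
        for name, base, emix, shed in table
    }


deltas = gains(rows)
for name, (emix_gain, shed_gain) in deltas.items():
    print(f"{name}: e-Mix {emix_gain:+.2f}, ShED {shed_gain:+.2f}")
```

A negative value, as for VQA, means the pretrained encoder scored below the randomly initialized baseline on that transfer task.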

Owner

Alex Tamkin
