A Domain-Agnostic Benchmark for Self-Supervised Learning

Related tags

Deep Learningdabs
Overview

DABS: A Domain Agnostic Benchmark for Self-Supervised Learning

This repository contains the code for DABS, a benchmark for domain-agnostic self-supervised learning algorithms. The basic components of the benchmark can be found in datasets, encoders, and algorithms. Training is implemented with the PyTorch Lightning framework, logging with Weights and Biases, and configuration management with Hydra.

Usage

We provide support for Python >= 3.7. Install requirements with

python -m pip install -r requirements.txt

For instructions on how to install PyTorch versions compatible with your CUDA versions, see pytorch.org.

Datasets

We provide a set of dataset implementations (in src/datasets) from image, text, speech, sensor, medical imaging, and image-text domains. Preprocessing operations on these datasets are minimal and hard-coded as simple resizing (i.e. of images) and truncations (i.e. of text, audio). These should not be changed so as to maintain fair comparisons across other users of the benchmark.

See conf/datasets/*.yaml for all dataset configs, including the loss, metrics, and batch size used for each dataset.

Almost all datasets will download automatically when the dataset class is instantiated. The exceptions are the CheXpert, ImageNet, and CU Birds datasets, where manual registration or download is required. See the respective dataset files for specific instructions.

Pretraining Dataset (unlabeled) Transfer Dataset (labeled)
CIFAR10 Aircraft, CIFAR10, CU Birds, DTD, Traffic Sign, VGG Flower
PAMAP2 PAMAP2
MSCOCO MSCOCO (mismatched detection), VQA (Binary classification)
Wikitext-103 GLUE (10 Tasks)
mC4 PAWS-X (7 Tasks)
CheXpert CheXpert (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), ChestX-ray8 (atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax)
LibriSpeech Audio MNIST, Fluent Speech (Action, Object, Location), Google Speech Commands, LibriSpeech, VoxCeleb1

Pretraining

During the pretraining phase, self-supervised encoders are trained to learn good representations from unlabeled data. We currently support seven datasets for pretraining, one for each domain: MS COCO, ImageNet, CheXpert, PAMAP2, mC4, WikiText-103, and LibriSpeech. If the pretraining dataset has associated labels, an online linear evaluator is jointly trained with the encoder to provide a heuristic of transfer performance.

Run pretraining with commands like

python pretrain.py exp.name=<experiment-name> dataset=<dataset> algorithm=<algorithm>

Each dataset and encoder has its own config file, so to train a Transformer on the CheXpert dataset with the e-Mix algorithm, run

python pretrain.py exp.name=emix-chexpert encoder=transformer dataset=chexpert algorithm=emix

See conf/pretrain.yaml for all pretraining configuration fields.

For more information on the datasets, encoders, and algorithms, see the following section.

Pretraining Dataset Modality Label type (unused) Input Type
CIFAR10 Natural images Single label 2d
PAMAP2 Sensor Single label 2d
MSCOCO Captioned images Single label 2d +
tokens
WikiText-103 English Text No label tokens
mC4 Multilingual Text No label tokens
CheXpert Medical images Multi label 2d
LibriSpeech Speech No label 2d

Transfer Learning

After pretraining, a small linear classifier is trained on top of the frozen encoder. Run transfer learning from a randomly initialized encoder with

python transfer.py exp.name=<experiment-name> dataset=<dataset> ckpt=null 

See conf/transfer.yaml for all transfer learning configuration fields and optionally replace null with the path to your pretrained encoder checkpoint.

Dataset Modality Label type Evaluation metric Input Type
Aircraft Natural images Single label Accuracy 2d
CU Birds Natural images Single label Accuracy 2d
DTD Natural images Single label Accuracy 2d
Traffic Sign Natural images Single label Accuracy 2d
VGG Flower Natural images Single label Accuracy 2d
Pamap2 Sensor Single label Accuracy 2d
MS COCO Captioned images Binary label Accuracy 2d +
tokens
VQA Captioned images Binary label Accuracy 2d +
tokens
CheXpert Medical images Multi label AUROC 2d
ChestX-ray8 Medical images Multi label AUROC 2d
PAWS-X Multilingual Text Binary label Accuracy tokens
COLA English Text Binary label Pearson correlation tokens
MNLI Matched English Text Single label Accuracy tokens
MNLI Mismatched English Text Single label Accuracy tokens
MRPC English Text Binary label Accuracy tokens
QNLI English Text Binary label Accuracy tokens
QQP English Text Binary label Accuracy tokens
RTE English Text Binary label Accuracy tokens
SST2 English Text Binary label Accuracy tokens
STSB English Text Regression Spearman correlation tokens
WNLI English Text Binary label Accuracy tokens
Audio MNIST Speech Single label Accuracy 2d
Fluent Speech Speech Single label Accuracy 2d
Google Speech Commands Speech Single label Accuracy 2d
LibriSpeech Speech Single label Accuracy 2d
VoxCeleb1 Speech Single label Accuracy 2d

Encoders

A domain-agnostic SSL method should have an encoder which remains as constant as possible across domains. We provide a general transformer encoder baseline (in src/encoders). The transformer operates on a sequence of vectors that are produced by a small set of embedding modules (e.g. patch or token embeddings).

Pretraining algorithms

The pretraining algorithm is the framework and objective that the encoder is trained with. Examples of domain-specific algorithms include SimCLR, BYOL, and MoCo, but these are not domain-agnostic methods as they depend on vision-specific augmentations. We provide our own domain-agnostic implementations of recent algorithms, including e-mix (a generalization of i-mix) and Shuffled Embedding Detection (ShED; a generalization of ELECTRA), which randomly permutes a subset of the input embeddings and trains the model to identify the permuted embeddings.

Results

Below are results for algorithms trained on each dataset in DABS. The baseline performance is obtained via a randomly initialized encoder.

Pretrain Dataset Transfer Dataset Encoder Baseline Performance e-mix Performance ShED Performance
ImageNet CIFAR10 Transformer 24.20% 39.43% 39.63%
ImageNet CU Birds Transformer 1.62% 3.86% 2.95%
ImageNet VGG Flowers Transformer 9.03% 25.96% 13.03%
ImageNet DTD Transformer 7.39% 8.83% 18.35%
ImageNet Traffic Sign Transformer 14.33% 65.07% 27.51%
ImageNet Aircraft Transformer 2.70% 10.15% 5.60%
PAMAP2 PAMAP2 Transformer 69.81% 79.48% 88.69%
MSCOCO VQA Transformer 57.50% 48.90% 54.30%
CheXpert CheXpert Transformer 68.14% 72.40% 72.40%
CheXpert ChestX-ray8 Transformer 57.00% 63.00% 63.70%
Wikitext-103 GLUE (average) Transformer 42.29% 44.08% 48.37%
mC4 PAWS-X (average) Transformer 58.11% 56.16% 59.91%
LibriSpeech Audio MNIST Transformer 33.13% 80.35% 67.33%
LibriSpeech Fluent Locations Transformer 62.09% 60.93% 60.24%
LibriSpeech Fluent Actions Transformer 26.15% 29.87% 30.53%
LibriSpeech Fluent Objects Transformer 30.13% 39.89% 39.36%
LibriSpeech Google Speech Commands Transformer 4.87% 19.22% 20.73%
LibriSpeech LibriSpeech Transformer 17.12% 60.18% 34.77%
LibriSpeech VoxCeleb1 Transformer 0.59% 2.43% 2.81%
Owner
Alex Tamkin
PhD at @stanfordnlp
Alex Tamkin
Simple image captioning model - CLIP prefix captioning.

Simple image captioning model - CLIP prefix captioning.

688 Jan 04, 2023
PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition. Transformer models are good at capturing content-based

Soohwan Kim 565 Jan 04, 2023
Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo"

dblmahmc Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo" Requirements: https://github.com

1 Dec 17, 2021
NNR conformation conditional and global probabilities estimation and analysis in peptides or proteins fragments

NNR and global probabilities estimation and analysis in peptides or protein fragments This module calculates global and NNR conformation dependent pro

0 Jul 15, 2021
catch-22: CAnonical Time-series CHaracteristics

catch22 - CAnonical Time-series CHaracteristics About catch22 is a collection of 22 time-series features coded in C that can be run from Python, R, Ma

Carl H Lubba 229 Oct 21, 2022
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
Permeability Prediction Via Multi Scale 3D CNN

Permeability-Prediction-Via-Multi-Scale-3D-CNN Data: The raw CT rock cores are obtained from the Imperial Colloge portal. The CT rock cores are sub-sa

Mohamed Elmorsy 2 Jul 06, 2022
Composing methods for ML training efficiency

MosaicML Composer contains a library of methods, and ways to compose them together for more efficient ML training.

MosaicML 2.8k Jan 08, 2023
Building blocks for uncertainty-aware cycle consistency presented at NeurIPS'21.

UncertaintyAwareCycleConsistency This repository provides the building blocks and the API for the work presented in the NeurIPS'21 paper Robustness vi

EML Tübingen 19 Dec 12, 2022
Fast Differentiable Matrix Sqrt Root

Fast Differentiable Matrix Sqrt Root Geometric Interpretation of Matrix Square Root and Inverse Square Root This repository constains the official Pyt

YueSong 42 Dec 30, 2022
Behavioral "black-box" testing for recommender systems

RecList RecList Free software: MIT license Documentation: https://reclist.readthedocs.io. Overview RecList is an open source library providing behavio

Jacopo Tagliabue 375 Dec 30, 2022
Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

CReST in Tensorflow 2 Code for the paper: "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning" by Chen Wei, Ki

Google Research 75 Nov 01, 2022
Homepage of paper: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021.

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction [Paper] [Official Paddle Implementation] [Huggingface Gradio Demo] [Unofficial

442 Dec 16, 2022
这是一个yolox-pytorch的源码,可以用于训练自己的模型。

YOLOX:You Only Look Once目标检测模型在Pytorch当中的实现 目录 性能情况 Performance 实现的内容 Achievement 所需环境 Environment 小技巧的设置 TricksSet 文件下载 Download 训练步骤 How2train 预测步骤

Bubbliiiing 613 Jan 05, 2023
mPose3D, a mmWave-based 3D human pose estimation model.

mPose3D, a mmWave-based 3D human pose estimation model.

KylinChen 35 Nov 08, 2022
Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

ATLOP Code for AAAI 2021 paper Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. If you make use of this co

Wenxuan Zhou 146 Nov 29, 2022
MAterial del programa Misión TIC 2022

Mision TIC 2022 Esta iniciativa, aparece como respuesta frente a los retos de la Cuarta Revolución Industrial, y tiene como objetivo la formación de 1

6 May 25, 2022
A toy compiler that can convert Python scripts to pickle bytecode 🥒

Pickora 🐰 A small compiler that can convert Python scripts to pickle bytecode. Requirements Python 3.8+ No third-party modules are required. Usage us

ꌗᖘ꒒ꀤ꓄꒒ꀤꈤꍟ 68 Jan 04, 2023
Neural machine translation between the writings of Shakespeare and modern English using TensorFlow

Shakespeare translations using TensorFlow This is an example of using the new Google's TensorFlow library on monolingual translation going from modern

Motoki Wu 245 Dec 28, 2022
Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Introduction Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models". In this work, we demonstrate that existi

Wei-Cheng Tseng 7 Nov 01, 2022