Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".

Last update: Dec 16, 2022

Related tags

Deep Learning ssl-invariances

Overview

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks

This repository contains the official code for the paper Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks.

Requirements

This codebase has been tested with the following package versions:

python=3.8.8
torch=1.9.0+cu102
torchvision=0.10.0+cu102
PIL=8.1.0
numpy=1.19.2
scipy=1.6.1
tqdm=4.57.0
sklearn=0.24.1
albumentations=1.0.3

Prepare data

There are several classes defined in the datasets directory. The data is expected in a directory name data, located on the same level as this repository. Below is an outline of the expected file structure:

data/
    imagenet/
    CIFAR10/
    300W/
    ...
ssl-invariances/
    datasets/
    models/
    readme.md
    ...

For synthetic invariance evaluation, get the ILSVRC2012 validation data from https://image-net.org/ and store in ../data/imagenet/val/.

For real-world invariances, download the following datasets: Flickr1024, COIL-100, ALOI, ALOT, DaLI, ExposureErrors, RealBlur.

For extrinsic invariances, get Causal3DIdent.

Finally, our downstream datasets are CIFAR10, Caltech101, Flowers, 300W, CelebA, LSPose.

Pre-training models

We pre-train several models based on the MoCo codebase.

To set up a version of the codebase that can pre-train our models, first clone the MoCo repo onto the same level as this repo:

git clone https://github.com/facebookresearch/moco

This should be the resulting file structure:

data/
ssl-invariances/
moco/

Then copy the files from ssl-invariances/pretraining/ into the cloned repo:

cp ssl-invariances/pretraining/* moco/

Finally, to run our models, enter the cloned repo by cd moco and run one of the following:

# train the Default model
python main_moco.py -a resnet50 --model default --lr 0.03 --batch-size 256 --mlp --moco-t 0.2 --cos --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 ../data/imagenet

# train the Ventral model
python main_moco.py -a resnet50 --model ventral --lr 0.03 --batch-size 256 --mlp --moco-t 0.2 --cos --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 ../data/imagenet

# train the Dorsal model
python main_moco.py -a resnet50 --model dorsal --lr 0.03 --batch-size 256 --mlp --moco-t 0.2 --cos --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 ../data/imagenet

# train the Default(x3) model
python main_moco.py -a resnet50w3 --model default --moco-dim 384 --lr 0.03 --batch-size 256 --mlp --moco-t 0.2 --cos --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 ../data/imagenet

This will train the models for 200 epochs and save checkpoints. When training has completed, the final model checkpoint, e.g. default_00199.pth.tar, should be moved to ssl-invariances/models/default.pth.tarfor use in evaluation in the below code.

The rest of this codebase assumes these final model checkpoints are located in a directory called ssl-invariances/models/ as shown below.

ssl-invariances/
    models/
        default.pth.tar
        default_w3.pth.tar
        dorsal.pth.tar
        ventral.pth.tar

Synthetic invariance

To evaluate the Default model on grayscale invariance, run:

python eval_synthetic_invariance.py --model default --transform grayscale ../data/imagenet

This will compute the mean and covariance of the model's feature space and save these statistics in the results/ directory. These are then used to speed up future invariance computations for the same model.

Real-world invariance

To evaluate the Ventral model on COIL100 viewpoint invariance, run:

python eval_realworld_invariance.py --model ventral --dataset COIL100

Extrinsic invariance on Causal3DIdent

To evaluate the Dorsal model on Causal3DIdent object x position prediction, run:

python eval_causal3dident.py --model dorsal --target 0

Downstream performance

To evaluate the combined Def+Ven+Dor model on 300W facial landmark regression, run:

python eval_downstream.py --model default+ventral+dorsal --dataset 300w

Citation

If you find our work useful for your research, please consider citing our paper:

@misc{ericsson2021selfsupervised,
      title={Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks}, 
      author={Linus Ericsson and Henry Gouk and Timothy M. Hospedales},
      year={2021},
      eprint={2111.11398},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

If you have any questions, feel welcome to create an issue or contact Linus Ericsson ([email protected]).

Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".

Related tags

Overview

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks

Requirements

Prepare data

Pre-training models

Synthetic invariance

Real-world invariance

Extrinsic invariance on Causal3DIdent

Downstream performance

Citation

Owner

Linus Ericsson

Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision

NAS-Bench-x11 and the Power of Learning Curves

A high performance implementation of HDBSCAN clustering.

TransPrompt - Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification

The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

A curated list of programmatic weak supervision papers and resources

Implementation of average- and worst-case robust flatness measures for adversarial training.

The code for 'Deep Residual Fourier Transformation for Single Image Deblurring'

kullanışlı ve işinizi kolaylaştıracak bir araç

Applications using the GTN library and code to reproduce experiments in "Differentiable Weighted Finite-State Transducers"

A set of Deep Reinforcement Learning Agents implemented in Tensorflow.

This repository contains the code for the paper in EMNLP 2021: "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".

This repository attempts to replicate the SqueezeNet architecture and implement the same on an image classification task.

Python Algorithm Interview Book Review

Audio Visual Emotion Recognition using TDA

Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

The 7th edition of NTIRE: New Trends in Image Restoration and Enhancement workshop will be held on June 2022 in conjunction with CVPR 2022.

CATE: Computation-aware Neural Architecture Encoding with Transformers

"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

Detecting Blurred Ground-based Sky/Cloud Images