Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [2021]

Overview

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

This repo contains the Pytorch implementation of our paper:

Revisiting Contrastive Methods for UnsupervisedLearning of Visual Representations

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis and Luc Van Gool.

Contents

  1. Introduction
  2. Key Results
  3. Installation
  4. Training
  5. Evaluation
  6. Model Zoo
  7. Citation

Introduction

Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. However, current methods are still primarily applied to curated datasets like ImageNet. We first study how biases in the dataset affect existing methods. Our results show that an approach like MoCo works surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets. Second, given the generality of the approach, we try to realize further gains. We show that learning additional invariances - through the use of multi-scale cropping, stronger augmentations and nearest neighbors - improves the representations. Finally, we observe that MoCo learns spatially structured representations when trained with a multi-crop strategy. The representations can be used for semantic segment retrieval and video instance segmentation without finetuning. Moreover, the results are on par with specialized models. We hope this work will serve as a useful study for other researchers.

Key Results

  • Scene-centric Data: We do not observe any indications that contrastive pretraining suffers from using scene-centric image data. This is in contrast to prior belief. Moreover, if the downstream data is non-object-centric, pretraining on scene-centric datasets even outperforms ImageNet pretraining.
  • Dense Representations: The multi-scale cropping strategy allows the model to learn spatially structured representations. This questions a recent trend that proposed additional losses at a denser level in the image. The representations can be used for semantic segment retrieval and video instance segmentation without any finetuning.
  • Additional Invariances: We impose additional invariances by exploring different data augmentations and nearest neighbors to boost the performance.
  • Transfer Performance: We observed that if a model obtains improvements for the downstream classification tasks, the same improvements are not guarenteed for other tasks (e.g. semantic segmentation) and vice versa.

Installation

The Python code runs with recent Pytorch versions, e.g. 1.6. Assuming Anaconda, the most important packages can be installed as:

conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.2 -c pytorch
conda install -c conda-forge opencv           # For evaluation
conda install matplotlib scipy scikit-learn   # For evaluation

We refer to the environment.yml file for an overview of the packages we used to reproduce our results. The code was run on 2 Tesla V100 GPUs.

Training

Now, we will pretrain on the COCO dataset. You can download the dataset from the official website. Several scripts in the scripts/ directory are provided. It contains the vanilla MoCo setup and our additional modifications for both 200 epochs and 800 epochs of training. First, modify --output_dir and the dataset location in each script before executing them. Then, run the following command to start the training for 200 epochs:

sh scripts/ours_coco_200ep.sh # Train our model for 200 epochs.

The training currently supports:

  • MoCo
  • + Multi-scale constrained cropping
  • + AutoAugment
  • + kNN-loss

A detailed version of the pseudocode can be found in Appendix B.

Evaluation

We perform the evaluation for the following downstream tasks: linear classification (VOC), semantic segmentation (VOC and Cityscapes), semantic segment retrieval and video instance segmentation (DAVIS). More details and results can be found in the main paper and the appendix.

Linear Classifier

The representations can be evaluated under the linear evaluation protocol on PASCAL VOC. Please visit the ./evaluation/voc_svm directory for more information.

Semantic Segmentation

We provide code to evaluate the representations for the semantic segmentation task on the PASCAL VOC and Cityscapes datasets. Please visit the ./evaluation/segmentation directory for more information.

Segment Retrieval

In order to obtain the results from the paper, run the publicly available code with our weights as the initialization of the model. You only need to adapt the amount of clusters, e.g. 5.

Video Instance Segmentation

In order to obtain the results from the paper, run the publicly available code from Jabri et al. with our weights as the initialization of the model.

Model Zoo

Several pretrained models can be downloaded here. For a fair comparison, which takes the training duration into account, we refer to Figure 5 in the paper. More results can be found in Table 4 and Table 9.

Method Epochs VOC SVM VOC mIoU Cityscapes mIoU DAVIS J&F Download link
MoCo 200 76.1 66.2 70.3 - Model 🔗
Ours 200 85.1 71.9 72.2 - Model 🔗
MoCo 800 81.0 71.1 71.3 63.2 Model 🔗
Ours 800 85.9 73.5 72.3 66.2 Model 🔗

Citation

This code is based on the MoCo repository. If you find this repository useful for your research, please consider citing the following paper(s):

@article{vangansbeke2021revisiting,
  title={Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Van Gool, Luc},
  journal={arxiv preprint arxiv:2106.05967},
  year={2021}
}
@inproceedings{he2019moco,
  title={Momentum Contrast for Unsupervised Visual Representation Learning},
  author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

For any enquiries, please contact the main authors.

Extra

  • For an overview on self-supervised learning (SSL), have a look at the overview repository.
  • Interested in self-supervised semantic segmentation? Check out our recent work: MaskContrast.
  • Interested in self-supervised classification? Check out SCAN.
  • Other great SSL repositories: MoCo, SupContrast, SeLa, SwAV and many more here.

License

This software is released under a creative commons license which allows for personal and research use only. You can view a license summary here. Part of the code was based on MoCo. Check it out for more details.

Acknoledgements

This work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven).

Owner
Wouter Van Gansbeke
PhD researcher at KU Leuven. Especially interested in computer vision, machine learning and deep learning. Working on self-supervised and multi-task learning.
Wouter Van Gansbeke
Definition of a business problem according to Wilson Lower Bound Score and Time Based Average Rating

Wilson Lower Bound Score, Time Based Rating Average In this study I tried to calculate the product rating and sorting reviews more accurately. I have

3 Sep 30, 2021
In this project I played with mlflow, streamlit and fastapi to create a training and prediction app on digits

Fastapi + MLflow + streamlit Setup env. I hope I covered all. pip install -r requirements.txt Start app Go in the root dir and run these Streamlit str

76 Nov 23, 2022
Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR 2019 (Oral)

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations The code of: Weakly Supervised Learning of Instance Segmentation with I

Jiwoon Ahn 472 Dec 29, 2022
MAT: Mask-Aware Transformer for Large Hole Image Inpainting

MAT: Mask-Aware Transformer for Large Hole Image Inpainting (CVPR2022, Oral) Wenbo Li, Zhe Lin, Kun Zhou, Lu Qi, Yi Wang, Jiaya Jia [Paper] News This

254 Dec 29, 2022
Anti-UAV base on PaddleDetection

Paddle-Anti-UAV Anti-UAV base on PaddleDetection Background UAVs are very popular and we can see them in many public spaces, such as parks and playgro

Qingzhong Wang 2 Apr 20, 2022
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

bottom-up-attention This code implements a bottom-up attention model, based on multi-gpu training of Faster R-CNN with ResNet-101, using object and at

Peter Anderson 1.3k Jan 09, 2023
Norm-based Analysis of Transformer

Norm-based Analysis of Transformer Implementations for 2 papers introducing to analyze Transformers using vector norms: Kobayashi+'20 Attention is Not

Goro Kobayashi 52 Dec 05, 2022
PyTorch Implementation of DSB for Score Based Generative Modeling. Experiments managed using Hydra.

Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling This repository contains the implementation for the paper Diffusion

James Thornton 50 Jan 03, 2023
Learning Spatio-Temporal Transformer for Visual Tracking

STARK The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking Hiring research interns for visual transformer

Multimedia Research 484 Dec 29, 2022
OCRA (Object-Centric Recurrent Attention) source code

OCRA (Object-Centric Recurrent Attention) source code Hossein Adeli and Seoyoung Ahn Please cite this article if you find this repository useful: For

Hossein Adeli 2 Jun 18, 2022
A command line simple note taking app

Why yet another note taking program? note was designed with a very specific target in mind: me, and my 2354 scraps of paper. It runs from the command

64 Nov 20, 2022
Air Pollution Prediction System using Linear Regression and ANN

AirPollution Pollution Weather Prediction System: Smart Outdoor Pollution Monitoring and Prediction for Healthy Breathing and Living Publication Link:

Dr Sharnil Pandya, Associate Professor, Symbiosis International University 19 Feb 07, 2022
PyTorch implementation of Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy

Anomaly Transformer in PyTorch This is an implementation of Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. This pape

spencerbraun 160 Dec 19, 2022
Code for our CVPR 2022 Paper "GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection"

GEN-VLKT Code for our CVPR 2022 paper "GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection". Contributed by Yue Lia

Yue Liao 47 Dec 04, 2022
Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend

Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend This project acts as both a tuto

Guillaume Chevalier 103 Jul 22, 2022
This repository contains the code for Direct Molecular Conformation Generation (DMCG).

Direct Molecular Conformation Generation This repository contains the code for Direct Molecular Conformation Generation (DMCG). Dataset Download rdkit

25 Dec 20, 2022
Codes for TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

TS-CAM: Token Semantic Coupled Attention Map for Weakly SupervisedObject Localization This is the official implementaion of paper TS-CAM: Token Semant

vasgaowei 112 Jan 02, 2023
Codes for "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation"

CSDI This is the github repository for the NeurIPS 2021 paper "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

106 Jan 04, 2023
"Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices This repository contains the official PyTorch implemen

Yandex Research 21 Oct 18, 2022
A modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (prediction model)

ParallelFold Author: Bozitao Zhong This is a modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (p

Bozitao Zhong 77 Dec 22, 2022