PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

Related tags

Deep Learningmoco-v3
Overview

MoCo v3 for Self-supervised ResNet and ViT

Introduction

This is a PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

The original MoCo v3 was implemented in Tensorflow and run in TPUs. This repo re-implements in PyTorch and GPUs. Despite the library and numerical differences, this repo reproduces the results and observations in the paper.

Main Results

The following results are based on ImageNet-1k self-supervised pre-training, followed by ImageNet-1k supervised training for linear evaluation or end-to-end fine-tuning. All results in these tables are based on a batch size of 4096.

ResNet-50, linear classification

pretrain
epochs
pretrain
crops
linear
acc
100 2x224 68.9
300 2x224 72.8
1000 2x224 74.6

ViT, linear classification

model pretrain
epochs
pretrain
crops
linear
acc
ViT-Small 300 2x224 73.2
ViT-Base 300 2x224 76.7

ViT, end-to-end fine-tuning

model pretrain
epochs
pretrain
crops
e2e
acc
ViT-Small 300 2x224 81.4
ViT-Base 300 2x224 83.2

The end-to-end fine-tuning results are obtained using the DeiT repo, using all the default DeiT configs. ViT-B is fine-tuned for 150 epochs (vs DeiT-B's 300ep, which has 81.8% accuracy).

Usage: Preparation

Install PyTorch and download the ImageNet dataset following the official PyTorch ImageNet training code. Similar to MoCo v1/2, this repo contains minimal modifications on the official PyTorch ImageNet code. We assume the user can successfully run the official PyTorch ImageNet code. For ViT models, install timm (timm==0.4.9).

The code has been tested with CUDA 10.2/CuDNN 7.6.5, PyTorch 1.9.0 and timm 0.4.9.

Usage: Self-supervised Pre-Training

Below are three examples for MoCo v3 pre-training.

ResNet-50 with 2-node (16-GPU) training, batch 4096

On the first node, run:

python main_moco.py \
  --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]

On the second node, run the same command with --rank 1. With a batch size of 4096, the training can fit into 2 nodes with a total of 16 Volta 32G GPUs.

ViT-Small with 1-node (8-GPU) training, batch 1024

python main_moco.py \
  -a vit_small -b 1024 \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

ViT-Base with 8-node training, batch 4096

With a batch size of 4096, ViT-Base is trained with 8 nodes:

python main_moco.py \
  -a vit_base \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 8 --rank 0 \
  [your imagenet-folder with train and val folders]

On other nodes, run the same command with --rank 1, ..., --rank 7 respectively.

Notes:

  1. The batch size specified by -b is the total batch size across all GPUs.
  2. The learning rate specified by --lr is the base lr, and is adjusted by the linear lr scaling rule in this line.
  3. Using a smaller batch size has a more stable result (see paper), but has lower speed. Using a large batch size is critical for good speed in TPUs (as we did in the paper).
  4. In this repo, only multi-gpu, DistributedDataParallel training is supported; single-gpu or DataParallel training is not supported. This code is improved to better suit the multi-node setting, and by default uses automatic mixed-precision for pre-training.

Usage: Linear Classification

By default, we use momentum-SGD and a batch size of 1024 for linear classification on frozen features/weights. This can be done with a single 8-GPU node.

python main_lincls.py \
  -a [architecture] --lr [learning rate] \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]

Usage: End-to-End Fine-tuning ViT

To perform end-to-end fine-tuning for ViT, use our script to convert the pre-trained ViT checkpoint to DEiT format:

python convert_to_deit.py \
  --input [your checkpoint path]/[your checkpoint file].pth.tar \
  --output [target checkpoint file].pth

Then run the training (in the DeiT repo) with the converted checkpoint:

python $DEIT_DIR/main.py \
  --resume [target checkpoint file].pth \
  --epochs 150

This gives us 83.2% accuracy for ViT-Base with 150-epoch fine-tuning.

Note:

  1. We use --resume rather than --finetune in the DeiT repo, as its --finetune option trains under eval mode. When loading the pre-trained model, revise model_without_ddp.load_state_dict(checkpoint['model']) with strict=False.
  2. Our ViT-Small is with heads=12 in the Transformer block, while by default in DeiT it is heads=6. Please modify the DeiT code accordingly when fine-tuning our ViT-Small model.

Model Configs

See the commands listed in CONFIG.md.

Transfer Learning

See the instruction in the transfer dir.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Citation

@Article{chen2021mocov3,
  author  = {Xinlei Chen* and Saining Xie* and Kaiming He},
  title   = {An Empirical Study of Training Self-Supervised Vision Transformers},
  journal = {arXiv preprint arXiv:2104.02057},
  year    = {2021},
}
Owner
Facebook Research
Facebook Research
Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

Reminder ST-GCN has transferred to MMSkeleton, and keep on developing as an flexible open source toolbox for skeleton-based human understanding. You a

sijie yan 1.1k Dec 25, 2022
N-Person-Check-Checker-Splitter - A calculator app use to divide checks

N-Person-Check-Checker-Splitter This is my from-scratch programmed calculator ap

2 Feb 15, 2022
Unofficial PyTorch implementation of TokenLearner by Google AI

tokenlearner-pytorch Unofficial PyTorch implementation of TokenLearner by Ryoo et al. from Google AI (abs, pdf) Installation You can install TokenLear

Rishabh Anand 46 Dec 20, 2022
Real-time Object Detection for Streaming Perception, CVPR 2022

StreamYOLO Real-time Object Detection for Streaming Perception Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Sun Jian Real-time Object Detection

Jinrong Yang 237 Dec 27, 2022
A general-purpose encoder-decoder framework for Tensorflow

READ THE DOCUMENTATION CONTRIBUTING A general-purpose encoder-decoder framework for Tensorflow that can be used for Machine Translation, Text Summariz

Google 5.5k Jan 07, 2023
Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020

XDVioDet Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020. The proj

peng 64 Dec 12, 2022
use tensorflow 2.0 to tell a dog and cat from a specified picture

dog_or_cat use tensorflow 2.0 to tell a dog and cat from a specified picture This is one of the classic experiments for the introduction of deep learn

你这个代码我看不懂 1 Oct 22, 2021
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

SUO-SLAM This repository hosts the code for our CVPR 2022 paper "Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation". ArXiv li

Robot Perception & Navigation Group (RPNG) 97 Jan 03, 2023
Object classification with basic computer vision techniques

naive-image-classification Object classification with basic computer vision techniques. Final assignment for the computer vision course I took at univ

2 Jul 01, 2022
Implement of homography net by pytorch

HomographyNet Implement of homography net by pytorch Brief Introduction This project is based on the work Homography-Net: @article{detone2016deep, t

ronghao_CN 4 May 19, 2022
Spectral normalization (SN) is a widely-used technique for improving the stability and sample quality of Generative Adversarial Networks (GANs)

Why Spectral Normalization Stabilizes GANs: Analysis and Improvements [paper (NeurIPS 2021)] [paper (arXiv)] [code] Authors: Zinan Lin, Vyas Sekar, Gi

Zinan Lin 32 Dec 16, 2022
Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.

scc4onnx Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel

Katsuya Hyodo 16 Dec 22, 2022
Official implementation for the paper: Multi-label Classification with Partial Annotations using Class-aware Selective Loss

Multi-label Classification with Partial Annotations using Class-aware Selective Loss Paper | Pretrained models Official PyTorch Implementation Emanuel

99 Dec 27, 2022
The code for paper Efficiently Solve the Max-cut Problem via a Quantum Qubit Rotation Algorithm

Quantum Qubit Rotation Algorithm Single qubit rotation gates $$ U(\Theta)=\bigotimes_{i=1}^n R_x (\phi_i) $$ QQRA for the max-cut problem This code wa

SheffieldWang 0 Oct 18, 2021
Causal Influence Detection for Improving Efficiency in Reinforcement Learning

Causal Influence Detection for Improving Efficiency in Reinforcement Learning This repository contains the code release for the paper "Causal Influenc

Autonomous Learning Group 21 Nov 29, 2022
You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks.

AllSet This is the repo for our paper: You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks. We prepared all codes and a subse

Jianhao 51 Dec 24, 2022
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

Knut(Ke) Chen 134 Jan 01, 2023
Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations

Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations Requirements The code is implemented in Python and requires

1 Nov 03, 2021
Code for intrusion detection system (IDS) development using CNN models and transfer learning

Intrusion-Detection-System-Using-CNN-and-Transfer-Learning This is the code for the paper entitled "A Transfer Learning and Optimized CNN Based Intrus

Western OC2 Lab 38 Dec 12, 2022
Tooling for converting STAC metadata to ODC data model

手语识别 0、使用到的模型 (1). openpose,作者:CMU-Perceptual-Computing-Lab https://github.com/CMU-Perceptual-Computing-Lab/openpose (2). 图像分类classification,作者:Bubbl

Open Data Cube 65 Dec 20, 2022