Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

Overview

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

Pytorch Implementation for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

If the project is useful to you, please give us a star. ⭐️

image

@article{gao2021disco,
  title={DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning},
  author={Gao, Yuting and Zhuang, Jia-Xin and Li, Ke and Cheng, Hao and Guo, Xiaowei and Huang, Feiyue and Ji, Rongrong and Sun, Xing},
  journal={arXiv preprint arXiv:2104.09124},
  year={2021}
}

Checkpoints

Teacher Models

Architecture Self-supervised Methods Model Checkpoints
ResNet152 MoCo-V2 Model
ResNet101 MoCo-V2 Model
ResNet50 MoCo-V2 Model

For teacher models such as ResNet-50*2 etc, we use their official implementation, which can be downloaded from their github pages.

Student Models by DisCo

Teacher/Students Efficient-B0 ResNet-18 Vit-Tiny XCiT-Tiny
ResNet-50 Model Model - -
ResNet-101 Model Model - -
ResNet-152 Model Model - -
ResNet-50*2 Model Model - -
ViT-Small - - Model -
XCiT-Small - - - Model

Requirements

  • Python3

  • Pytorch 1.6+

  • Detectron2

  • 8 GPUs are preferred

  • ImageNet, Cifar10/100, VOC, COCO

Run

Before running, we firstly move all data into share memory

cp /path/to/ImageNet /dev/shm

Pretrain Model

For pretraining baseline models with default hidden layer dimension in Tab1

# Switch to moco directory
cd moco

# R-50
python3 -u main_moco.py -a resnet50 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet50 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-101
python3 -u main_moco.py -a resnet101 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet101 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-152
python3 -u main_moco.py -a resnet152 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 800 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet152 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0799.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Mob
python3 -u main_moco.py -a mobilenetv3 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 512 /dev/shm 2>&1 |  tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a mobilenetv3 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Effi-B0
python3 -u main_moco.py -a efficientb0 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 2>&1  |  tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Effi-B1
python3 -u main_moco.py -a efficientb1 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0  --hidden 1280  /dev/shm  2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a efficientb1 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-18
python3 -u main_moco.py -a resnet18 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a resnet18 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-34
python3 -u main_moco.py -a resnet34 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a resnet34 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

DisCo

For training DisCo in Tab1, Comparision with baseline

# Switch to DisCo directory
cd DisCo

# R-50 & Effib0
python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50 --teacher /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

# R50w2 & Effib0
python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50w2 --teacher /path/to/swav_RN50w2_400ep_pretrain.pth.tar /dev/shm 2>&1 | tee ./logs/std.log
#          Evaluation
python3 yt_main_lincls.py -a resnet18 --learning-rate 30.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar  /dev/shm 2>&1 | tee ./logs/std.log

For Tab2, Linear evaluation top-1 accuracy (%) on ImageNet compared with different distillation methods.

# RKD+DisCo, Eff-b0
python3 -u main_moco_distill_rkd.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher /path/to/teacher_res50.pth.tar --use-mse /dev/shm  2>&1 | tee ./logs/std.log
#                  Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

# RKD, Eff-b0
python3 -u main_moco_distill_rkd.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher /path/to/teacher_res50.pth.tar /dev/shm  2>&1 | tee ./logs/std.log
#                  Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab3 , **Object detection and instance segmentation results **

# Cp data to /dev/shm and set up path for Detectron2
cp -r /path/to/VOCdevkit/* /dev/shm/
cp -r /path/to/coco_2017 /dev/shm/coco
export DETECTRON2_DATASETS=/dev/shm

pip install /youtu-reid/jiaxzhuang/acmm/detectron2-0.4+cu101-cp36-cp36m-linux_x86_64.whl
cd detection

# Convert model for Detectron2
python3 convert-pretrain-to-detectron2.py /path/ckpt/checkpoint_0199.pth.tar ./output.pkl

# Evaluation on VOC
python3 train_net.py --config-file configs/pascal_voc_R_50_C4_24k_moco.yaml --num-gpus 8 --resume MODEL.RESNETS.DEPTH 34 MODEL.RESNETS.RES2_OUT_CHANNELS 64 2>&1 | tee ../logs/std.log
# Evaluation on CoCo
python3 train_net.py --config-file configs/coco_R_50_C4_2x_moco.yaml --num-gpus 8  --resume MODEL.RESNETS.DEPTH 18 MODEL.RESNETS.RES2_OUT_CHANNELS 64 2>&1 | tee ../logs/std.log

For Fig5 , evaluation on Semi-Supervised Tasks

# Copy 1%, 10% ImageNet from the complete ImageNet, according to split from SimCLR.
cd data
# Need to set up path to Compelete ImageNet and the output path.
python3 -u imagenet_1_fraction.py --ratio 1
python3 -u imagenet_1_fraction.py --ratio 10

# Evaluation on 1% ImageNet with Eff-B0 by DisCo
cp -r /path/to/imagenet_1_fraction/train  /dev/shm
cp -r /path/to/imagenet_1_fraction/val  /dev/shm/
python3 -u main_lincls_semi.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm  2>&1 | tee ./logs/std.log

# Evaluation on 10% ImageNet with R-18 by DisCo
cp -r /path/to/imagenet_10_fraction/train  /dev/shm
cp -r /path/to/imagenet_10_fraction/val  /dev/shm/
python3 -u main_lincls_semi.py -a resnet18 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm  2>&1 | tee ./logs/std.log

For Fig6, evaluation on Cifar10/Cifar100

# Copy Cifar10/100 to /dev/shm
cp /path/to/Cifar10/100 /dev/shm

# Evaluation on 1% Cifar10 with Eff-B0 by DisCo
python3 cifar_main_lincls.py -a efficientb0 --dataset cifar10 --lr 3 --epochs 200 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log
# Evaluation on  Cifar100 with Resnet18 by DisCo
python3 cifar_main_lincls.py -a resnet18 --dataset cifar100 --lr 3 --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab4, Linear evaluation top-1 accuracy (%) on ImageNet, compared with SEED with consistent dimension in hidden layer.

python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50 --teacher /path/to/ckpt/checkpoint_0199.pth.tar --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab5, Linear evaluation top-1 accuracy (%) on ImageNet with SwAV as the testbed.

# SwAV, Train with SwAV only
cd swav-master
python3 -m torch.distributed.launch --nproc_per_node=8 main_swav.py \
        --data_path /dev/shm/train \
        --base_lr 0.6 \
        --final_lr 0.0006 \
        --warmup_epochs 0 \
        --crops_for_assign 0 1 \
        --size_crops 224 96 \
        --nmb_crops 2 6 \
        --min_scale_crops 0.14 0.05 \
        --max_scale_crops 1. 0.14 \
        --use_fp16 true \
        --freeze_prototypes_niters 5005 \
        --queue_length 3840 \
        --epoch_queue_starts 15 \
        --dump_path ./ckpt \
        --sync_bn pytorch \
        --temperature 0.1 \
        --epsilon 0.05 \
        --sinkhorn_iterations 3 \
        --feat_dim 128 \
        --nmb_prototypes 3000 \
        --epochs 200 \
        --batch_size 64 \
        --wd 0.000001 \
        --arch efficientb0 \
        --use_fp16 true 2>&1 | tee ./logs/std.log
# Evaluation
python3 -m torch.distributed.launch --nproc_per_node=8 eval_linear.py --arch efficientb0 --data_path /dev/shm --pretrained /path/to/ckpt/checkpoints/ckp-199.pth 2>&1 | tee ./logs/std.log

# DisCo + SwAV
python3 -m torch.distributed.launch --nproc_per_node=8 main_swav_distill.py \
        --data_path /dev/shm/train \
        --base_lr 0.6 \
        --final_lr 0.0006 \
        --warmup_epochs 0 \
        --crops_for_assign 0 1 \
        --size_crops 224 96 \
        --nmb_crops 2 6 \
        --min_scale_crops 0.14 0.05 \
        --max_scale_crops 1. 0.14 \
        --use_fp16 true \
        --freeze_prototypes_niters 5005 \
        --queue_length 3840 \
        --epoch_queue_starts 15 \
        --dump_path ./ckpt \
        --sync_bn pytorch \
        --temperature 0.1 \
        --epsilon 0.05 \
        --sinkhorn_iterations 3 \
        --feat_dim 128 \
        --nmb_prototypes 3000 \
        --epochs 200 \
        --batch_size 64 \
        --wd 0.000001 \
        --arch efficientb0 \
        --pretrained /path/to/swav_800ep_pretrain.pth.tar 2>&1 | tee ./logs/std.log

For Tab6, Linear evaluation top-1 accuracy (%) on ImageNet with variants of teacher pre-training methods.

# SwAV
python3 -u main.py -a resnet34 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch SWAVresnet50 --teacher /path/to/swav_800ep_pretrain.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

Visualization

cd DisCo
# Generate Embed
# Move Embed to data path

python -u draw.py

Thanks

Code heavily depends on MoCo-V2, Detectron2.

This repository contains part of the code used to make the images visible in the article "How does an AI Imagine the Universe?" published on Towards Data Science.

Generative Adversarial Network - Generating Universe This repository contains part of the code used to make the images visible in the article "How doe

Davide Coccomini 9 Dec 18, 2022
Simulating Sycamore quantum circuits classically using tensor network algorithm.

Simulating the Sycamore quantum supremacy circuit This repo contains data we have obtained in simulating the Sycamore quantum supremacy circuits with

Feng Pan 46 Nov 17, 2022
Simple cross-platform application for DaVinci surgical video frame annotation

About DaVid is a simple cross-platform GUI for annotating robotic and endoscopic surgical actions for use in deep-learning research. Features Simple a

Cyril Zakka 4 Oct 09, 2021
City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones Code

City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones Requirements Python 3.8 or later with all requirements.txt dependencies installed,

88 Dec 12, 2022
Apply our monocular depth boosting to your own network!

MergeNet - Boost Your Own Depth Boost custom or edited monocular depth maps using MergeNet Input Original result After manual editing of base You can

Computational Photography Lab @ SFU 142 Dec 17, 2022
PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Poincaré Embeddings for Learning Hierarchical Representations PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations

Facebook Research 1.6k Dec 25, 2022
Anchor-free Oriented Proposal Generator for Object Detection

Anchor-free Oriented Proposal Generator for Object Detection Gong Cheng, Jiabao Wang, Ke Li, Xingxing Xie, Chunbo Lang, Yanqing Yao, Junwei Han, Intro

jbwang1997 56 Nov 15, 2022
OOD Generalization and Detection (ACL 2020)

Pretrained Transformers Improve Out-of-Distribution Robustness How does pretraining affect out-of-distribution robustness? We create an OOD benchmark

littleRound 57 Jan 09, 2023
MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts (ICLR 2022)

MetaShift: A Dataset of Datasets for Evaluating Distribution Shifts and Training Conflicts This repo provides the PyTorch source code of our paper: Me

88 Jan 04, 2023
LowRankModels.jl is a julia package for modeling and fitting generalized low rank models.

LowRankModels.jl LowRankModels.jl is a Julia package for modeling and fitting generalized low rank models (GLRMs). GLRMs model a data array by a low r

Madeleine Udell 183 Dec 17, 2022
Everything you need to know about NumPy( Creating Arrays, Indexing, Math,Statistics,Reshaping).

Everything you need to know about NumPy( Creating Arrays, Indexing, Math,Statistics,Reshaping).

1 Feb 14, 2022
Repository for scripts and notebooks from the book: Programming PyTorch for Deep Learning

Repository for scripts and notebooks from the book: Programming PyTorch for Deep Learning

Ian Pointer 368 Dec 17, 2022
Import Python modules from dicts and JSON formatted documents.

Paker Paker is module for importing Python packages/modules from dictionaries and JSON formatted documents. It was inspired by httpimporter. Important

Wojciech Wentland 1 Sep 07, 2022
State-Relabeling Adversarial Active Learning

State-Relabeling Adversarial Active Learning Code for SRAAL [2020 CVPR Oral] Requirements torch = 1.6.0 numpy = 1.19.1 tqdm = 4.31.1 AL Results The

10 Jul 14, 2022
A deep neural networks for images using CNN algorithm.

Example-CNN-Project This is a simple project showing how to implement deep neural networks using CNN algorithm. The dataset is taken from this link: h

Mohammad Amin Dadgar 3 Sep 16, 2022
Repository to run object detection on a model trained on an autonomous driving dataset.

Autonomous Driving Object Detection on the Raspberry Pi 4 Description of Repository This repository contains code and instructions to configure the ne

Ethan 51 Nov 17, 2022
Deploy optimized transformer based models on Nvidia Triton server

🤗 Hugging Face Transformer submillisecond inference 🤯 and deployment on Nvidia Triton server Yes, you can perfom inference with transformer based mo

Lefebvre Sarrut Services 1.2k Jan 05, 2023
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong, Jaehyeon Kim, Jaekyoung Bae In our paper, we p

Rishikesh (ऋषिकेश) 31 Dec 08, 2022
Tooling for the Common Objects In 3D dataset.

CO3D: Common Objects In 3D This repository contains a set of tools for working with the Common Objects in 3D (CO3D) dataset. Download the dataset The

Facebook Research 724 Jan 06, 2023
Waymo motion prediction challenge 2021: 3rd place solution

Waymo motion prediction challenge 2021: 3rd place solution 📜 Technical report 🗨️ Presentation 🎉 Announcement 🛆Motion Prediction Channel Website 🛆

158 Jan 08, 2023