Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Last update: Dec 02, 2022

Related tags

Deep Learning CARE

Overview

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

This repository is the official implementation of CARE.

Updates

(09/10/2021) Our paper is accepted by NeurIPS 2021.

Requirements

To install requirements:

conda create -n care python=3.6
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch
pip install tensorboard
pip install ipdb
pip install einops
pip install loguru
pip install pyarrow==3.0.0
pip install tqdm

📋 Pytorch>=1.6 is needed for runing the code.

Data Preparation

Prepare the ImageNet data in {data_path}/train.lmdb and {data_path}/val.lmdb

Relpace the original data path in care/data/dataset_lmdb (Line7 and Line40) with your new {data_path}.

📋 Note that we use the lmdb file to speed-up the data-processing procedure.

Training

Before training the ResNet-50 (100 epoch) in the paper, run this command first to add your PYTHONPATH:

export PYTHONPATH=$PYTHONPATH:{your_code_path}/care/
export PYTHONPATH=$PYTHONPATH:{your_code_path}/care/care/

Then run the training code via:

bash run_train.sh      #(The training script is used for trianing CARE with 8 gpus)
bash single_gpu_train.sh    #(We also provide the script for trainig CARE with only one gpu)

📋 The training script is used to do unsupervised pre-training of a ResNet-50 model on ImageNet in an 8-gpu machine

using -b to specify batch_size, e.g., -b 128

using -d to specify gpu_id for training, e.g., -d 0-7

using --log_path to specify the main folder for saving experimental results.

using --experiment-name to specify the folder for saving training outputs.

The code base also supports for training other backbones (e.g., ResNet101 and ResNet152) with different training schedules (e.g., 200, 400 and 800 epochs).

Evaluation

Before start the evaluation, run this command first to add your PYTHONPATH:

export PYTHONPATH=$PYTHONPATH:{your_code_path}/care/
export PYTHONPATH=$PYTHONPATH:{your_code_path}/care/care/

Then, to evaluate the pre-trained model (e.g., ResNet50-100epoch) on ImageNet, run:

bash run_val.sh      #(The training script is used for evaluating CARE with 8 gpus)
bash debug_val.sh    #(We also provide the script for evaluating CARE with only one gpu)

📋 The training script is used to do the supervised linear evaluation of a ResNet-50 model on ImageNet in an 8-gpu machine

using -b to specify batch_size, e.g., -b 128

using -d to specify gpu_id for training, e.g., -d 0-7

Modifying --log_path according to your own config.

Modifying --experiment-name according to your own config.

Pre-trained Models

We here provide some pre-trained models in the [shared folder]:

Here are some examples.

[ResNet-50 100epoch] trained on ImageNet using ResNet-50 with 100 epochs.
[ResNet-50 200epoch] trained on ImageNet using ResNet-50 with 200 epochs.
[ResNet-50 400epoch] trained on ImageNet using ResNet-50 with 400 epochs.

More models are provided in the following model zoo part.

📋 We will provide more pretrained models in the future.

Model Zoo

Our model achieves the following performance on :

Self-supervised learning on image classifications.

Method	Backbone	epoch	Top-1	Top-5	pretrained model	linear evaluation model
CARE	ResNet50	100	72.02%	90.02%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet50	200	73.78%	91.50%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet50	400	74.68%	91.97%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet50	800	75.56%	92.32%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet50(2x)	100	73.51%	91.66%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet50(2x)	200	75.00%	92.22%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet50(2x)	400	76.48%	92.99%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet50(2x)	800	77.04%	93.22%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet101	100	73.54%	91.63%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet101	200	75.89%	92.70%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet101	400	76.85%	93.31%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet101	800	77.23%	93.52%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet152	100	74.59%	92.09%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet152	200	76.58%	93.63%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet152	400	77.40%	93.63%	[pretrained] (wip)	[linear_model] (wip)
CARE	ResNet152	800	78.11%	93.81%	[pretrained] (wip)	[linear_model] (wip)

Transfer learning to object detection and semantic segmentation.

COCO det

Method	Backbone	epoch	AP_bb	AP_50	AP_75	pretrained model	det/seg model
CARE	ResNet50	200	39.4	59.2	42.6	[pretrained] (wip)	[model] (wip)
CARE	ResNet50	400	39.6	59.4	42.9	[pretrained] (wip)	[model] (wip)
CARE	ResNet50-FPN	200	39.5	60.2	43.1	[pretrained] (wip)	[model] (wip)
CARE	ResNet50-FPN	400	39.8	60.5	43.5	[pretrained] (wip)	[model] (wip)

COCO instance seg

Method	Backbone	epoch	AP_mk	AP_50	AP_75	pretrained model	det/seg model
CARE	ResNet50	200	34.6	56.1	36.8	[pretrained] (wip)	[model] (wip)
CARE	ResNet50	400	34.7	56.1	36.9	[pretrained] (wip)	[model] (wip)
CARE	ResNet50-FPN	200	35.9	57.2	38.5	[pretrained] (wip)	[model] (wip)
CARE	ResNet50-FPN	400	36.2	57.4	38.8	[pretrained] (wip)	[model] (wip)

VOC07+12 det

Method	Backbone	epoch	AP_bb	AP_50	AP_75	pretrained model	det/seg model
CARE	ResNet50	200	57.7	83.0	64.5	[pretrained] (wip)	[model] (wip)
CARE	ResNet50	400	57.9	83.0	64.7	[pretrained] (wip)	[model] (wip)

📋 More results are provided in the paper.

Contributing

📋 WIP

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Related tags

Overview

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Updates

Requirements

Data Preparation

Training

Evaluation

Pre-trained Models

Model Zoo

Self-supervised learning on image classifications.

Transfer learning to object detection and semantic segmentation.

COCO det

COCO instance seg

VOC07+12 det

Contributing

Owner

ChongjianGE

Anomaly detection in multi-agent trajectories: Code for training, evaluation and the OpenAI highway simulation.

The code succinctly shows how our ensemble learning based on deep learning CNN is used for LAM-avulsion-diagnosis.

A PyTorch Implementation of "Watch Your Step: Learning Node Embeddings via Graph Attention" (NeurIPS 2018).

Deep Federated Learning for Autonomous Driving

A PyTorch implementation: "LASAFT-Net-v2: Listen, Attend and Separate by Attentively aggregating Frequency Transformation"

MAg: a simple learning-based patient-level aggregation method for detecting microsatellite instability from whole-slide images

Learning Multiresolution Matrix Factorization and its Wavelet Networks on Graphs

Official implementation of "Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation" (RSS 2022)

Manipulation OpenAI Gym environments to simulate robots at the STARS lab

BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization

Robust Lane Detection via Expanded Self Attention (WACV 2022)

Drslmarkov - Distributionally Robust Structure Learning for Discrete Pairwise Markov Networks

Loopy belief propagation for factor graphs on discrete variables, in JAX!

Efficient 6-DoF Grasp Generation in Cluttered Scenes

[CoRL 21'] TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

An introduction to satellite image analysis using Python + OpenCV and JavaScript + Google Earth Engine

Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

FFCV: Fast Forward Computer Vision (and other ML workloads!)

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)