Revisiting Weakly Supervised Pre-Training of Visual Perception Models

Last update: Jan 05, 2023

SWAG: Supervised Weakly from hashtAGs

This repository contains SWAG models from the paper Revisiting Weakly Supervised Pre-Training of Visual Perception Models.

Requirements

This code has been tested to work with Python 3.8, PyTorch 1.10.1 and torchvision 0.11.2.

Note that CUDA support is not required for the tutorials.

To set up PyTorch and torchvision, please follow PyTorch's getting started instructions. If you are using conda on a Linux machine, you can use the following setup instructions:

conda create --name swag python=3.8
conda activate swag
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
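
After installation, it can help to compare the environment against the versions listed above. The following is a minimal sketch using only the Python standard library; the helper names (`installed_version`, `base_version`, `environment_report`) are hypothetical and not part of this repository, and it assumes the packages are registered under the distribution names `torch` and `torchvision`. A differing version does not necessarily mean the code will fail, only that the combination is not the one reported as tested.

```python
import sys
from importlib import metadata

# Versions this README reports as tested; other versions may also work.
TESTED_PYTHON = (3, 8)
TESTED_PACKAGES = {"torch": "1.10.1", "torchvision": "0.11.2"}


def base_version(version):
    """Drop a local build suffix such as '+cu102' from a version string."""
    return version.split("+")[0]


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return base_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Map each component to a (found, tested) pair of versions."""
    report = {"python": (tuple(sys.version_info[:2]), TESTED_PYTHON)}
    for package, tested in TESTED_PACKAGES.items():
        report[package] = (installed_version(package), tested)
    return report


if __name__ == "__main__":
    for name, (found, tested) in environment_report().items():
        status = "matches" if found == tested else "differs from"
        print(f"{name}: found {found}, which {status} the tested version {tested}")
```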

Model Zoo

We share checkpoints for all the pretrained models in the paper, and their ImageNet-1k finetuned counterparts. The models are available via torch.hub, and we also share URLs to all the checkpoints.

The details of the models, their torch.hub names / checkpoint links, and their performance on ImageNet-1k (IN-1K) are listed below.

Model	Pretrain Resolution	Pretrained Model	Finetune Resolution	IN-1K Finetuned Model	IN-1K Top-1	IN-1K Top-5
RegNetY 16GF	224 x 224	regnety_16gf	384 x 384	regnety_16gf_in1k	86.02%	98.05%
RegNetY 32GF	224 x 224	regnety_32gf	384 x 384	regnety_32gf_in1k	86.83%	98.36%
RegNetY 128GF	224 x 224	regnety_128gf	384 x 384	regnety_128gf_in1k	88.23%	98.69%
ViT B/16	224 x 224	vit_b16	384 x 384	vit_b16_in1k	85.29%	97.65%
ViT L/16	224 x 224	vit_l16	512 x 512	vit_l16_in1k	88.07%	98.51%
ViT H/14	224 x 224	vit_h14	518 x 518	vit_h14_in1k	88.55%	98.69%

The models can be loaded via torch.hub using the following command:

import torch
model = torch.hub.load("facebookresearch/swag", model="vit_b16_in1k")
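
Because each fine-tuned model expects a different input resolution, a small lookup can keep the model name and input size in sync. The sketch below copies the fine-tune resolutions from the model zoo table; the names `FINETUNE_RESOLUTION`, `input_shape`, and `run_dummy_forward` are hypothetical helpers, not part of this repository. The forward pass uses an all-zeros tensor only to exercise the model; real images would need the preprocessing described in the inference tutorial. `run_dummy_forward` downloads a checkpoint, so it imports torch lazily and is not called on import.

```python
# Fine-tune resolutions copied from the model zoo table in this README.
FINETUNE_RESOLUTION = {
    "regnety_16gf_in1k": 384,
    "regnety_32gf_in1k": 384,
    "regnety_128gf_in1k": 384,
    "vit_b16_in1k": 384,
    "vit_l16_in1k": 512,
    "vit_h14_in1k": 518,
}


def input_shape(model_name, batch_size=1):
    """Return the (N, C, H, W) shape for a fine-tuned model's input."""
    resolution = FINETUNE_RESOLUTION[model_name]
    return (batch_size, 3, resolution, resolution)


def run_dummy_forward(model_name):
    """Load a model via torch.hub and run it on an all-zeros batch.

    Requires PyTorch and network access to fetch the checkpoint.
    """
    import torch  # imported here so the lookup helpers work without PyTorch

    model = torch.hub.load("facebookresearch/swag", model=model_name)
    model.eval()  # disable training-time behavior such as dropout
    with torch.no_grad():
        return model(torch.zeros(input_shape(model_name)))
```

For example, `input_shape("vit_l16_in1k", batch_size=2)` gives `(2, 3, 512, 512)`, matching the 512 x 512 fine-tune resolution listed for ViT L/16.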

Inference Tutorial

For a tutorial with step-by-step instructions to perform inference, follow our inference tutorial and run it locally.

Live Demo

SWAG has been integrated into Hugging Face Spaces 🤗 using Gradio. Try out the web demo there.

Credits: AK391

ImageNet 1K Evaluation

We also provide a script to evaluate the accuracy of our models on ImageNet 1K, imagenet_1k_eval.py. This script is a slightly modified version of the PyTorch ImageNet example which supports our models.

To evaluate the RegNetY 16GF IN1K model on a single node (one or more GPUs), run the following command:

python imagenet_1k_eval.py -m regnety_16gf_in1k -r 384 -b 400 /path/to/imagenet_1k/root/

Note that we specify a 384 x 384 resolution since that is the resolution the model was fine-tuned at, and a mini-batch size of 400, which is distributed over all the GPUs in the node. For larger models or with fewer GPUs, the batch size will need to be reduced. See the PyTorch ImageNet example README for more details.
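
To get a feel for how the mini-batch size relates to the number of GPUs, the following sketch estimates the largest per-GPU share of a batch and assembles the evaluation command shown above. It assumes the batch is split as evenly as possible across GPUs; the exact split depends on how the script parallelizes, and the helper names `per_gpu_batch_size` and `eval_command` are hypothetical. Only the `-m`, `-r`, and `-b` flags from the command above are used.

```python
import math


def per_gpu_batch_size(total_batch_size, num_gpus):
    """Largest share of the mini-batch a single GPU receives under an even split."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return math.ceil(total_batch_size / num_gpus)


def eval_command(model, resolution, batch_size, data_root):
    """Build the argument list for imagenet_1k_eval.py."""
    return [
        "python", "imagenet_1k_eval.py",
        "-m", model,
        "-r", str(resolution),
        "-b", str(batch_size),
        data_root,
    ]


if __name__ == "__main__":
    # A batch of 400 on 8 GPUs gives each GPU 50 images.
    print(per_gpu_batch_size(400, 8))
    print(" ".join(eval_command("regnety_16gf_in1k", 384, 400,
                                "/path/to/imagenet_1k/root/")))
```

With fewer GPUs the per-GPU share grows (for example, 400 images on 2 GPUs is 200 each), which is why the total batch size may need to be lowered to fit in memory.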

Citation

If you use the SWAG models or if the work is useful in your research, please give us a star and cite:

@misc{singh2022revisiting,
      title={Revisiting Weakly Supervised Pre-Training of Visual Perception Models}, 
      author={Singh, Mannat and Gustafson, Laura and Adcock, Aaron and Reis, Vinicius de Freitas and Gedik, Bugra and Kosaraju, Raj Prateek and Mahajan, Dhruv and Girshick, Ross and Doll{\'a}r, Piotr and van der Maaten, Laurens},
      journal={arXiv preprint arXiv:2201.08371},
      year={2022}
}

License

SWAG models are released under the CC-BY-NC 4.0 license. See LICENSE for additional details.

Owner

Meta Research
