Official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

Last update: Jan 01, 2023

Overview

Vision Transformer with Progressive Sampling

This is the official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

Installation Instructions

Clone this repo:

git clone git@github.com:yuexy/PS-ViT.git
cd PS-ViT

Create a conda virtual environment and activate it:

conda create -n ps_vit python=3.7 -y
conda activate ps_vit

Install CUDA==10.1 with cuDNN 7, following the official installation instructions.
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

Install timm==0.3.4, einops, pyyaml:

pip3 install timm==0.3.4 einops pyyaml

Install Apex:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Install PS-ViT (run this from the PS-ViT repository root, not from the apex directory):

python setup.py build_ext --inplace
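
Because the steps above pin several versions, a quick sanity check of the environment can save a failed build or run later. The following is a minimal sketch, not part of the PS-ViT repository: the names `REQUIRED`, `find_mismatches` and `collect_installed` are hypothetical, and it assumes each package exposes a `__version__` attribute (a local build suffix such as `+cu101` is ignored when comparing).

```python
import importlib

# Version pins taken from the installation steps above.
# einops and pyyaml are not pinned there, so None means "any version".
REQUIRED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "timm": "0.3.4",
    "einops": None,
    "yaml": None,  # import name of the pyyaml package
}


def base_version(version):
    """Drop a local build suffix such as '+cu101' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed, required=REQUIRED):
    """Return readable problems; an empty list means every pin is satisfied.

    `installed` maps import names to version strings, or to None when the
    package could not be imported.
    """
    problems = []
    for name, wanted in required.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and base_version(found) != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


def collect_installed(names):
    """Import each package and read its __version__ attribute."""
    installed = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            installed[name] = None
        else:
            installed[name] = getattr(module, "__version__", "unknown")
    return installed


if __name__ == "__main__":
    issues = find_mismatches(collect_installed(REQUIRED))
    print("\n".join(issues) if issues else "Environment matches the pinned versions.")
```

Run it inside the activated ps_vit environment. It only compares version strings and does not verify that the CUDA extensions (Apex, PS-ViT) compiled correctly.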

Results and Models

All models listed below are evaluated with an input size of 224x224.

Model	Top-1 Acc (%)	#Params	FLOPs	Download
PS-ViT-Ti/14	75.6	4.8M	1.6G	Coming Soon
PS-ViT-B/10	80.6	21.3M	3.1G	Coming Soon
PS-ViT-B/14	81.7	21.3M	5.4G	Google Drive
PS-ViT-B/18	82.3	21.3M	8.8G	Google Drive

Evaluation

To evaluate a pre-trained PS-ViT on the ImageNet validation set, run:

python3 main.py <data-root> --model <model-name> -b <batch-size> --eval_checkpoint <path-to-checkpoint>
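
For reference, the accuracy column in the results table is the standard top-1 metric: the percentage of validation images whose highest-scoring class equals the ground-truth label. The sketch below illustrates that definition in plain Python on toy data. It is not the repository's evaluation code, and `top1_accuracy` is a hypothetical helper name.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the label.

    scores: one list of per-class scores per sample.
    labels: ground-truth class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Toy example with 4 samples and 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.2, 0.5, 0.3],  # predicts class 1
]
labels = [1, 0, 1, 1]
print(top1_accuracy(scores, labels))  # 3 of 4 correct -> 75.0
```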

Training from scratch

To train a PS-ViT on ImageNet from scratch, run:

bash ./scripts/train_distributed.sh <job-name> <config-path> <num-gpus>
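
When evaluating several checkpoints or launching several training jobs, it can be convenient to assemble these commands programmatically. The sketch below only rebuilds the two command lines shown in this README, using the arguments that appear there. The helper names `eval_command` and `train_command` are hypothetical, and every path and model name in the example is a placeholder, not a value from the repository.

```python
import shlex


def eval_command(data_root, model_name, batch_size, checkpoint):
    """Argument list for the evaluation command shown in the Evaluation section."""
    return [
        "python3", "main.py", str(data_root),
        "--model", str(model_name),
        "-b", str(batch_size),
        "--eval_checkpoint", str(checkpoint),
    ]


def train_command(job_name, config_path, num_gpus):
    """Argument list for the distributed training script."""
    return [
        "bash", "./scripts/train_distributed.sh",
        str(job_name), str(config_path), str(int(num_gpus)),
    ]


# All values below are placeholders; substitute your own paths and model name.
cmd = eval_command("/path/to/imagenet", "<model-name>", 128, "/path/to/checkpoint.pth")
print(" ".join(shlex.quote(part) for part in cmd))

# To actually launch it from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.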

Citing PS-ViT

@article{psvit,
  title={Vision Transformer with Progressive Sampling},
  author={Yue, Xiaoyu and Sun, Shuyang and Kuang, Zhanghui and Wei, Meng and Torr, Philip and Zhang, Wayne and Lin, Dahua},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

Contact

If you have any questions, don't hesitate to contact Xiaoyu Yue. You can easily reach him by sending an email to [email protected].

Owner

yuexy
