Official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

Last update: Jan 01, 2023

Related tags

Deep Learning PS-ViT

Overview

Vision Transformer with Progressive Sampling

This is the official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

Installation Instructions

Clone this repo:

git clone [email protected]:yuexy/PS-ViT.git
cd PS-ViT

Create a conda virtual environment and activate it:

conda create -n ps_vit python=3.7 -y
conda activate ps_vit

Install CUDA==10.1 with cudnn7 following the official installation instructions
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

Install timm==0.3.4, einops, pyyaml:

pip3 install timm=0.3.4, einops, pyyaml

Install Apex:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Install PS-ViT:

python setup.py build_ext --inplace

Results and Models

All models listed below are evaluated with input size 224x224

Model	Top1 Acc	#params	FLOPS	Download
PS-ViT-Ti/14	75.6	4.8M	1.6G	Coming Soon
PS-ViT-B/10	80.6	21.3M	3.1G	Coming Soon
PS-ViT-B/14	81.7	21.3M	5.4G	Google Drive
PS-ViT-B/18	82.3	21.3M	8.8G	Google Drive

Evaluation

To evaluate a pre-trained PS-ViT on ImageNet val, run:

python3 main.py <data-root> --model <model-name> -b <batch-size> --eval_checkpoint <path-to-checkpoint>

Training from scratch

To train a PS-ViT on ImageNet from scratch, run:

bash ./scripts/train_distributed.sh <job-name> <config-path> <num-gpus>

Citing PS-ViT

@article{psvit,
  title={Vision Transformer with Progressive Sampling},
  author={Yue, Xiaoyu and Sun, Shuyang and Kuang, Zhanghui and Wei, Meng and Torr, Philip and Zhang, Wayne and Lin, Dahua},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

Contact

If you have any questions, don't hesitate to contact Xiaoyu Yue. You can easily reach him by sending an email to [email protected].

Official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

Related tags

Overview

Vision Transformer with Progressive Sampling

Installation Instructions

Results and Models

Evaluation

Training from scratch

Citing PS-ViT

Contact

Owner

yuexy

Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021)

Cortex-compatible model server for Python and TensorFlow

Research code of ICCV 2021 paper "Mesh Graphormer"

Implementation of the GBST block from the Charformer paper, in Pytorch

Unofficial keras(tensorflow) implementation of MAE model from Masked Autoencoders Are Scalable Vision Learners

Repository for RNNs using TensorFlow and Keras - LSTM and GRU Implementation from Scratch - Simple Classification and Regression Problem using RNNs

Official PyTorch implementation of the paper "Graph-based Generative Face Anonymisation with Pose Preservation" in ICIAP 2021

Wenet STT Python

SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

A Python package for causal inference using Synthetic Controls

SpanNER: Named EntityRe-/Recognition as Span Prediction

Convert Mission Planner (ArduCopter) Waypoint Missions to Litchi CSV Format to execute on DJI Drones

Repository of the paper Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models at ML4AD @ NeurIPS 2021.

Pytorch implementation of COIN, a framework for compression with implicit neural representations 🌸

an implementation of softmax splatting for differentiable forward warping using PyTorch

Blind Video Temporal Consistency via Deep Video Prior

MANO hand model porting for the GraspIt simulator

Lyapunov-guided Deep Reinforcement Learning for Stable Online Computation Offloading in Mobile-Edge Computing Networks

Robust and Accurate Object Detection via Self-Knowledge Distillation