RVT: Robust Vision Transformers

This repository contains PyTorch code for Robust Vision Transformers.

For details, see "Rethinking the Design Principles of Robust Vision Transformer" by Xiaofeng Mao, Gege Qi, Yuefeng Chen, Yuan He and Hui Xue.

Usage

First, clone the repository locally:

git clone https://github.com/vtddggg/Robust-Vision-Transformer.git

Then install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
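
To confirm the environment matches the versions listed above, a small check like the following may help. It is a minimal sketch, not part of this repository: the helper names (`parse_version`, `meets_requirement`, `check_environment`) are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision` and `timm`.

```python
from importlib import metadata
from itertools import takewhile

# Versions taken from the install step: (version, must match exactly?)
REQUIREMENTS = {
    "torch": ("1.7.0", False),        # 1.7.0 or newer
    "torchvision": ("0.8.1", False),  # 0.8.1 or newer
    "timm": ("0.3.2", True),          # pinned to 0.3.2
}


def parse_version(text):
    """Turn a string such as '1.7.0+cu110' into the tuple (1, 7, 0)."""
    core = text.split("+")[0]  # drop local build suffixes like '+cu110'
    parts = []
    for piece in core.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_requirement(installed, required, exact=False):
    """Compare two version strings, either exactly or as a minimum."""
    have, want = parse_version(installed), parse_version(required)
    return have == want if exact else have >= want


def check_environment():
    """Return a {package: status} report for the packages listed above."""
    report = {}
    for name, (required, exact) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_requirement(installed, required, exact)
        report[name] = f"{installed} ({'ok' if ok else 'mismatch'})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```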

We use 4 nodes with 8 GPUs each to train RVT-Ti, RVT-S and RVT-B:

Training RVT-Ti

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_tiny --data-path /path/to/imagenet --output_dir output --dist-eval

Training RVT-S

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_small --data-path /path/to/imagenet --output_dir output --dist-eval

Training RVT-B

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_base --data-path /path/to/imagenet --output_dir output --batch-size 32 --dist-eval
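
As a rough guide to the scale of these runs, the global batch size can be estimated from the launch settings. The sketch below assumes that `--batch-size` is a per-process (per-GPU) value, which is common in distributed PyTorch training scripts but is not stated here; the function name is hypothetical.

```python
def global_batch_size(nnodes, nproc_per_node, batch_size_per_process):
    """Total samples per optimizer step across all processes.

    Assumption: --batch-size is applied per process (one process per GPU).
    """
    return nnodes * nproc_per_node * batch_size_per_process


if __name__ == "__main__":
    # RVT-B settings from the command above: 4 nodes, 8 processes each, --batch-size 32
    print(global_batch_size(nnodes=4, nproc_per_node=8, batch_size_per_process=32))
```

Under that assumption, the RVT-B command corresponds to 4 x 8 x 32 = 1024 images per step.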

To train RVT-Ti*, RVT-S* or RVT-B*, add --use_mask and --use_patch_aug to the corresponding command to enable position-aware attention scaling and patch-wise augmentation.
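
The training commands differ only in the model name, the optional batch size and the two starred-variant flags, so they can be assembled programmatically. The following is a minimal sketch under that assumption: `build_launch_command` is a hypothetical helper, not part of this repository, and it uses only the flags that appear in the commands above.

```python
import shlex


def build_launch_command(model, data_path, output_dir="output",
                         nproc_per_node=8, nnodes=4, batch_size=None,
                         use_mask=False, use_patch_aug=False):
    """Assemble the launch command as a list of arguments."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        "main.py",
        "--model", model,
        "--data-path", data_path,
        "--output_dir", output_dir,
    ]
    if batch_size is not None:
        # Only RVT-B sets this explicitly in the commands above.
        cmd += ["--batch-size", str(batch_size)]
    cmd.append("--dist-eval")
    if use_mask:
        cmd.append("--use_mask")       # position-aware attention scaling
    if use_patch_aug:
        cmd.append("--use_patch_aug")  # patch-wise augmentation
    return cmd


if __name__ == "__main__":
    # RVT-S* : the small model with both optional features enabled
    cmd = build_launch_command("rvt_small", "/path/to/imagenet",
                               use_mask=True, use_patch_aug=True)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.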
