Code for the Convolutional Vision Transformer (ConViT)

Last update: Jan 06, 2023

Overview

ConViT: Vision Transformers with Convolutional Inductive Biases

This repository contains PyTorch code for ConViT. It builds on code from Data-efficient Image Transformers (DeiT) and from timm.

For details see the ConViT paper by Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli and Levent Sagun.

If you use this code in a paper, please cite:

@article{d2021convit,
  title={ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases},
  author={d'Ascoli, St{\'e}phane and Touvron, Hugo and Leavitt, Matthew and Morcos, Ari and Biroli, Giulio and Sagun, Levent},
  journal={arXiv preprint arXiv:2103.10697},
  year={2021}
}

Usage

Install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard layout for torchvision's datasets.ImageFolder: training data is expected in the train folder and validation data in the val folder:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
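Before launching a long run, it can help to confirm that the folders match this layout. The following is a minimal sketch using only the standard library; the helper names list_classes and check_layout are hypothetical and are not part of this repository.

```python
from pathlib import Path


def list_classes(root, split):
    """Return the sorted class-folder names found under root/split."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split folder: {split_dir}")
    return sorted(p.name for p in split_dir.iterdir() if p.is_dir())


def check_layout(root):
    """Check that train/ and val/ exist and contain the same class folders."""
    train_classes = list_classes(root, "train")
    val_classes = list_classes(root, "val")
    if train_classes != val_classes:
        # Report the folders that appear in only one of the two splits.
        mismatched = sorted(set(train_classes) ^ set(val_classes))
        raise ValueError(f"class folders differ between train and val: {mismatched}")
    return train_classes
```

Calling check_layout("/path/to/imagenet") returns the list of class names, or raises if a split folder is missing or the two splits disagree (as they would with the class/2 style of mistake).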

Evaluation

To evaluate the pretrained ConViT-Ti on the ImageNet validation images (the val folder), run:

python main.py --eval --model convit_tiny --pretrained --data-path /path/to/imagenet

This should give:

Acc@1 73.116 Acc@5 91.710 loss 1.172
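The two accuracy figures in this output are presumably top-1 and top-5 accuracy in percent, following the usual ImageNet convention. As an illustration of what those numbers measure, here is a plain-Python sketch; topk_accuracy is a hypothetical helper written for this example, not the function used by main.py.

```python
def topk_accuracy(scores, targets, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in topk
    return 100.0 * hits / len(targets)


# Toy example: 3 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
targets = [1, 1, 0]
top1 = topk_accuracy(scores, targets, k=1)  # only the first sample is correct
top2 = topk_accuracy(scores, targets, k=2)  # the first two samples are correct
```

With 1000 ImageNet classes, k=1 and k=5 would correspond to the two reported values.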

Training

To train ConViT-Ti on ImageNet on a single node with 4 GPUs for 300 epochs, run:

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model convit_tiny --batch-size 256 --data-path /path/to/imagenet

To train the same model on a subsampled version of ImageNet that uses only 10% of the images in each class, add --sampling_ratio 0.1 to the command.
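To make the idea of per-class subsampling concrete, the sketch below keeps a fixed fraction of the images of every class. It is an assumption-laden illustration: subsample_per_class, its seed argument, and the "keep at least one image" rule are hypothetical, and the repository's own --sampling_ratio logic may differ.

```python
import random
from collections import defaultdict


def subsample_per_class(samples, ratio, seed=0):
    """Keep roughly `ratio` of the (path, label) pairs of each class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for label in sorted(by_class):
        items = by_class[label]
        # Assumption: round down, but keep at least one image per class.
        n_keep = max(1, int(len(items) * ratio))
        kept.extend(rng.sample(items, n_keep))
    return kept
```

Sampling within each class, rather than over the whole dataset, keeps the class balance of the subset the same as that of the full training set.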

Multinode training

Distributed training is available via Slurm and submitit:

pip install submitit

To train ConViT-base on ImageNet on 2 nodes with 8 GPUs each for 300 epochs, run:

python run_with_submitit.py --model convit_base --data-path /path/to/imagenet

License

The majority of this repository is released under the CC-BY-NC 4.0 license, as found in the LICENSE file. However, portions of the project are available under separate license terms: deit and timm are licensed under Apache 2.0.

Owner

Facebook Research
