Code for "Searching for Efficient Multi-Stage Vision Transformers"

Overview

Searching for Efficient Multi-Stage Vision Transformers

This repository contains the official Pytorch implementation of "Searching for Efficient Multi-Stage Vision Transformers" and is based on DeiT and timm.

photo not available

Illustration of the proposed multi-stage ViT-Res network.


photo not available

Illustration of weight-sharing neural architecture search with multi-architectural sampling.


photo not available

Accuracy-MACs trade-offs of the proposed ViT-ResNAS. Our networks achieves comparable results to previous work.

Content

  1. Requirements
  2. Data Preparation
  3. Pre-Trained Models
  4. Training ViT-Res
  5. Performing Neural Architecture Search
  6. Evaluation

Requirements

The codebase is tested with 8 V100 (16GB) GPUs.

To install requirements:

    pip install -r requirements.txt

Docker files are provided to set up the environment. Please run:

    cd docker

    sh 1_env_setup.sh
    
    sh 2_build_docker_image.sh
    
    sh 3_run_docker_image.sh

Make sure that the configuration specified in 3_run_docker_image.sh is correct before running the command.

Data Preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val folder respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class/2
      img4.jpeg

Pre-Trained Models

Pre-trained weights of super-networks and searched networks can be found here.

Training ViT-Res

To train ViT-Res-Tiny, modify IMAGENET_PATH in scripts/vit-sr-nas/reference_net/tiny.sh and run:

    sh scripts/vit-sr-nas/reference_net/tiny.sh 

We use 8 GPUs for training. Please modify numbers of GPUs (--nproc_per_node) and adjust batch size (--batch-size) if different numbers of GPUs are used.

Performing Neural Architecture Search

0. Building Sub-Train and Sub-Val Set

Modify _SOURCE_DIR, _SUB_TRAIN_DIR, and _SUB_VAL_DIR in search_utils/build_subset.py, and run:

    cd search_utils
    
    python build_subset.py
    
    cd ..

1. Super-Network Training

Before running each script, modify IMAGENET_PATH (directed to the directory containing the sub-train and sub-val sets).

For ViT-ResNAS-Tiny, run:

    sh scripts/vit-sr-nas/super_net/tiny.sh

For ViT-ResNAS-Small and Medium, run:

    sh scripts/vit-sr-nas/super_net/small.sh

2. Evolutionary Search

Before running each script, modify IMAGENET_PATH (directed to the directory containing the sub-train and sub-val sets) and MODEL_PATH.

For ViT-ResNAS-Tiny, run:

    sh scripts/vit-sr-nas/evolutionary_search/tiny.sh

For ViT-ResNAS-Small, run:

    sh scripts/vit-sr-nas/evolutionary_search/[email protected]

For ViT-ResNAS-Medium, run:

    sh scripts/vit-sr-nas/evolutionary_search/[email protected]

After running evolutionary search for each network, see summary.txt in output directory and modify network_def.

For example, the network_def in summary.txt is ((4, 220), (1, (220, 5, 32), (220, 880), 1), (1, (220, 5, 32), (220, 880), 1), (1, (220, 7, 32), (220, 800), 1), (1, (220, 7, 32), (220, 800), 0), (1, (220, 5, 32), (220, 720), 1), (1, (220, 5, 32), (220, 720), 1), (1, (220, 5, 32), (220, 720), 1), (3, 220, 440), (1, (440, 10, 48), (440, 1760), 1), (1, (440, 10, 48), (440, 1440), 1), (1, (440, 10, 48), (440, 1920), 1), (1, (440, 10, 48), (440, 1600), 1), (1, (440, 12, 48), (440, 1600), 1), (1, (440, 12, 48), (440, 1120), 0), (1, (440, 12, 48), (440, 1440), 1), (3, 440, 880), (1, (880, 16, 64), (880, 3200), 1), (1, (880, 12, 64), (880, 3200), 1), (1, (880, 16, 64), (880, 2880), 1), (1, (880, 12, 64), (880, 3200), 0), (1, (880, 12, 64), (880, 2240), 1), (1, (880, 12, 64), (880, 3520), 0), (1, (880, 14, 64), (880, 2560), 1), (2, 880, 1000)).

Remove the element in the tuple that has 1 in the first element and 0 in the last element (e.g. (1, (220, 5, 32), (220, 880), 0)).

This reflects that the transformer block is removed in a searched network.

After this modification, the network_def becomes ((4, 220), (1, (220, 5, 32), (220, 880), 1), (1, (220, 5, 32), (220, 880), 1), (1, (220, 7, 32), (220, 800), 1), (1, (220, 5, 32), (220, 720), 1), (1, (220, 5, 32), (220, 720), 1), (1, (220, 5, 32), (220, 720), 1), (3, 220, 440), (1, (440, 10, 48), (440, 1760), 1), (1, (440, 10, 48), (440, 1440), 1), (1, (440, 10, 48), (440, 1920), 1), (1, (440, 10, 48), (440, 1600), 1), (1, (440, 12, 48), (440, 1600), 1), (1, (440, 12, 48), (440, 1440), 1), (3, 440, 880), (1, (880, 16, 64), (880, 3200), 1), (1, (880, 12, 64), (880, 3200), 1), (1, (880, 16, 64), (880, 2880), 1), (1, (880, 12, 64), (880, 2240), 1), (1, (880, 14, 64), (880, 2560), 1), (2, 880, 1000)).

Then, use the searched network_def for searched network training.

3. Searched Network Training

Before running each script, modify IMAGENET_PATH.

For ViT-ResNAS-Tiny, run:

    sh scripts/vit-sr-nas/searched_net/tiny.sh

For ViT-ResNAS-Small, run:

    sh scripts/vit-sr-nas/searched_net/[email protected]

For ViT-ResNAS-Medium, run:

    sh scripts/vit-sr-nas/searched_net/[email protected]

4. Fine-tuning Trained Networks at Higher Resolution

Before running, modify IMAGENET_PATH and FINETUNE_PATH (directed to trained ViT-ResNAS-Medium checkpoint). Then, run:

    sh scripts/vit-sr-nas/finetune/[email protected]

To fine-tune at different resolutions, modify --model, --input-size and --mix-patch-len. We provide models at resolutions 280, 336, and 392 as shown in here. Note that --input-size must be equal to "56 * --mix-patch-len" since the spatial size in ViT-ResNAS is reduced by 56X.

Evaluation

Before running, modify IMAGENET_PATH and MODEL_PATH. Then, run:

    sh scripts/vit-sr-nas/eval/[email protected]

Questions

Please direct questions to Yi-Lun Liao ([email protected]).

License

This repository is released under the CC-BY-NC 4.0. license as found in the LICENSE file.

Owner
Yi-Lun Liao
Yi-Lun Liao
The all new way to turn your boring vector meshes into the new fad in town; Voxels!

Voxelator The all new way to turn your boring vector meshes into the new fad in town; Voxels! Notes: I have not tested this on a rotated mesh. With fu

6 Feb 03, 2022
fastgradio is a python library to quickly build and share gradio interfaces of your trained fastai models.

fastgradio is a python library to quickly build and share gradio interfaces of your trained fastai models.

Ali Abdalla 34 Jan 05, 2023
PyTorch implementation of "PatchGame: Learning to Signal Mid-level Patches in Referential Games" to appear in NeurIPS 2021

PatchGame: Learning to Signal Mid-level Patches in Referential Games This repository is the official implementation of the paper - "PatchGame: Learnin

Kamal Gupta 22 Mar 16, 2022
Pytorch implementation of "Forward Thinking: Building and Training Neural Networks One Layer at a Time"

forward-thinking-pytorch Pytorch implementation of Forward Thinking: Building and Training Neural Networks One Layer at a Time Requirements Python 2.7

Kim Heecheol 65 Oct 06, 2022
HNECV: Heterogeneous Network Embedding via Cloud model and Variational inference

HNECV This repository provides a reference implementation of HNECV as described in the paper: HNECV: Heterogeneous Network Embedding via Cloud model a

4 Jun 28, 2022
Performant, differentiable reinforcement learning

deluca Performant, differentiable reinforcement learning Notes This is pre-alpha software and is undergoing a number of core changes. Updates to follo

Google 114 Dec 27, 2022
A library for Deep Learning Implementations and utils

deeply A Deep Learning library Table of Contents Features Quick Start Usage License Features Python 2.7+ and Python 3.4+ compatible. Quick Start $ pip

Achilles Rasquinha 1 Dec 12, 2022
A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers.

Dying Light 2 PAKFile Utility A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers. This tool aims to make PAKFile (.pak files) modding a

RHQ Online 12 Aug 26, 2022
Distributed Arcface Training in Pytorch

Distributed Arcface Training in Pytorch

3 Nov 23, 2021
Official MegEngine implementation of CREStereo(CVPR 2022 Oral).

[CVPR 2022] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation This repository contains MegEngine implementation of ou

MEGVII Research 309 Dec 30, 2022
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

Knut(Ke) Chen 134 Jan 01, 2023
Code for ACL 2019 Paper: "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction"

To run a generation experiment (either conceptnet or atomic), follow these instructions: First Steps First clone, the repo: git clone https://github.c

Antoine Bosselut 575 Jan 01, 2023
A 10000+ hours dataset for Chinese speech recognition

WenetSpeech Official website | Paper A 10000+ Hours Multi-domain Chinese Corpus for Speech Recognition Download Please visit the official website, rea

310 Jan 03, 2023
CoMoGAN: continuous model-guided image-to-image translation. CVPR 2021 oral.

CoMoGAN: Continuous Model-guided Image-to-Image Translation Official repository. Paper CoMoGAN: continuous model-guided image-to-image translation [ar

166 Dec 31, 2022
Bib-parser - Convenient script to parse .bib files with the ACM Digital Library like metadata

Bib Parser Convenient script to parse .bib files with the ACM Digital Library li

Mehtab Iqbal (Shahan) 1 Jan 26, 2022
An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testingAn image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing

SVM Données Une base d’images contient 490 images pour l’apprentissage (400 voitures et 90 bateaux), et encore 21 images pour fait des tests. Prétrait

Achraf Rahouti 3 Nov 30, 2021
Type4Py: Deep Similarity Learning-Based Type Inference for Python

Type4Py: Deep Similarity Learning-Based Type Inference for Python This repository contains the implementation of Type4Py and instructions for re-produ

Software Analytics Lab 45 Dec 15, 2022
Interactive dimensionality reduction for large datasets

BlosSOM 🌼 BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimen

19 Dec 14, 2022
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)

PDVC Official implementation for End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021) [paper] [valse论文速递(Chinese)] This repo supports:

Teng Wang 118 Dec 16, 2022
PyTorch implementation of the paper The Lottery Ticket Hypothesis for Object Recognition

LTH-ObjectRecognition The Lottery Ticket Hypothesis for Object Recognition Sharath Girish*, Shishira R Maiya*, Kamal Gupta, Hao Chen, Larry Davis, Abh

16 Feb 06, 2022