Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Last update: Jan 05, 2023

The official implementation of Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Introduction

TL;DR Arch-Net is a family of neural networks made up of simple and efficient operators. When an Arch-Net is produced, less common network constructs, such as Layer Normalization and Embedding Layers, are eliminated progressively through label-free Blockwise Model Distillation, while sub-eight-bit quantization is performed at the same time to maximize performance. For the classification task, only 30k unlabeled images randomly sampled from the ImageNet dataset are needed.
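To make the idea of low-bit quantization concrete, here is a minimal sketch of symmetric uniform quantization in plain Python. It is not the quantizer used in this repository (the acknowledgements point to LSQuantization); the function name `quantize_uniform` and the max-magnitude step-size rule are assumptions chosen only for illustration.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits.

    Hypothetical helper for illustration: the step size is derived from the
    largest magnitude in `values` rather than being learned.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    q_max = 2 ** (bits - 1) - 1      # e.g. 1 for 2 bits, 7 for 4 bits
    q_min = -(2 ** (bits - 1))       # e.g. -2 for 2 bits, -8 for 4 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)          # nothing to scale
    step = max_abs / q_max
    quantized = []
    for v in values:
        level = round(v / step)                  # nearest integer level
        level = max(q_min, min(q_max, level))    # clamp to the signed range
        quantized.append(level * step)           # map back to a float
    return quantized


if __name__ == "__main__":
    weights = [0.9, -0.4, 0.1, -1.0]
    print(quantize_uniform(weights, 2))
```

With 2 bits the output can take at most four distinct values, which is why the 2w4a models in the tables below are far smaller than their 32w32a counterparts.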

Main Results

ImageNet Classification

Model	Bit Width	Top1	Top5
Arch-Net_Resnet18	32w32a	69.76	89.08
Arch-Net_Resnet18	2w4a	68.77	88.66
Arch-Net_Resnet34	32w32a	73.30	91.42
Arch-Net_Resnet34	2w4a	72.40	91.01
Arch-Net_Resnet50	32w32a	76.13	92.86
Arch-Net_Resnet50	2w4a	74.56	92.39
Arch-Net_MobilenetV1	32w32a	68.79	88.68
Arch-Net_MobilenetV1	2w4a	67.29	88.07
Arch-Net_MobilenetV2	32w32a	71.88	90.29
Arch-Net_MobilenetV2	2w4a	69.09	89.13
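The Bit Width column packs two numbers into one tag. Judging from the `--weight-bit 2 --feature-bit 4` flags used for the 2w4a run in Get Started, the number before `w` appears to be the weight bit width and the number before `a` the activation (feature) bit width. A small sketch for splitting such tags follows; `parse_bit_width` is a hypothetical helper, not part of the repository.

```python
import re


def parse_bit_width(tag):
    """Split a tag such as '2w4a' into (weight_bits, activation_bits)."""
    match = re.fullmatch(r"(\d+)w(\d+)a", tag)
    if match is None:
        raise ValueError(f"unrecognized bit-width tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for tag in ("32w32a", "2w4a"):
        print(tag, parse_bit_width(tag))
```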

Multi30k Machine Translation

Model	Translation Direction	Bit Width	BLEU
Transformer	English to German	32w32a	32.44
Transformer	English to German	2w4a	33.75
Transformer	English to German	4w4a	34.35
Transformer	English to German	8w8a	36.44
Transformer	German to English	32w32a	30.32
Transformer	German to English	2w4a	32.50
Transformer	German to English	4w4a	34.34
Transformer	German to English	8w8a	34.05

Dependencies

python == 3.6

refer to requirements.txt for more details

Data Preparation

Download the ImageNet and multi30k data (Google Drive or BaiduYun, code: 8brd) and put them in ./arch-net/data/ as follows:

./data/
├── imagenet
│   ├── train
│   ├── val
├── multi30k

Download the teacher models from Google Drive or BaiduYun (code: 57ew) and put them in ./arch-net/models/teacher/pretrained_models/
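Before launching training, it can help to confirm that the directories described above are in place. The following is a minimal sketch, assuming the repository root is `./arch-net`; the names `EXPECTED_DIRS` and `missing_dirs` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Directories named in the data-preparation steps, relative to the repo root.
EXPECTED_DIRS = [
    "data/imagenet/train",
    "data/imagenet/val",
    "data/multi30k",
    "models/teacher/pretrained_models",
]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("./arch-net")  # assumed checkout location
    if missing:
        print("missing directories:", ", ".join(missing))
    else:
        print("layout looks complete")
```

The check only looks at directory names; it does not verify that the datasets or checkpoints inside them are complete.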

Get Started

ImageNet Classification (take archnet_resnet18 as an example)

train and evaluate

cd ./train_imagenet

python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn

evaluate if you already have the trained models

python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn --evaluate

Machine Translation

train an arch-net_transformer of 2w4a

cd ./train_transformer

python3 train_archnet_transformer.py --translate_direction en2de --teacher_model_path ../models/teacher/pretrained_models/transformer_en_de.chkpt --data_pkl ../data/multi30k/m30k_ende_shr.pkl --batch_size 48 --final_epochs 50 --weight_bit 2 --feature_bit 4 --lr 1e-3 --weight_decay 1e-6 --label_smoothing

for an arch-net_transformer of 8w8a, use --lr 1e-3 and --weight_decay 1e-4
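The two settings stated above can be kept in one place and turned into a command line. This is only a sketch: `HPARAMS` and `build_train_command` are hypothetical names, every flag is copied from the 2w4a command above, and it assumes that the remaining flags (batch size, epochs, paths) stay the same for 8w8a, which this README does not state. No 4w4a entry is included because its hyperparameters are not given here.

```python
import shlex

# Learning rate and weight decay stated in this README,
# keyed by (weight_bit, feature_bit).
HPARAMS = {
    (2, 4): {"lr": "1e-3", "weight_decay": "1e-6"},
    (8, 8): {"lr": "1e-3", "weight_decay": "1e-4"},
}


def build_train_command(weight_bit, feature_bit):
    """Assemble the en2de training command for a known bit-width setting."""
    hp = HPARAMS[(weight_bit, feature_bit)]  # KeyError for unlisted settings
    args = [
        "python3", "train_archnet_transformer.py",
        "--translate_direction", "en2de",
        "--teacher_model_path",
        "../models/teacher/pretrained_models/transformer_en_de.chkpt",
        "--data_pkl", "../data/multi30k/m30k_ende_shr.pkl",
        "--batch_size", "48",
        "--final_epochs", "50",
        "--weight_bit", str(weight_bit),
        "--feature_bit", str(feature_bit),
        "--lr", hp["lr"],
        "--weight_decay", hp["weight_decay"],
        "--label_smoothing",
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_train_command(8, 8))
```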

evaluate

cd ./evaluate

python3 translate.py --data_pkl ./data/multi30k/m30k_ende_shr.pkl --model path_to_the_output_directory/model_max_acc.chkpt

to get the BLEU score of the evaluated results, go to this website and upload 'predictions.txt' from the output directory together with 'gt_en.txt' or 'gt_de.txt' from ./arch-net/data_gt/multi30k/
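For a quick offline sanity check, corpus-level BLEU can also be approximated locally. The sketch below is a plain-Python implementation with whitespace tokenization, a single reference per sentence, and no smoothing; it is not the scorer behind the website, so its numbers may differ from the reported ones. It also assumes that 'predictions.txt' and the ground-truth file hold one sentence per line in the same order, which this README does not confirm. All function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (single reference, no smoothing)."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    if hyp_len > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


def score_files(pred_path, gt_path):
    """Score two line-aligned text files (assumed format, see above)."""
    with open(pred_path, encoding="utf-8") as f:
        hyps = [line.strip() for line in f]
    with open(gt_path, encoding="utf-8") as f:
        refs = [line.strip() for line in f]
    return corpus_bleu(hyps, refs)
```

A call such as `score_files("predictions.txt", "gt_de.txt")` would then return a single score for the whole file pair.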

Citation

If you find this project useful for your research, please consider citing the paper.

@misc{xu2021archnet,
      title={Arch-Net: Model Distillation for Architecture Agnostic Model Deployment}, 
      author={Weixin Xu and Zipeng Feng and Shuangkang Fang and Song Yuan and Yi Yang and Shuchang Zhou},
      year={2021},
      eprint={2111.01135},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgements

attention-is-all-you-need-pytorch

LSQuantization

pytorch-mobilenet-v1

Contact

If you have any questions, feel free to open an issue or contact us at [email protected].

Owner

MEGVII Research
