[CVPR 2021] Teachers Do More Than Teach: Compressing Image-to-Image Models (CAT)

Last update: Dec 09, 2022

Overview

CAT

Pytorch implementation of our method for compressing image-to-image models.
Teachers Do More Than Teach: Compressing Image-to-Image Models
Qing Jin¹, Jian Ren², Oliver J. Woodford, Jiazhuo Wang², Geng Yuan¹, Yanzhi Wang¹, Sergey Tulyakov²
¹Northeastern University, ²Snap Inc.
In CVPR 2021.

Overview

Compression And Teaching (CAT) framework for compressing image-to-image models: ① Given a pre-trained teacher generator Gt, we determine the architecture of a compressed student generator Gs by eliminating those channels with smallest magnitudes of batch norm scaling factors. ② We then distill knowledge from the pretrained teacher Gt on the student Gs via a novel distillation technique, which maximize the similarity between features of both generators, defined in terms of kernel alignment (KA).

Prerequisites

Linux
Python 3
CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

Clone this repo:

git clone [email protected]:snap-research/CAT.git
cd CAT

Install PyTorch 1.7 and other dependencies (e.g., torchvision).
- For pip users, please type the command pip install -r requirements.txt.
- For Conda users, please create a new Conda environment using conda env create -f environment.yml.

Data Preparation

CycleGAN

Setup

Download the CycleGAN dataset (e.g., horse2zebra).

bash datasets/download_cyclegan_dataset.sh horse2zebra

Get the statistical information for the ground-truth images for your dataset to compute FID. We provide pre-prepared real statistic information for several datasets on Google Drive Folder.

Pix2pix

Setup

Download the pix2pix dataset (e.g., cityscapes).

bash datasets/download_pix2pix_dataset.sh cityscapes

Cityscapes Dataset

For the Cityscapes dataset, we cannot provide it due to license issue. Please download the dataset from https://cityscapes-dataset.com and use the script prepare_cityscapes_dataset.py to preprocess it. You need to download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip and unzip them in the same folder. For example, you may put gtFine and leftImg8bit in database/cityscapes-origin. You need to prepare the dataset with the following commands:

python datasets/get_trainIds.py database/cityscapes-origin/gtFine/
python datasets/prepare_cityscapes_dataset.py \
--gtFine_dir database/cityscapes-origin/gtFine \
--leftImg8bit_dir database/cityscapes-origin/leftImg8bit \
--output_dir database/cityscapes \
--table_path datasets/table.txt

You will get a preprocessed dataset in database/cityscapes and a mapping table (used to compute mIoU) in dataset/table.txt.

Get the statistical information for the ground-truth images for your dataset to compute FID. We provide pre-prepared real statistics for several datasets. For example,
```
bash datasets/download_real_stat.sh cityscapes A
```

Evaluation Preparation

mIoU Computation

To support mIoU computation, you need to download a pre-trained DRN model drn-d-105_ms_cityscapes.pth from http://go.yf.io/drn-cityscapes-models. By default, we put the drn model in the root directory of our repo. Then you can test our compressed models on cityscapes after you have downloaded our compressed models.

FID/KID Computation

To compute the FID/KID score, you need to get some statistical information from the groud-truth images of your dataset. We provide a script get_real_stat.py to extract statistical information. For example, for the map2arial dataset, you could run the following command:

python get_real_stat.py \
--dataroot database/map2arial \
--output_path real_stat/maps_B.npz \
--direction AtoB

For paired image-to-image translation (pix2pix and GauGAN), we calculate the FID between generated test images to real test images. For unpaired image-to-image translation (CycleGAN), we calculate the FID between generated test images to real training+test images. This allows us to use more images for a stable FID evaluation, as done in previous unconditional GANs research. The difference of the two protocols is small. The FID of our compressed CycleGAN model increases by 4 when using real test images instead of real training+test images.

KID is not supported for the cityscapes dataset.

Model Training

Teacher Training

The first step of our framework is to train a teacher model. For this purpose, please run the script train_inception_teacher.sh under the correponding folder named as the dataset, for example, run

bash scripts/cycle_gan/horse2zebra/train_inception_teacher.sh

Student Training

With the pretrained teacher model, we can determine the architecture of student model under prescribed computational budget. For this purpose, please run the script train_inception_student_XXX.sh under the correponding folder named as the dataset, where XXX stands for the computational budget (in terms of FLOPs for this case) and can be different for different datasets and models. For example, for CycleGAN with Horse2Zebra dataset, our computational budget is 2.6B FLOPs, so we run

bash scripts/cycle_gan/horse2zebra/train_inception_student_2p6B.sh

Pre-trained Models

For convenience, we also provide pretrained teacher and student models on Google Drive Folder.

Model Evaluation

With pretrained teacher and student models, we can evaluate them on the dataset. For this purpose, please run the script evaluate_inception_student_XXX.sh under the corresponding folder named as the dataset, where XXX is the computational budget (in terms of FLOPs). For example, for CycleGAN with Horse2Zebra dataset where the computational budget is 2.6B FLOPs, please run

bash scripts/cycle_gan/horse2zebra/evaluate_inception_student_2p6B.sh

Model Export

The final step is to export the trained compressed model as onnx file to run on mobile devices. For this purpose, please run the script onnx_export_inception_student_XXX.sh under the corresponding folder named as the dataset, where XXX is the computational budget (in terms of FLOPs). For example, for CycleGAN with Horse2Zebra dataset where the computational budget is 2.6B FLOPs, please run

bash scripts/cycle_gan/horse2zebra/onnx_export_inception_student_2p6B.sh

This will create one .onnx file in addition to log files.

Citation

If you use this code for your research, please cite our paper.

@inproceedings{jin2021teachers,
  title={Teachers Do More Than Teach: Compressing Image-to-Image Models},
  author={Jin, Qing and Ren, Jian and Woodford, Oliver J and Wang, Jiazhuo and Yuan, Geng and Wang, Yanzhi and Tulyakov, Sergey},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Acknowledgements

Our code is developed based on AtomNAS and gan-compression.

We also thank pytorch-fid for FID computation and drn for mIoU computation.

[CVPR 2021] Teachers Do More Than Teach: Compressing Image-to-Image Models (CAT)

Related tags

Overview

CAT

Overview

Prerequisites

Getting Started

Installation

Data Preparation

CycleGAN

Setup

Pix2pix

Setup

Cityscapes Dataset

Evaluation Preparation

mIoU Computation

FID/KID Computation

Model Training

Teacher Training

Student Training

Pre-trained Models

Model Evaluation

Model Export

Citation

Acknowledgements

Owner

Snap Research

Method for facial emotion recognition compitition of Xunfei and Datawhale .

Joint detection and tracking model named DEFT, or ``Detection Embeddings for Tracking.

Detecting drunk people through thermal images using Deep Learning (CNN)

PyTorch implementation of our paper: Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

Graph Posterior Network: Bayesian Predictive Uncertainty for Node Classification (NeurIPS 2021)

AOT (Associating Objects with Transformers) in PyTorch

Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder

Yolov5-opencv-cpp-python - Example of using ultralytics YOLO V5 with OpenCV 4.5.4, C++ and Python

OneShot Learning-based hotword detection.

My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs (GNN, GAT, GraphSAGE, GCN)

OpenCVのGrabCut()を利用したセマンティックセグメンテーション向けアノテーションツール(Annotation tool using GrabCut() of OpenCV. It can be used to create datasets for semantic segmentation.)

Art Project "Schrödinger's Game of Life"

This code is for our paper "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers"

Laser device for neutralizing - mosquitoes, weeds and pests

A simple log parser and summariser for IIS web server logs

Library of various Few-Shot Learning frameworks for text classification

(CVPR 2022) Energy-based Latent Aligner for Incremental Learning

Interactive Image Segmentation via Backpropagating Refinement Scheme

A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code.

1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection