[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Last update: Nov 10, 2022

Related tags

Overview

MosaicKD

Code for NeurIPS-21 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data"

1. Motivation

Natural images share common local patterns. In MosaicKD, these local patterns are first dissembled from OOD data and then assembled to synthesize in-domain data, making OOD-KD feasible.

2. Method

MosaicKD establishes a four-player minimax game between a generator G, a patch discriminator D, a teacher model T and a student model S. The generator, as those in prior GANs, takes as input a random noise vector and learns to mosaic synthetic in-domain samples with locally-authentic and globally-legitimate distributions, under the supervisions back-propagated from the other three players.

3. Reproducing our results

3.1 Prepare teachers

Please download our pre-trained models from Dropbox (266 M) and extract them as "checkpoints/pretrained/*.pth". You can also train your own models as follows:

python train_scratch.py --lr 0.1 --batch-size 256 --model wrn40_2 --dataset cifar100

3.2 OOD-KD: CIFAR-100 (ID) + CIFAR10 (OOD)

Vanilla KD (Blind KD)

python kd_vanilla.py --lr 0.1 --batch-size 128 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --gpu 0

Data-Free KD (DFQAD)

python kd_datafree.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --lr 0.1 --local 1 --align 1 --adv 1 --balance 10 --gpu 0

MosaicKD (This work)

python kd_mosaic.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --lr 0.1 --local 1 --align 1 --adv 1 --balance 10 --gpu 0

3.3 OOD-KD: CIFAR-100 (ID) + ImageNet/Places365 OOD Subset (OOD)

Prepare 32x32 datasets
Please prepare the 32x32 ImageNet following the instructions from https://patrykchrabaszcz.github.io/Imagenet32/ and extract them as "data/ImageNet_32x32/train" and "data/ImageNet_32x32/val". You can prepare Places365 in the same way.

MosaicKD on OOD subset
As ImageNet & Places365 contain a large number of in-domain samples, we construct OOD subset for training. Please run the scripts with ''--ood_subset'' to enable subset selection.

python kd_mosaic.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --lr 0.1 --local 1 --align 1 --adv 1 --balance 10 --ood_subset --gpu 0

4. Visualization of synthetic data

5. Citation

If you found this work useful for your research, please cite our paper:

@article{fang2021mosaicking,
  title={Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data},
  author={Gongfan Fang and Yifan Bao and Jie Song and Xinchao Wang and Donglin Xie and Chengchao Shen and Mingli Song},
  journal={arXiv preprint arXiv:2110.15094},
  year={2021}
}

[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Related tags

Overview

MosaicKD

1. Motivation

2. Method

3. Reproducing our results

3.1 Prepare teachers

3.2 OOD-KD: CIFAR-100 (ID) + CIFAR10 (OOD)

3.3 OOD-KD: CIFAR-100 (ID) + ImageNet/Places365 OOD Subset (OOD)

4. Visualization of synthetic data

5. Citation

Owner

ZJU-VIPA

Variational autoencoder for anime face reconstruction

RSC-Net: 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos

Volumetric parameterization of the placenta to a flattened template

Pytorch implementation of Integrating Tree Path in Transformer for Code Representation

Official implementation of Pixel-Level Bijective Matching for Video Object Segmentation

Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019

Voxel Transformer for 3D object detection

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Chess reinforcement learning by AlphaGo Zero methods.

Paddle-Skeleton-Based-Action-Recognition - DecoupleGCN-DropGraph, ASGCN, AGCN, STGCN

A Fast and Stable GAN for Small and High Resolution Imagesets - pytorch

Official repository for "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems"

A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Self-Supervised Methods for Noise-Removal

How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.

Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite.

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

Pytorch and Keras Implementations of Hyperspectral Image Classification -- Traditional to Deep Models: A Survey for Future Prospects.

9th place solution

A tight inclusion function for continuous collision detection