PyTorch implementation of the paper "A Fast Knowledge Distillation Framework for Visual Recognition".

Last update: Dec 24, 2022

Overview

FKD: A Fast Knowledge Distillation Framework for Visual Recognition

Official PyTorch implementation of the paper "A Fast Knowledge Distillation Framework for Visual Recognition", by Zhiqiang Shen and Eric Xing from CMU and MBZUAI.

Abstract

Knowledge Distillation (KD) has been recognized as a useful tool in many visual tasks, such as supervised classification and self-supervised representation learning. However, the main drawback of a vanilla KD framework is that most of the computational overhead is spent on forward passes through the giant teacher network, which makes the whole learning procedure inefficient and costly. In this work, we propose a Fast Knowledge Distillation (FKD) framework that simulates the distillation training phase and generates soft labels following the multi-crop KD procedure, while training faster than ReLabel because it requires no post-processing such as RoI align and softmax operations. When multiple crops are taken from the same image during data loading, FKD is even more efficient than the conventional classification framework. We achieve 79.8% top-1 accuracy with ResNet-50 on ImageNet-1K, outperforming ReLabel by ~1.0% while being faster. We also demonstrate the efficiency advantage of FKD on the self-supervised learning task.
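The core idea (run the teacher once, then reuse its soft labels every epoch) can be sketched as follows. This is a minimal, hypothetical illustration, not the repository's code: `CropRecord`, `generate_records` and `student_batch` are invented names, the crop sampler is simplified to square boxes, and nothing here assumes the on-disk format of the released soft labels.

```python
import random
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class CropRecord:
    # Hypothetical per-crop record: box, flip flag, and the teacher's soft label.
    top: int
    left: int
    height: int
    width: int
    flip: bool
    soft_label: torch.Tensor


def sample_box(img_h, img_w, rng, scale=(0.08, 1.0)):
    """Simplified random-resized-crop sampler (square boxes only)."""
    area = img_h * img_w * rng.uniform(*scale)
    side = max(1, min(img_h, img_w, int(area ** 0.5)))
    return rng.randint(0, img_h - side), rng.randint(0, img_w - side), side, side


def apply_crop(img, rec, out_size):
    """Replay a stored crop on a CHW image: slice, resize, optionally flip."""
    patch = img[:, rec.top:rec.top + rec.height, rec.left:rec.left + rec.width]
    patch = F.interpolate(patch.unsqueeze(0), size=(out_size, out_size),
                          mode="bilinear", align_corners=False).squeeze(0)
    return patch.flip(-1) if rec.flip else patch


@torch.no_grad()
def generate_records(teacher, img, num_crops, out_size, seed=0):
    """Stage 1 (run once): record crop parameters and teacher probabilities."""
    rng = random.Random(seed)
    records = []
    for _ in range(num_crops):
        top, left, h, w = sample_box(img.shape[1], img.shape[2], rng)
        rec = CropRecord(top, left, h, w, rng.random() < 0.5, torch.empty(0))
        logits = teacher(apply_crop(img, rec, out_size).unsqueeze(0))
        rec.soft_label = logits.softmax(dim=-1).squeeze(0)  # softmax done here, once
        records.append(rec)
    return records


def student_batch(img, records, out_size):
    """Stage 2 (every epoch): rebuild the same crops, reuse stored labels; no teacher."""
    crops = torch.stack([apply_crop(img, r, out_size) for r in records])
    targets = torch.stack([r.soft_label for r in records])
    return crops, targets
```

Because stage 2 only replays stored crop parameters, the teacher's cost is paid once per crop rather than once per crop per epoch.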

Supervised Training

Preparation

Install PyTorch and prepare the ImageNet dataset following the official PyTorch ImageNet training code. This repo makes minimal modifications to that code.
Download our soft labels. We provide multiple types of soft labels; we recommend using Marginal Smoothing Top-5 (500-crop).
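Keeping only the top-k teacher probabilities shrinks each stored crop label from one value per class to k values plus k class indices. A minimal sketch, assuming "marginal smoothing" means keeping the top-k probabilities and spreading the remaining mass uniformly over the other classes; `compress_topk` and `marginal_smoothing` are hypothetical helpers and do not read the released files.

```python
import torch


def compress_topk(probs: torch.Tensor, k: int = 5):
    """Keep only the k largest teacher probabilities and their class indices."""
    values, indices = probs.topk(k, dim=-1)
    return values, indices


def marginal_smoothing(values: torch.Tensor, indices: torch.Tensor, num_classes: int):
    """Rebuild a full distribution from top-k entries.

    The top-k probabilities are kept as-is; the leftover mass
    1 - sum(top-k) is split evenly across the remaining num_classes - k classes.
    """
    k = values.shape[-1]
    leftover = (1.0 - values.sum(dim=-1, keepdim=True)).clamp(min=0.0)
    fill = leftover / (num_classes - k)
    dense = fill.expand(*values.shape[:-1], num_classes).clone()
    dense.scatter_(-1, indices, values)  # write the kept top-k values back in place
    return dense
```

The rebuilt vector sums to one, so it can be used directly as a soft target.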

FKD Training on CNNs

To train a model, run train_FKD.py with the desired model architecture, the path to the soft labels, and the ImageNet dataset folder:

python train_FKD.py -a resnet50 --lr 0.1 --num_crops 4 -b 1024 --cos --softlabel_path [soft label path] [imagenet-folder with train and val folders]

For --softlabel_path, use a path of the form ./FKD_soft_label_500_crops_marginal_smoothing_k_5

Multi-processing distributed training is supported; please refer to the official PyTorch ImageNet training code for details.
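With soft labels precomputed, each training step is ordinary supervised learning against soft targets. The sketch below assumes each loaded image contributes `num_crops` crops, stacked as `(B, num_crops, C, H, W)` alongside their soft labels; `flatten_crops` and `train_step` are hypothetical names, and the actual loss and batching in train_FKD.py may differ.

```python
import torch
import torch.nn.functional as F


def soft_cross_entropy(student_logits: torch.Tensor, soft_targets: torch.Tensor) -> torch.Tensor:
    """Mean over crops of -sum_c q_c * log p_c, with p = softmax(student_logits)."""
    log_probs = F.log_softmax(student_logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()


def flatten_crops(images, soft_labels):
    """Merge the crop axis into the batch axis.

    images: (B, num_crops, C, H, W) -> (B * num_crops, C, H, W)
    soft_labels: (B, num_crops, num_classes) -> (B * num_crops, num_classes)
    """
    b, n = images.shape[:2]
    return images.reshape(b * n, *images.shape[2:]), soft_labels.reshape(b * n, -1)


def train_step(model, optimizer, images, soft_labels) -> float:
    """One optimization step on stored soft labels; no teacher forward pass."""
    x, q = flatten_crops(images, soft_labels)
    loss = soft_cross_entropy(model(x), q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

On PyTorch 1.10 or later, `F.cross_entropy(logits, soft_targets)` with floating-point probability targets of the same shape computes the same quantity.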

Evaluation

python train_FKD.py -a resnet50 -e --resume [model path] [imagenet-folder with train and val folders]
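The Top-1 accuracies reported in this README are the percentage of validation images whose highest-scoring class is the ground-truth label. A minimal sketch of the standard top-k metric (the helper name `topk_accuracy` is hypothetical):

```python
import torch


@torch.no_grad()
def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, ks=(1, 5)):
    """Percentage of samples whose true class is among the k highest logits."""
    maxk = max(ks)
    pred = logits.topk(maxk, dim=-1).indices   # (N, maxk) predicted classes, best first
    hits = pred.eq(targets.unsqueeze(-1))      # (N, maxk) True where prediction == label
    return [hits[:, :k].any(dim=-1).float().mean().item() * 100.0 for k in ks]
```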

Trained Models

Model	Top-1 accuracy (%)	Weights	Configuration
`ReLabel ResNet-50`	78.9	--	--
`FKD ResNet-50`	79.8	link	Table 10 in paper
`ReLabel ResNet-101`	80.7	--	--
`FKD ResNet-101`	81.7	link	Table 10 in paper

FKD Training on ViT/DeiT and SReT

To train a ViT model, run train_ViT_FKD.py with the desired model architecture, the path to the soft labels, and the ImageNet dataset folder:

cd train_ViT
python train_ViT_FKD.py -a SReT_LT --lr 0.002 --wd 0.05 --num_crops 4 -b 1024 --cos --softlabel_path [soft label path] [imagenet-folder with train and val folders]

For instructions on the SReT_LT model, please refer to SReT.
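Both training commands pass --cos. Assuming this flag selects a half-cosine decay of the learning rate over the training epochs (a common choice, not confirmed by this README), the schedule can be sketched as below; `cosine_lr` and `set_lr` are hypothetical helpers.

```python
import math


def cosine_lr(base_lr: float, epoch: int, total_epochs: int) -> float:
    """Half-cosine decay: base_lr at epoch 0, falling to 0 at total_epochs."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))


def set_lr(optimizer, lr: float) -> None:
    """Apply a learning rate to every parameter group of a torch.optim optimizer."""
    for group in optimizer.param_groups:
        group["lr"] = lr
```

With --lr 0.002 and a hypothetical 4-epoch run, the per-epoch rates would be 0.002, about 0.00171, 0.001, about 0.00029, and 0.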

Evaluation

python train_ViT_FKD.py -a SReT_LT -e --resume [model path] [imagenet-folder with train and val folders]

Trained Models

Model	FLOPs	#Params	Top-1 accuracy (%)	Weights	Configuration
`DeiT-T-distill`	1.3B	5.7M	74.5	--	--
`FKD ViT/DeiT-T`	1.3B	5.7M	75.2	link	Table 11 in paper
`SReT-LT-distill`	1.2B	5.0M	77.7	--	--
`FKD SReT-LT`	1.2B	5.0M	78.7	link	Table 11 in paper

Fast MEAL V2

Please see MEAL V2 for instructions on running FKD with MEAL V2.

Self-supervised Representation Learning Using FKD

Please see FKD-SSL for instructions on running the FKD code for the self-supervised learning (SSL) task.

Citation

@article{shen2021afast,
      title={A Fast Knowledge Distillation Framework for Visual Recognition}, 
      author={Zhiqiang Shen and Eric Xing},
      year={2021},
      journal={arXiv preprint arXiv:2112.01528}
}

Contact

Zhiqiang Shen (zhiqians at andrew.cmu.edu or zhiqiangshen0214 at gmail.com)
