Adaptive Multi-Teacher Multi-level Knowledge Distillation(AMTML-KD)

Paper has been accepted by Neurocomputing 415(2020): 106–113.

Authors: Yuang Liu, Wei Zhang and Jun Wang.

Links: [ pdf ] [ code ]

Requirements

PyTorch >= 1.0.0
Jupyter
visdom

Introduction

Knowledge distillation (KD) is an effective learning paradigm for improving the performance of light-weight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies either learn from only a single teacher in their distillation learning methods, neglecting the potential that a student can learn from multiple teachers simultaneously, or simply treat each teacher to be equally important, unable to reveal the different importance of teachers for specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights which are leveraged for acquiring integrated soft-targets (high-level knowledge) and (ii) enabling the intermediate-level hints (intermediate-level knowledge) to be gathered from multiple teachers by the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate the proposed learning framework ensures student to achieve improved performance than strong competitors.

Citation

@article{LIU2020106,
    title = {Adaptive multi-teacher multi-level knowledge distillation},
    author = {Yuang Liu and Wei Zhang and Jun Wang},
    journal = {Neurocomputing},
    volume = {415},
    pages = {106 -- 113},
    year = {2020},
    issn = {0925 -- 2312},
}

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

Related tags

Overview

Adaptive Multi-Teacher Multi-level Knowledge Distillation(AMTML-KD)

Requirements

Introduction

Citation

Owner

Frank Liu

Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.

Classification of EEG data using Deep Learning

Effective Use of Transformer Networks for Entity Tracking

How to Predict Stock Prices Easily Demo

Keep CALM and Improve Visual Feature Attribution

The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022

Source code for "Interactive All-Hex Meshing via Cuboid Decomposition [SIGGRAPH Asia 2021]".

(CVPR 2021) PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!

This is the official PyTorch implementation for "Mesa: A Memory-saving Training Framework for Transformers".

code and models for "Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation"

Algo-burn - Script to configure an Algorand address as a "burn" address for one or more ASA tokens

Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

Face Recognition Attendance Project

Code for NeurIPS2021 submission "A Surrogate Objective Framework for Prediction+Programming with Soft Constraints"

The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Acute ischemic stroke dataset

Face-Recognition-based-Attendance-System - An implementation of Attendance System in python.

KDD CUP 2020 Automatic Graph Representation Learning: 1st Place Solution

A PyTorch-based library for fast prototyping and sharing of deep neural network models.