This repository contains the code for the paper in EMNLP 2021: "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".

Last update: Mar 24, 2022

Overview

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression

Requirements

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
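Before launching any training script, it can help to confirm that the build above actually produced an importable package. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and checking `torch` alongside `apex` is an assumption based on apex extensions being built against an installed PyTorch.

```python
import importlib.util


def missing_packages(names=("torch", "apex")):
    """Return the top-level package names that cannot be located.

    Uses importlib.util.find_spec, which looks a package up without
    importing it, so no CUDA initialization is triggered here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("torch and apex were both found.")
```

This only shows that the packages can be located; it does not verify that the `--cpp_ext` and `--cuda_ext` extensions compiled correctly.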

Download checkpoints

Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the 2nd general distillation checkpoint of TinyBERT from HERE, and extract it into ./pretrained_ckpt/.

Prepare dataset

Download the GLUE dataset (containing MNLI) using the script from HERE, and put the files into ./dataset/glue/.
Download the Amazon Reviews dataset from HERE, and extract it into ./dataset/amazon_review/.
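Once the downloads are in place, a quick layout check can catch a misplaced archive before a long training run starts. The sketch below is illustrative only: the directory names are the ones given in this README, while the helper `empty_or_missing` is hypothetical and makes no assumption about which files each directory should contain.

```python
from pathlib import Path

# Directories named in this README, relative to the repository root.
EXPECTED_DIRS = ("pretrained_ckpt", "dataset/glue", "dataset/amazon_review")


def empty_or_missing(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist or have no entries."""
    problems = []
    for rel in expected:
        path = Path(root) / rel
        # any() stops at the first entry, so large directories stay cheap.
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    problems = empty_or_missing()
    for rel in problems:
        print(f"missing or empty: ./{rel}/")
    if not problems:
        print("All expected directories are present and non-empty.")
```

Run it from the repository root. It checks only that each directory exists and is non-empty, not that the contents are complete.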

Train the teacher model (BERT$_{\rm B}$-single) on a single domain

bash train_domain.sh

Distill the student model (BERT$_{\rm S}$) with TinyBERT-KD on a single domain

bash finetune_domain.sh

Train the teacher model (HRKD-teacher) on multiple domains

bash train_multi_domain.sh

Then place the resulting checkpoints in the directories specified at the beginning of finetune_multi_domain.py.

Distill the student model (BERT$_{\rm S}$) with our HRKD on multiple domains

bash finetune_multi_domain.sh
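The four scripts above can also be driven from one place. The following is a rough sketch under stated assumptions: the script names are taken from this README, the order follows its sections, and the `run_steps` helper is hypothetical. Because the checkpoints from train_multi_domain.sh have to be placed by hand before the final step, the helper accepts any subset of steps and defaults to a dry run that executes nothing.

```python
import subprocess

# Script names come from this README; the order follows its sections.
STEPS = (
    "train_domain.sh",
    "finetune_domain.sh",
    "train_multi_domain.sh",
    "finetune_multi_domain.sh",
)


def run_steps(steps=STEPS, dry_run=True):
    """Build one `bash <script>` command per step and optionally run them.

    With dry_run=True (the default) nothing is executed and the commands
    are only returned, so the order can be inspected first.
    """
    commands = [["bash", script] for script in steps]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError on a non-zero exit,
            # which stops the sequence at the first failing script.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in run_steps():
        print(" ".join(command))
```

A possible usage is `run_steps(STEPS[:3], dry_run=False)`, then placing the checkpoints as described above, then `run_steps(STEPS[3:], dry_run=False)`.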

Reference

If you find this code helpful for your research, please cite the following paper.

@inproceedings{dong2021hrkd,
  title     = {{HRKD}: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression},
  author    = {Chenhe Dong and Yaliang Li and Ying Shen and Minghui Qiu},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2021}
}

Owner

Chenhe Dong
