Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time".




Good News

2021/10/29 - Code: The FastPLM code has been released on both PyPI and GitHub.

2021/09/08 - Paper: The journal version of FastBERT (FastPLM), "An Empirical Study on Adaptive Inference for Pretrained Language Model", has been accepted by IEEE TNNLS.

2020/07/05 - Update: The PyPI version of FastBERT has been launched. Please see fastbert-pypi.

Install fastbert with pip

$ pip install fastbert


Python >= 3.4.0 is required. Install all the requirements with pip.

$ pip install -r requirements.txt

Quick start on the Chinese Book review dataset

Download the pre-trained Chinese BERT parameters from here, and save them in the models directory as "Chinese_base_model.bin".

Run the following command to validate FastBERT with Speed=0.5 on the Book review dataset.

$ CUDA_VISIBLE_DEVICES="0" python3 -u \
        --pretrained_model_path ./models/Chinese_base_model.bin \
        --vocab_path ./models/google_zh_vocab.txt \
        --train_path ./datasets/douban_book_review/train.tsv \
        --dev_path ./datasets/douban_book_review/dev.tsv \
        --test_path ./datasets/douban_book_review/test.tsv \
        --epochs_num 3 --batch_size 32 --distill_epochs_num 5 \
        --encoder bert --fast_mode --speed 0.5 \
        --output_model_path  ./models/douban_fastbert.bin

Meaning of each option.

usage: --pretrained_model_path Path to the pre-trained parameters used to initialize the model.
       --vocab_path Path to the vocabulary file.
       --train_path Path to the training dataset.
       --dev_path Path to the validation dataset.
       --test_path Path to the test dataset.
       --epochs_num Number of fine-tuning epochs.
       --batch_size Batch size.
       --distill_epochs_num Number of self-distillation epochs.
       --encoder The type of encoder.
       --fast_mode Whether to enable the fast mode of FastBERT.
       --speed The Speed value in the paper.
       --output_model_path Path where the output model parameters are saved.
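
In the FastBERT paper, Speed acts as a threshold on the uncertainty of each layer's classifier, where uncertainty is the entropy of the predicted distribution normalized by log N (N being the number of classes). A sample exits at the first layer whose uncertainty falls below Speed, so a larger Speed means earlier exits. The following is a minimal sketch of that rule in plain Python; the function names and the per-layer probabilities are hypothetical illustrations and are not taken from this repository's code.

```python
import math


def normalized_entropy(probs):
    """Entropy of a probability distribution divided by log(N), so it lies in [0, 1]."""
    n = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(n)


def exit_layer(layer_probs, speed):
    """Return the 1-based index of the first layer whose uncertainty is below `speed`.

    If no layer is confident enough, the sample runs through the last layer.
    """
    for index, probs in enumerate(layer_probs, start=1):
        if normalized_entropy(probs) < speed:
            return index
    return len(layer_probs)


# Hypothetical classifier outputs of three successive layers for one sample.
layer_probs = [
    [0.55, 0.45],  # very uncertain
    [0.80, 0.20],  # moderately uncertain
    [0.97, 0.03],  # confident
]

for speed in (0.8, 0.5):
    print("speed =", speed, "-> exit at layer", exit_layer(layer_probs, speed))
```

With these made-up numbers, the higher Speed value lets the sample leave one layer earlier than the lower one, which is the accuracy-for-computation trade-off the option controls.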

Test results on the Book review dataset.

Test results at fine-tuning epoch 3 (Baseline): Acc.=0.8688;  FLOPs=21785247744;
Test results at self-distillation epoch 1     : Acc.=0.8698;  FLOPs=6300902177;
Test results at self-distillation epoch 2     : Acc.=0.8691;  FLOPs=5844839008;
Test results at self-distillation epoch 3     : Acc.=0.8664;  FLOPs=5170940850;
Test results at self-distillation epoch 4     : Acc.=0.8664;  FLOPs=5170940327;
Test results at self-distillation epoch 5     : Acc.=0.8664;  FLOPs=5170940327;
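
To read these numbers as a speedup, divide the baseline FLOPs by the FLOPs measured after each self-distillation epoch. The short sketch below does this for the values reported above; the variable and function names are only illustrative.

```python
# FLOPs reported above for the Book review dataset.
baseline_flops = 21785247744
distilled_flops = {
    1: 6300902177,
    2: 5844839008,
    3: 5170940850,
    4: 5170940327,
    5: 5170940327,
}


def speedup(baseline, flops):
    """Ratio of baseline FLOPs to FLOPs in fast mode."""
    return baseline / flops


for epoch, flops in distilled_flops.items():
    print(f"self-distillation epoch {epoch}: {speedup(baseline_flops, flops):.2f}x fewer FLOPs")
```

After the final epoch the model needs roughly a quarter of the baseline FLOPs, while accuracy moves from 0.8688 to 0.8664.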

Quick start on the English dataset

Download the pre-trained English BERT parameters from here, and save them in the models directory as "English_uncased_base_model.bin".

Download the AG News dataset from here, and then unzip it to the datasets directory.

Run the following command to validate FastBERT with Speed=0.5 on the AG News dataset.

$ CUDA_VISIBLE_DEVICES="0" python3 -u \
        --pretrained_model_path ./models/English_uncased_base_model.bin \
        --vocab_path ./models/google_uncased_en_vocab.txt \
        --train_path ./datasets/ag_news/train.tsv \
        --dev_path ./datasets/ag_news/test.tsv \
        --test_path ./datasets/ag_news/test.tsv \
        --epochs_num 3 --batch_size 32 --distill_epochs_num 5 \
        --encoder bert --fast_mode --speed 0.5 \
        --output_model_path  ./models/ag_news_fastbert.bin

Test results on the AG News dataset.

Test results at fine-tuning epoch 3 (Baseline): Acc.=0.9447;  FLOPs=21785247744;
Test results at self-distillation epoch 1     : Acc.=0.9308;  FLOPs=2172009009;
Test results at self-distillation epoch 2     : Acc.=0.9311;  FLOPs=2163471246;
Test results at self-distillation epoch 3     : Acc.=0.9314;  FLOPs=2108341649;
Test results at self-distillation epoch 4     : Acc.=0.9314;  FLOPs=2108341649;
Test results at self-distillation epoch 5     : Acc.=0.9314;  FLOPs=2108341649;
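
When comparing several runs, it can help to turn result lines like the ones above into records. The sketch below parses that line format with a regular expression and reports the accuracy change and FLOPs ratio between the baseline and the last self-distillation epoch. The pattern is an assumption based only on the lines shown here, and the helper name is hypothetical.

```python
import re

# Two of the result lines shown above, copied verbatim.
LOG = """\
Test results at fine-tuning epoch 3 (Baseline): Acc.=0.9447;  FLOPs=21785247744;
Test results at self-distillation epoch 5     : Acc.=0.9314;  FLOPs=2108341649;
"""

# Assumed line layout: stage name, epoch number, then Acc. and FLOPs fields.
PATTERN = re.compile(
    r"Test results at (?P<stage>fine-tuning|self-distillation) epoch (?P<epoch>\d+)"
    r".*?Acc\.=(?P<acc>[\d.]+);\s*FLOPs=(?P<flops>\d+);"
)


def parse_results(text):
    """Return one dict per result line with stage, epoch, accuracy, and FLOPs."""
    rows = []
    for match in PATTERN.finditer(text):
        rows.append({
            "stage": match.group("stage"),
            "epoch": int(match.group("epoch")),
            "acc": float(match.group("acc")),
            "flops": int(match.group("flops")),
        })
    return rows


rows = parse_results(LOG)
baseline, fast = rows[0], rows[-1]
print(f"accuracy change: {fast['acc'] - baseline['acc']:+.4f}")
print(f"FLOPs ratio: {baseline['flops'] / fast['flops']:.2f}x")
```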


More datasets can be downloaded from here.

Other implementations

There are some other excellent implementations of FastBERT.


This work is funded by the 2019 Tencent Rhino-Bird Elite Training Program. Work done while this author was an intern at Tencent.

If you use this code, please cite this paper:

  title={{FastBERT}: a Self-distilling BERT with Adaptive Inference Time},
  author={Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju},
  booktitle={Proceedings of ACL 2020},