The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

Last update: Dec 07, 2022

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper)

@misc{zhang2021compress,
      title={You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient}, 
      author={Shaokun Zhang and Xiawu Zheng and Chenyi Yang and Yuchao Li and Yan Wang and Fei Chao and Mengdi Wang and Shen Li and Jun Yang and Rongrong Ji},
      year={2021},
      eprint={2106.02435},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
      }

Overview

This repository is the official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

📋 We propose a novel approach, YOCO-BERT, to compress once and deploy everywhere. Compared with state-of-the-art algorithms, YOCO-BERT provides more compact models while achieving superior average accuracy on the GLUE benchmark.

Requirements

Python > 3.6
PyTorch = 1.7.0
transformers = 3.5.0
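
As a quick sanity check before training, the pinned versions above can be compared against what is installed. The snippet below is a minimal sketch and not part of the repository: the helper names `installed_version` and `check_requirements` are hypothetical, and it assumes both packages expose a `__version__` attribute (which `torch` and `transformers` do).

```python
import importlib

# Versions pinned in the Requirements list above.
PINNED = {"torch": "1.7.0", "transformers": "3.5.0"}


def installed_version(package):
    """Return the package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    # PyTorch wheels may append a local build tag such as "+cu110"; drop it.
    return version.split("+")[0] if version else None


def check_requirements(pinned=PINNED):
    """Map each pinned package to a tuple (expected, found, matches)."""
    report = {}
    for package, expected in pinned.items():
        found = installed_version(package)
        report[package] = (expected, found, found == expected)
    return report


if __name__ == "__main__":
    for package, (expected, found, ok) in check_requirements().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: expected {expected}, found {found} [{status}]")
```

A package that is not installed is reported with `found` set to `None` rather than raising an error.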

Training

To train the super-BERTs in the paper, run this command:

python train_superbert.py --cfg /path_to_superbert_training_config/config.yaml

Searching

To search for the optimal sub-BERTs under any of the constraints used in the paper, run this command:

python search_subbert.py --cfg /path_to_subbert_searching_config/config.yaml
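
To make the notion of a constraint concrete, the sketch below estimates the parameter count of a BERT-style encoder and checks it against a budget. It is an illustration only, not the search procedure in `search_subbert.py`: the function names are hypothetical, the formula assumes the standard BERT layout (word, position and token-type embeddings, a pooler, and all attention heads kept at full width), and how the repository itself counts parameters may differ.

```python
def count_bert_params(num_layers=12, hidden=768, intermediate=3072,
                      vocab_size=30522, max_positions=512, type_vocab=2):
    """Estimate the number of parameters of a standard BERT encoder."""
    layer_norm = 2 * hidden  # scale and bias
    # Word, position and token-type embedding tables plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (weights and biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two feed-forward projections (weights and biases) plus LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + num_layers * (attention + feed_forward) + pooler


def fits_budget(budget=66_000_000, **config):
    """Return True if the configuration stays within the parameter budget."""
    return count_bert_params(**config) <= budget


if __name__ == "__main__":
    print(count_bert_params())                                # BERT-base sized encoder
    print(fits_budget())                                      # full model vs. a 66M budget
    print(fits_budget(num_layers=6, intermediate=2048))       # a smaller candidate
```

With the default arguments the estimate reproduces the size of a BERT-base encoder, which is well above a 66M budget; reducing the depth and the feed-forward width brings a candidate under it.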

Evaluation

The evaluation results will be reported after the searching process.

Config

We release all the training and searching configs in config.

Results

Our model achieves the following performance on:

GLUE

Results given various FLOPs and parameters.

Results under common constraints (compressed to no more than 66M parameters)

Datasets	SST-2	MRPC	CoLA	RTE	MNLI	QQP	QNLI
Results	92.8	90.3	59.8	72.9	82.6	90.5	87.2

📋 The detailed metrics used in this code are reported in the paper.

Licence

This repository is released under the MIT license. See LICENSE for more information.

Contact

If you have any problems with this code, feel free to contact the first author: [email protected]
