This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Last update: Dec 24, 2022

MoEBERT

Installation

Create and activate the conda environment.

conda env create -f environment.yml

Install Transformers locally.

pip install -e .

Note: The code is adapted from this codebase. Arguments related to LoRA and adapters can be safely ignored.

Instructions

MoEBERT targets task-specific distillation. Before running any distillation code, fine-tune a pre-trained BERT model on the target task, then pass the path to the fine-tuned model to --model_name_or_path.

Importance Score Computation

To compute the importance scores, use bert_base_mnli_example.sh: add the --preprocess_importance argument and remove the --do_train argument.
If multiple GPUs are used to compute the importance scores, an importance_[rank].pkl file is saved for each GPU. Use merge_importance.py to merge these files.
To use the pre-computed importance scores, pass the file name to --moebert_load_importance.
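
The merge step above can be pictured with a short sketch. This is not the code in merge_importance.py. It assumes, purely for illustration, that each importance_[rank].pkl file holds a list of per-layer lists of floats and that merging means element-wise summation; the function name merge_importance_files is hypothetical. Check the actual script for the real file layout and merge rule.

```python
import pickle


def merge_importance_files(paths):
    """Element-wise sum of per-GPU importance scores.

    ASSUMPTION: each pickle stores a list of layers, and each layer is a
    list of floats. The real file format may differ.
    """
    merged = None
    for path in paths:
        with open(path, "rb") as f:
            scores = pickle.load(f)
        if merged is None:
            # Copy the first file so later additions do not alias it.
            merged = [list(layer) for layer in scores]
            continue
        if len(scores) != len(merged):
            raise ValueError(f"{path}: layer count does not match")
        for layer_idx, layer in enumerate(scores):
            if len(layer) != len(merged[layer_idx]):
                raise ValueError(f"{path}: layer {layer_idx} size does not match")
            for i, value in enumerate(layer):
                merged[layer_idx][i] += value
    if merged is None:
        raise ValueError("no importance files given")
    return merged
```

For example, calling merge_importance_files(["importance_0.pkl", "importance_1.pkl"]) would return one list of per-layer scores under these assumptions.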

Knowledge Distillation

For GLUE tasks, see examples/text-classification/run_glue.py.
For question answering tasks, see examples/question-answering/run_qa.py.
Run bash bert_base_mnli_example.sh as an example.
The codebase supports four routing strategies: gate-token, gate-sentence, hash-random and hash-balance. Pass the chosen strategy to --moebert_route_method.
- To use hash-balance, pre-compute a balanced hash list using hash_balance.py, then pass the path to the saved hash list to --moebert_route_hash_list.
- When using a trainable gating mechanism, add a load balancing loss by setting --moebert_load_balance.
- The sentence-based gating mechanism (gate-sentence) is advantageous for inference because it incurs significantly less communication overhead than token-level routing methods.

Owner

Simiao Zuo
