Explore extreme compression for pre-trained language models

Overview

Explore extreme compression for pre-trained language models

Code for paper "Exploring extreme parameter compression for pre-trained language models ICLR2022"

Before Training

install some libraries

 pip install tensorly==0.5.0

Torch is needed, torch 1.0-1.4 is preferred

Install horovod for distributed learning

Configuration Install horovod on GPU

pip install horovod[pytorch]

loading pre-trained models

wget https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin -P  models/bert-base-uncased
wget https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt -P  models/bert-base-uncased
cp models/bert-base-uncased/pytorch_model.bin models/bert-td-72-384/pytorch_model.bin 
cp models/bert-base-uncased/vocab.txt models/bert-td-72-384/vocab.txt

generate training data for given corpora (e.g., saved in the path "corpora" )

python pregenerate_training_data.py --train_corpus ${CORPUS_RAW} \ 
                  --bert_model ${BERT_BASE_DIR}$ \
                  --reduce_memory --do_lower_case \
                  --epochs_to_generate 3 \
                  --output_dir ${CORPUS_JSON_DIR}$ 

task data augmentation

python data_augmentation.py --pretrained_bert_model ${BERT_BASE_DIR}$ \
                            --glove_embs ${GLOVE_EMB}$ \
                            --glue_dir ${GLUE_DIR}$ \  
                            --task_name ${TASK_NAME}$

Decomposing BERT

decomposition and general distillation

Run with horovod

mpirun -np 8 -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -mca pml ob1 -mca btl ^openib python3 general_distill.py --teacher_model models/bert-base-uncased --student_model models/bert-gd-72-384 --pregenerated_data data/pregenerated_data --num_train_epochs 2.0 --train_batch_size 32 --output_dir output/bert-gd-72-384 -use_swap --do_lower_case

To restrict sharing among SAN or FFN, add "ops" and set "ops" to be "san" or "ffn" in bert-gd-72-384/config.json

ops = "san"

Evaluation

Task distillation with data augmentation in fine-tuning phase

Rename a pretrained model as "", for instance, change step_0_pytorch_model.bin to pytorch_model.bin, and change load_compressed_model from false to true in output/config.json

Task distillation for distributed training

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 task_distill.py --teacher_model models/bert-base-uncasedi/STS-B --student_model models/bert-gd-72-384 --task_name STS-B --aug_train --data_dir data/glue_data/SST-2 --max_seq_length 128 --train_batch_size 32 --aug_train --learning_rate 2e-5 --num_train_epochs 3.0 --output_dir ./output/36-256-STS-B

Task distillation for single gpu

python3  task_distill.py  --teacher_model models/bert-base-uncased   --student_model  models/bert-td-72-384  --output output_demo  --data_dir  data/glue_data/SST-2   --task_name  SST-2  --do_lower_case --aug_train   

For augmentation, you should add --aug_train

Get test result for model

python run_glue.py --model_name_or_path  models/bert-td-72-384/SST-2 --task_name SST-2 --do_eval --do_predict --data_dir data/glue_data/STS-B --max_seq_length 128 --save_steps 500 --save_total_limit 2 --output_dir ./output/SST-2
Owner
twinkle
Stay hungry, stay foolish.
twinkle
Machine Learning Model deployment for Container (TensorFlow Serving)

try_tf_serving ├───dataset │ ├───testing │ │ ├───paper │ │ ├───rock │ │ └───scissors │ └───training │ ├───paper │ ├───rock

Azhar Rizki Zulma 5 Jan 07, 2022
This is the repository for Learning to Generate Piano Music With Sustain Pedals

SusPedal-Gen This is the official repository of Learning to Generate Piano Music With Sustain Pedals Demo Page Dataset The dataset used in this projec

Joann Ching 12 Sep 02, 2022
PAWS 🐾 Predicting View-Assignments with Support Samples

This repo provides a PyTorch implementation of PAWS (predicting view assignments with support samples), as described in the paper Semi-Supervised Learning of Visual Features by Non-Parametrically Pre

Facebook Research 437 Dec 23, 2022
Source code for "OmniPhotos: Casual 360° VR Photography"

OmniPhotos: Casual 360° VR Photography Project Page | Video | Paper | Demo | Data This repository contains the source code for creating and viewing Om

Christian Richardt 144 Dec 30, 2022
Author's PyTorch implementation of Randomized Ensembled Double Q-Learning (REDQ) algorithm.

REDQ source code Author's PyTorch implementation of Randomized Ensembled Double Q-Learning (REDQ) algorithm. Paper link: https://arxiv.org/abs/2101.05

109 Dec 16, 2022
[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

[ICLR 2021] RAPID: A Simple Approach for Exploration in Reinforcement Learning This is the Tensorflow implementation of ICLR 2021 paper Rank the Episo

Daochen Zha 48 Nov 21, 2022
Meandering In Networks of Entities to Reach Verisimilar Answers

MINERVA Meandering In Networks of Entities to Reach Verisimilar Answers Code and models for the paper Go for a Walk and Arrive at the Answer - Reasoni

Shehzaad Dhuliawala 271 Dec 13, 2022
AI-generated-characters for Learning and Wellbeing

AI-generated-characters for Learning and Wellbeing Click here for the full project page. This repository contains the source code for the paper AI-gen

MIT Media Lab 214 Jan 01, 2023
The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

THUHCSI 138 Oct 28, 2022
StyleGAN2-ADA - Official PyTorch implementation

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmenta

NVIDIA Research Projects 3.2k Dec 30, 2022
Real-time Neural Representation Fusion for Robust Volumetric Mapping

NeuralBlox: Real-Time Neural Representation Fusion for Robust Volumetric Mapping Paper | Supplementary This repository contains the implementation of

ETHZ ASL 106 Dec 24, 2022
EPSANet:An Efficient Pyramid Split Attention Block on Convolutional Neural Network

EPSANet:An Efficient Pyramid Split Attention Block on Convolutional Neural Network This repo contains the official Pytorch implementaion code and conf

Hu Zhang 175 Jan 07, 2023
CLIPImageClassifier wraps clip image model from transformers

CLIPImageClassifier CLIPImageClassifier wraps clip image model from transformers. CLIPImageClassifier is initialized with the argument classes, these

Jina AI 6 Sep 12, 2022
DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction (3DV 2021)

DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction (3DV 2021) This repo is the implementation of DPC. Tested environment Pyth

Dvir Ginzburg 30 Nov 30, 2022
PyTorch implementation of MuseMorphose, a Transformer-based model for music style transfer.

MuseMorphose This repository contains the official implementation of the following paper: Shih-Lun Wu, Yi-Hsuan Yang MuseMorphose: Full-Song and Fine-

Yating Music, Taiwan AI Labs 142 Jan 08, 2023
Nicholas Lee 3 Jan 09, 2022
Anomaly Detection Based on Hierarchical Clustering of Mobile Robot Data

We proposed a new approach to detect anomalies of mobile robot data. We investigate each data seperately with two clustering method hierarchical and k-means. There are two sub-method that we used for

Zekeriyya Demirci 1 Jan 09, 2022
Pytorch Implementation for Dilated Continuous Random Field

DilatedCRF Pytorch implementation for fully-learnable DilatedCRF. If you find my work helpful, please consider our paper: @article{Mo2022dilatedcrf,

DunnoCoding_Plus 3 Nov 13, 2022
DeepLab is a state-of-art deep learning system for semantic image segmentation built on top of Caffe.

DeepLab Introduction DeepLab is a state-of-art deep learning system for semantic image segmentation built on top of Caffe. It combines densely-compute

Ali 234 Nov 14, 2022
code for "Self-supervised edge features for improved Graph Neural Network training",

Self-supervised edge features for improved Graph Neural Network training Data availability: Here is a link to the raw data for the organoids dataset.

Neal Ravindra 23 Dec 02, 2022