DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Last update: Dec 27, 2021

DSEE

Code for [Preprint] DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallah

Overview

TBD

Requirements

We use conda to create virtual environments.

conda env create -f environment.yml
conda activate dsee

Command

Unstructured DSEE

Step 0. Install the package under non-GPT-2 in editable mode

cd non-GPT-2
pip install -e .
cd ..

Step 1. Pre-training

Take SST-2 as an example:

OUTPUT_DIR='./sst2_rank16_s1_64'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12345 \
    non-GPT-2/examples/pytorch/text-classification/run_glue.py \
    --save_total_limit 10 \
    --model_name_or_path bert-base-uncased \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --evaluation_strategy steps
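
For intuition about what `--lora_r 16` and `--num_sparse 64` control, here is a minimal sketch. It is not the repository's implementation. It assumes each adapted weight matrix W receives a trainable update made of a low-rank product U V (rank `lora_r`) plus a sparse residual with `num_sparse` trainable entries. The function names (`init_update`, `effective_weight`, `num_trainable`) are hypothetical, the sparse positions are drawn at random purely for illustration, and plain Python lists are used to keep the example dependency-free.

```python
import random


def init_update(d_out, d_in, rank, num_sparse, seed=0):
    """Create a low-rank plus sparse update (illustrative assumption only)."""
    rng = random.Random(seed)
    # U starts at zero, so the update U @ V is zero before training.
    U = [[0.0] * rank for _ in range(d_out)]
    V = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    # Sparse residual: only `num_sparse` (row, col) positions are trainable.
    # Positions are random here; the real selection rule may differ.
    positions = rng.sample(range(d_out * d_in), num_sparse)
    sparse = {divmod(p, d_in): 0.0 for p in positions}
    return U, V, sparse


def effective_weight(W, U, V, sparse):
    """Return W + U @ V + sparse residual as a new nested list."""
    rank = len(V)
    out = [
        [
            W[i][j] + sum(U[i][k] * V[k][j] for k in range(rank))
            for j in range(len(W[0]))
        ]
        for i in range(len(W))
    ]
    for (i, j), value in sparse.items():
        out[i][j] += value
    return out


def num_trainable(U, V, sparse):
    """Count trainable values: both low-rank factors plus sparse entries."""
    return len(U) * len(U[0]) + len(V) * len(V[0]) + len(sparse)


if __name__ == "__main__":
    U, V, sparse = init_update(d_out=768, d_in=768, rank=16, num_sparse=64)
    print(num_trainable(U, V, sparse), "trainable values vs", 768 * 768)
```

Under these assumptions, a 768 x 768 matrix (the hidden size of bert-base-uncased) gets 2 * 768 * 16 + 64 = 24,640 trainable values, compared with 589,824 in the full matrix.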

Step 2. Pruning & Fine-tuning

OUTPUT_DIR='./sst2_rank16_s1_64_prune_0.5'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12335 \
    non-GPT-2/examples/pytorch/text-classification/run_glue_prune_tune.py \
    --save_total_limit 10 \
    --model_name_or_path sst2_rank16_s1_64 \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --pruning_ratio 0.5 \
    --evaluation_strategy steps
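
As a rough illustration of what `--pruning_ratio 0.5` could mean, the sketch below zeroes the half of a weight matrix with the smallest magnitudes. This is an assumption about the criterion, not a description of `run_glue_prune_tune.py`, which may prune per layer, globally, or by a different score. The helper names (`magnitude_prune_mask`, `apply_mask`) are hypothetical.

```python
def magnitude_prune_mask(weights, pruning_ratio):
    """Return a 0/1 mask that removes the smallest-magnitude entries.

    Assumption: `pruning_ratio` is the fraction of entries set to zero.
    """
    n_cols = len(weights[0])
    magnitudes = [abs(w) for row in weights for w in row]
    num_prune = int(len(magnitudes) * pruning_ratio)
    # Flat indices sorted from smallest to largest magnitude.
    order = sorted(range(len(magnitudes)), key=magnitudes.__getitem__)
    pruned = set(order[:num_prune])
    return [
        [0 if i * n_cols + j in pruned else 1 for j in range(n_cols)]
        for i in range(len(weights))
    ]


def apply_mask(weights, mask):
    """Multiply weights by the mask elementwise."""
    return [
        [w * m for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]


if __name__ == "__main__":
    W = [[0.9, -0.1], [0.05, -0.7]]
    mask = magnitude_prune_mask(W, pruning_ratio=0.5)
    print(mask)  # [[1, 0], [0, 1]]
```

In a prune-then-tune setup like Step 2, such a mask would typically stay fixed while the remaining trainable parameters are fine-tuned, so the pruned entries remain zero.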

TODO

Code for Unstructured DSEE on GPT-2
Code for Structured DSEE

Acknowledgement

Hugging Face's Transformers (https://github.com/huggingface/transformers)

Owner

VITA
