DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Last update: Dec 27, 2021

DSEE

Code for the preprint "DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models"

Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallah

Overview

TBD

Requirements

We use conda to create virtual environments.

conda env create -f environment.yml
conda activate dsee
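
Before running the later steps, it may help to confirm that the environment is actually active. The following is a small sketch, not part of the repository: the helper name `require_env` is hypothetical, and it relies on `CONDA_DEFAULT_ENV`, which `conda activate` sets to the name of the active environment. The environment name `dsee` is taken from the command above.

```shell
# Hypothetical helper (not from the DSEE repository): succeed only if the
# named conda environment is the active one.
require_env() {
    expected="$1"
    # `conda activate` exports CONDA_DEFAULT_ENV with the active environment's name.
    if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
        echo "conda environment '$expected' is active"
    else
        echo "expected conda environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

Calling `require_env dsee` at the top of a launch script stops early with a readable message instead of failing later on a missing package.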

Command

Unstructured DSEE

Step 0. Install the package under non-GPT-2 in editable mode

cd non-GPT-2
pip install -e .
cd ..

Step 1. Pre-training

Take SST-2 as an example:

OUTPUT_DIR='./sst2_rank16_s1_64'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12345 \
    non-GPT-2/examples/pytorch/text-classification/run_glue.py \
    --save_total_limit 10 \
    --model_name_or_path bert-base-uncased \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --evaluation_strategy steps
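
The output directory of Step 1 is reused as `--model_name_or_path` in Step 2, so the two names must agree. One way to keep them in sync is to build the name from the hyperparameters. This is a sketch under an assumption: the name `sst2_rank16_s1_64` is read here as task, LoRA rank, seed, and `--num_sparse`, which the source does not state explicitly (the `s1` part in particular could mean something else). `dsee_run_name` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: build a run directory name from hyperparameters.
# Assumed pattern (inferred from './sst2_rank16_s1_64'):
#   <task>_rank<lora_r>_s<seed>_<num_sparse>
dsee_run_name() {
    task="$1"; lora_r="$2"; seed="$3"; num_sparse="$4"
    printf '%s_rank%s_s%s_%s\n' "$task" "$lora_r" "$seed" "$num_sparse"
}

# Reproduces the directory used in Step 1.
OUTPUT_DIR="./$(dsee_run_name sst2 16 1 64)"
echo "$OUTPUT_DIR"   # ./sst2_rank16_s1_64
```

Passing the same four values to the launch command and to the helper avoids a mismatch between the flags and the directory name.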

Step 2. Pruning & Fine-tuning

OUTPUT_DIR='./sst2_rank16_s1_64_prune_0.5'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12335 \
    non-GPT-2/examples/pytorch/text-classification/run_glue_prune_tune.py \
    --save_total_limit 10 \
    --model_name_or_path sst2_rank16_s1_64 \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --pruning_ratio 0.5 \
    --evaluation_strategy steps
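
Step 2 loads the Step 1 result through `--model_name_or_path`, so it cannot start if that directory is missing or incomplete. The sketch below checks for the directory first and derives the Step 2 output name from the pruning ratio, following the `_prune_0.5` suffix used above. The helper names are hypothetical, and the check for `config.json` assumes the Step 1 script saves the model in the usual Transformers `save_pretrained` layout, which the source does not confirm.

```shell
# Hypothetical helper: succeed only if DIR looks like a saved model directory.
# Assumption: Step 1 writes a config.json there, as save_pretrained() does.
checkpoint_ready() {
    [ -d "$1" ] && [ -f "$1/config.json" ]
}

# Hypothetical helper: name the Step 2 output after the Step 1 directory and
# the pruning ratio, e.g. ./sst2_rank16_s1_64 + 0.5 -> ./sst2_rank16_s1_64_prune_0.5
prune_output_dir() {
    printf '%s_prune_%s\n' "${1%/}" "$2"
}

STEP1_DIR='./sst2_rank16_s1_64'
PRUNING_RATIO=0.5
if checkpoint_ready "$STEP1_DIR"; then
    OUTPUT_DIR="$(prune_output_dir "$STEP1_DIR" "$PRUNING_RATIO")"
    echo "Step 2 will write to $OUTPUT_DIR"
else
    echo "Run Step 1 first: $STEP1_DIR has no config.json" >&2
fi
```

The same two helpers can be reused when trying other values of `--pruning_ratio`, so each ratio gets its own output directory.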

TODO

Code for Unstructured DSEE on GPT-2
Code for Structured DSEE

Acknowledgement

Hugging Face's Transformers (https://github.com/huggingface/transformers)


Owner

VITA
