PyTorch implementation of the ACL 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Last update: Dec 28, 2022

Overview

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

This repo contains the PyTorch implementation of the ACL 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Installation

python setup.py install

How to run the models

We provide example scripts for each model in the hyperformer/scripts/ folder, with the corresponding config files in hyperformer/configs. To run a model, first cd hyperformer, then use one of the following commands.

To run the hyperformer++ model, which generates the task-specific adapters using a single hypernetwork shared across both the tasks and the layers of the transformer:
```
bash scripts/hyperformer++.sh
```
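As a rough illustration of the idea behind hyperformer++, the sketch below uses one small hypernetwork to generate bottleneck-adapter weights for any (task, layer) pair. It is a minimal toy under stated assumptions, not the code in this repo: the class name `SharedAdapterHypernet`, the dimensions, and the way the task and layer embeddings are combined are all hypothetical.

```python
import torch
import torch.nn as nn


class SharedAdapterHypernet(nn.Module):
    """Toy hypernetwork (hypothetical): one set of generator weights
    produces the adapter for every (task, layer) pair."""

    def __init__(self, num_tasks, num_layers, hidden_dim=16,
                 bottleneck_dim=4, embed_dim=8):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.bottleneck_dim = bottleneck_dim
        # Learned embeddings identifying the task and the transformer layer.
        self.task_embed = nn.Embedding(num_tasks, embed_dim)
        self.layer_embed = nn.Embedding(num_layers, embed_dim)
        # Shared generators for the adapter's down- and up-projections.
        self.down_gen = nn.Linear(2 * embed_dim, hidden_dim * bottleneck_dim)
        self.up_gen = nn.Linear(2 * embed_dim, bottleneck_dim * hidden_dim)

    def forward(self, hidden, task_id, layer_id):
        # Condition on both task and layer, so the generators are shared
        # across tasks and across layers.
        cond = torch.cat(
            [self.task_embed(task_id), self.layer_embed(layer_id)], dim=-1
        )
        down = self.down_gen(cond).view(self.hidden_dim, self.bottleneck_dim)
        up = self.up_gen(cond).view(self.bottleneck_dim, self.hidden_dim)
        # Bottleneck adapter with a residual connection.
        return hidden + torch.relu(hidden @ down) @ up


hypernet = SharedAdapterHypernet(num_tasks=3, num_layers=2)
hidden = torch.randn(2, 5, 16)  # (batch, sequence, hidden_dim)
out = hypernet(hidden, torch.tensor(1), torch.tensor(0))
print(out.shape)  # torch.Size([2, 5, 16])
```

Adding a new task or layer in this toy only adds one embedding row; the generator weights stay the same.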
To run the hyperformer model, which generates the task-specific adapters using hypernetworks that are shared across the tasks but specific to each layer of the transformer (this makes it less parameter-efficient than hyperformer++):
```
bash scripts/hyperformer.sh
```
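To see why a per-layer hypernetwork costs more than one shared across layers, here is a back-of-the-envelope parameter count. It assumes each hypernetwork consists of two linear generators (one for the adapter's down-projection, one for the up-projection) and ignores embeddings and anything else the real models generate; all dimensions are illustrative assumptions, not values taken from the repo's configs.

```python
def generator_params(cond_dim, hidden_dim, bottleneck_dim):
    """Weights and biases of two linear generators (down and up projection)."""
    out_dim = hidden_dim * bottleneck_dim
    return 2 * (cond_dim * out_dim + out_dim)


def shared_across_layers(cond_dim, hidden_dim, bottleneck_dim):
    # hyperformer++-style: one hypernetwork serves every layer.
    return generator_params(cond_dim, hidden_dim, bottleneck_dim)


def per_layer(cond_dim, hidden_dim, bottleneck_dim, num_layers):
    # hyperformer-style: a separate hypernetwork for each layer.
    return num_layers * generator_params(cond_dim, hidden_dim, bottleneck_dim)


# Illustrative (assumed) sizes.
dims = dict(cond_dim=64, hidden_dim=768, bottleneck_dim=24)
shared = shared_across_layers(**dims)
layered = per_layer(num_layers=12, **dims)
print(layered // shared)  # 12
```

Under these assumptions the per-layer variant grows linearly with the number of layers, while the shared variant does not.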
To run the adapter† model, which shares the layer normalization between adapters across the tasks and trains the adapters in a multi-task setting:
```
bash scripts/adapters_dagger.sh
```
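The sharing scheme described for adapter† can be pictured with the following toy module, in which each task has its own bottleneck projections but all tasks reuse one layer normalization. This is only a sketch: the class name `SharedNormAdapters`, the dimensions, and the placement of the layer norm before the down-projection are assumptions, and the repo's implementation may differ.

```python
import torch
import torch.nn as nn


class SharedNormAdapters(nn.Module):
    """Toy per-task adapters (hypothetical) that share a single LayerNorm."""

    def __init__(self, task_names, hidden_dim=16, bottleneck_dim=4):
        super().__init__()
        # One layer normalization reused by every task's adapter.
        self.shared_norm = nn.LayerNorm(hidden_dim)
        # Task-specific down- and up-projections.
        self.down = nn.ModuleDict(
            {t: nn.Linear(hidden_dim, bottleneck_dim) for t in task_names}
        )
        self.up = nn.ModuleDict(
            {t: nn.Linear(bottleneck_dim, hidden_dim) for t in task_names}
        )

    def forward(self, hidden, task):
        # Assumed placement: normalize first, then apply the task's bottleneck.
        normed = self.shared_norm(hidden)
        return hidden + self.up[task](torch.relu(self.down[task](normed)))


adapters = SharedNormAdapters(["rte", "mnli"])
hidden = torch.randn(2, 5, 16)  # (batch, sequence, hidden_dim)
print(adapters(hidden, "rte").shape)  # torch.Size([2, 5, 16])
```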
To run the adapter model, which trains a single adapter per task in a single-task learning setup:
```
bash scripts/adapters.sh
```
To run the T5 fine-tuning model in a multi-task learning setup:
```
bash scripts/finetune.sh
```
To run the T5 fine-tuning model in a single-task learning setup:
```
bash scripts/finetune_single_task.sh
```

We ran all the models on 4 GPUs, but this is not required: the models can also be run on 1 GPU. To run on a single GPU, remove the -m torch.distributed.launch --nproc_per_node=4 part from all the scripts.
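The single-GPU edit can also be expressed programmatically. The helper below is a small sketch that strips the launcher arguments from a command string; the function name `strip_distributed_launch` and the example command (including `train.py` and `config.json`) are hypothetical stand-ins, not the actual contents of the repo's scripts.

```python
import shlex


def strip_distributed_launch(command: str) -> str:
    """Remove `-m torch.distributed.launch` and `--nproc_per_node...` tokens."""
    tokens = shlex.split(command)
    kept = []
    i = 0
    while i < len(tokens):
        is_launcher = (
            tokens[i] == "-m"
            and i + 1 < len(tokens)
            and tokens[i + 1] == "torch.distributed.launch"
        )
        if is_launcher:
            i += 2  # skip both "-m" and the launcher module name
            continue
        if tokens[i].startswith("--nproc_per_node"):
            i += 1  # skip the process-count flag
            continue
        kept.append(tokens[i])
        i += 1
    return shlex.join(kept)


# Hypothetical command line; the real scripts may use different file names.
multi_gpu = "python -m torch.distributed.launch --nproc_per_node=4 train.py config.json"
print(strip_distributed_launch(multi_gpu))  # python train.py config.json
```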

Bibliography

If you find this repo useful, please cite our paper.

@inproceedings{karimi2021parameterefficient,
  title={Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks},
  author={Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2021}
}

Final words

I hope this repo is useful for your research. For any questions, please create an issue or email [email protected], and I will get back to you as soon as possible.

Owner

Rabeeh Karimi Mahabadi
