PyTorch implementation of the ACL 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Last update: Dec 28, 2022

Overview

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

This repo contains the PyTorch implementation of the ACL 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Installation

python setup.py install

How to run the models

We provide an example script for each model in the hyperformer/scripts/ folder, with the matching config files in hyperformer/configs/. To run a model, first cd into hyperformer, then use one of the commands below.

To run the hyperformer++ model, which generates the task-specific adapters with a single hypernetwork shared across all tasks and all layers of the transformer:
```
bash scripts/hyperformer++.sh
```
To run the hyperformer model, which generates the task-specific adapters with a hypernetwork that is shared across tasks but specific to each layer of the transformer (this makes it less efficient than hyperformer++):
```
bash scripts/hyperformer.sh
```
To run the adapter† model, which shares the layer normalization between adapters across tasks and trains the adapters in a multi-task setting:
```
bash scripts/adapters_dagger.sh   
```
To run the adapter model, which trains one adapter per task, each in a single-task learning setup:
```
bash scripts/adapters.sh 
```
To run the T5 fine-tuning model in a multi-task learning setup:
```
bash scripts/finetune.sh
```
To run the T5 fine-tuning model in a single-task learning setup:
```
bash scripts/finetune_single_task.sh
```
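
The main structural difference between hyperformer++ and hyperformer is how many hypernetworks are kept: one shared across layers, or one per layer. The following is a rough sketch of what that means for the weight count, not code from this repo. It assumes that a hypernetwork is a single linear map from an embedding to the weights of one bottleneck adapter, that biases are ignored, and that each layer has one adapter. The function names and all dimensions are hypothetical.

```python
# Illustrative only: names and dimensions are assumptions, not taken from the repo.


def adapter_weight_count(hidden_dim: int, bottleneck_dim: int) -> int:
    """Weights in one bottleneck adapter: down-projection plus up-projection."""
    return 2 * hidden_dim * bottleneck_dim


def hypernetwork_weight_count(
    embedding_dim: int, hidden_dim: int, bottleneck_dim: int
) -> int:
    """Weights in a linear hypernetwork that maps an embedding to one adapter."""
    return embedding_dim * adapter_weight_count(hidden_dim, bottleneck_dim)


def total_hypernetwork_weights(
    num_layers: int,
    embedding_dim: int,
    hidden_dim: int,
    bottleneck_dim: int,
    shared_across_layers: bool,
) -> int:
    """Total hypernetwork weights for the shared or the per-layer variant."""
    per_network = hypernetwork_weight_count(embedding_dim, hidden_dim, bottleneck_dim)
    # Shared variant: one hypernetwork serves every layer.
    # Per-layer variant: each layer keeps its own hypernetwork.
    num_networks = 1 if shared_across_layers else num_layers
    return num_networks * per_network


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the comparison concrete.
    sizes = dict(num_layers=12, embedding_dim=64, hidden_dim=768, bottleneck_dim=24)
    shared = total_hypernetwork_weights(shared_across_layers=True, **sizes)
    per_layer = total_hypernetwork_weights(shared_across_layers=False, **sizes)
    print(f"shared across layers: {shared} weights")
    print(f"one per layer:        {per_layer} weights")
```

Under these assumptions, neither total depends on the number of tasks, because the hypernetwork is shared across tasks in both variants. The per-layer variant grows linearly with the number of layers.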

We run all the models on 4 GPUs, but this is not required: they can also run on a single GPU. To run on one GPU, remove the -m torch.distributed.launch --nproc_per_node=4 part from every script.
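
As a minimal sketch of that edit, the helper below removes the launcher fragment from a command string. The function name and the example command (including train.py and config.json) are hypothetical; the actual command lines in the scripts may look different, so check the result before running it.

```python
# The fragment is quoted from the note above; everything else is illustrative.
LAUNCH_FRAGMENT = "-m torch.distributed.launch --nproc_per_node=4"


def to_single_gpu(command: str) -> str:
    """Drop the distributed-launch fragment and collapse leftover whitespace.

    Note: collapsing whitespace also changes spacing inside quoted arguments,
    so this is only suitable for simple command lines.
    """
    return " ".join(command.replace(LAUNCH_FRAGMENT, "").split())


if __name__ == "__main__":
    # Hypothetical multi-GPU command, not copied from the repo's scripts.
    multi_gpu = (
        "python -m torch.distributed.launch --nproc_per_node=4 train.py config.json"
    )
    print(to_single_gpu(multi_gpu))
```

A command that does not contain the fragment is returned unchanged apart from whitespace normalization, so the helper is safe to apply to every script line.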

Bibliography

If you find this repo useful, please cite our paper.

@inproceedings{karimi2021parameterefficient,
  title={Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks},
  author={Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2021}
}

Final words

I hope this repo is useful for your research. For any questions, please create an issue or email [email protected], and I will get back to you as soon as possible.

Owner

Rabeeh Karimi Mahabadi
