Huggingface Transformers + Adapters = ❤️

Overview

adapter-transformers

A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models


adapter-transformers is an extension of HuggingFace's Transformers library, integrating adapters into state-of-the-art language models by incorporating AdapterHub, a central repository for pre-trained adapter modules.

💡 Important: This library can be used as a drop-in replacement for HuggingFace Transformers and regularly synchronizes new upstream changes. Thus, most files in this repository are direct copies from the HuggingFace Transformers source, modified only with changes required for the adapter implementations.

Installation

adapter-transformers currently supports Python 3.6+ and PyTorch 1.3.1+. After installing PyTorch, you can install adapter-transformers from PyPI ...

pip install -U adapter-transformers

... or from source by cloning the repository:

git clone https://github.com/adapter-hub/adapter-transformers.git
cd adapter-transformers
pip install .

Getting Started

HuggingFace's great documentation on getting started with Transformers can be found here. adapter-transformers is fully compatible with Transformers.

To get started with adapters, refer to these locations:

  • Colab notebook tutorials, a series of notebooks providing an introduction to all the main concepts of (adapter-)transformers and AdapterHub
  • https://docs.adapterhub.ml, our documentation on training and using adapters with adapter-transformers
  • https://adapterhub.ml to explore available pre-trained adapter modules and share your own adapters
  • Examples folder of this repository containing HuggingFace's example training scripts, many adapted for training adapters
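
Beyond these resources, here is a minimal, hedged sketch of the basic adapter workflow (model and adapter names are illustrative; see the documentation above for the authoritative API):

from transformers import AutoAdapterModel

# Load a pre-trained model with flexible-head support
model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Add a new bottleneck adapter and a matching classification head
model.add_adapter("my_task")
model.add_classification_head("my_task", num_labels=2)

# Freeze the base model and activate the adapter for training
model.train_adapter("my_task")

# After training, the adapter can be saved (and shared) on its own
model.save_adapter("./my_task_adapter", "my_task")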

Citation

If you use this library for your work, please consider citing our paper AdapterHub: A Framework for Adapting Transformers:

@inproceedings{pfeiffer2020AdapterHub,
    title={AdapterHub: A Framework for Adapting Transformers},
    author={Pfeiffer, Jonas and
            R{\"u}ckl{\'e}, Andreas and
            Poth, Clifton and
            Kamath, Aishwarya and
            Vuli{\'c}, Ivan and
            Ruder, Sebastian and
            Cho, Kyunghyun and
            Gurevych, Iryna},
    booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
    pages={46--54},
    year={2020}
}
Comments
  • "Parallel" option for training? Parallel adapter outputs required (without interacting with each other).

    Hello,

    Thanks for this nice framework 👍 . I might be asking something that isn't yet possible but wanted to at least try asking!

    I am trying to feed two BERT-based models' outputs into a subsequent NN. This requires loading two BERT models, but the memory consumption becomes too high if I do so. To remedy this, I was wondering if I could do something like "Parallel" at training time. (FYI, I am not trying to dynamically drop the first few layers; I am simply trying to create two BERT forward paths with less memory consumption.)

    I understand that active adapters can be switched with set_active_adapters(). (Actually, could you confirm whether my understanding is correct?) But this doesn't seem to fit my purpose, as I need both adapters to output independent representations based on their respective adapters.

    Is there any way to make the adapters not interact with each other on the forward path while not loading the original BERT parameters twice?

    • Making this question even more complex, I also need to make one adapter's parameters non-differentiable while still using them in the forward pass. Any ideas perhaps? :)
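
    For reference, a hedged sketch of how the first part of this question might map onto the Parallel composition block (assuming a version of adapter-transformers that ships transformers.adapters.composition; the gradient-freezing part is purely illustrative):

    import transformers.adapters.composition as ac
    from transformers import AutoAdapterModel

    model = AutoAdapterModel.from_pretrained("bert-base-uncased")
    model.add_adapter("adapter_a")
    model.add_adapter("adapter_b")

    # Both adapters receive the same input and produce independent outputs,
    # while the frozen BERT weights are shared (loaded only once).
    model.set_active_adapters(ac.Parallel("adapter_a", "adapter_b"))

    # Illustrative only: keep one adapter in the forward pass but exclude it
    # from gradient updates by freezing its parameters.
    for name, param in model.named_parameters():
        if "adapter_b" in name:
            param.requires_grad = False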
    question 
    opened by leejayyoon 18
  • ImportError: cannot import name 'AutoModelWithHeads' from 'transformers'

    ImportError: cannot import name 'AutoModelWithHeads' from 'transformers'

    Hi I am trying with this example colab: https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Training.ipynb#scrollTo=Lbwb3NRf8mBF

    getting this error:

    Traceback (most recent call last):
      File "test.py", line 11, in <module>
        from transformers import AutoTokenizer, EvalPrediction, GlueDataset, GlueDataTrainingArguments, AutoModelWithHeads, AdapterType
    ImportError: cannot import name 'AutoModelWithHeads' from 'transformers' (/idiap/user/rkarimi/libs/anaconda3/envs/adapter/lib/python3.7/site-packages/transformers/__init__.py)
    

    versions

    (adapter) [email protected]:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep transformers
    adapter-transformers      1.0.1                     <pip>
    transformers              3.5.1                     <pip>
    (adapter) [email protected]:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep pytorch
    pytorch-lightning         1.0.4                     <pip>
    adapter hub from github is installed
    
    bug 
    opened by rabeehkarimimahabadi 17
  • training the language adapters in the MAD-X paper

    training the language adapters in the MAD-X paper

    Hi, I need to train language adapters as done in the MAD-X paper. I have downloaded Wikipedia data, but it is very large-scale and so far I have not managed to train on it. I was wondering if you could share the script you used to train the language adapters. Thank you very much in advance.
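
    For reference, a hedged sketch of the usual recipe for MAD-X-style language adapters, i.e. an invertible adapter configuration trained with masked language modeling (model name, adapter name and the exact training script are illustrative, not the authors' exact setup):

    from transformers import AutoModelForMaskedLM

    model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

    # "pfeiffer+inv" adds a bottleneck adapter plus invertible adapters,
    # the configuration used for MAD-X language adapters
    model.add_adapter("sw", config="pfeiffer+inv")
    model.train_adapter("sw")

    # ... then run a standard MLM training loop (e.g. an adapted run_mlm.py
    # from the examples folder) over the Wikipedia data of the target language.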

    question 
    opened by dorost1234 13
  • Add t5 adapter

    Add t5 adapter

    Followed the pattern of Bart to add adapters to T5. One change is that whereas Bart has separate classes for the encoder and decoder, T5 does not, so I am using the is_decoder flag to switch between encoder and decoder behaviour, such as adding cross_attention adapters and invertible adapters.

    I'm working on some testing.

    opened by AmirAktify 12
  • Training an Adapter using own classification head and pytorch training loop

    Training an Adapter using own classification head and pytorch training loop

    Details

    Hello! I want to add the adapter approach to my pre-trained BERT text-classification model, but I did not find a good explanation in the documentation on how to do that. My model class is the following:

    class BertClassifier(nn.Module):
        """Bert Model for Classification Tasks."""
        def __init__(self, freeze_bert=True):
            """
             @param    bert: a BertModel object
             @param    classifier: a torch.nn.Module classifier
             @param    freeze_bert (bool): Set `False` to fine-tune the BERT model
            """
            super(BertClassifier, self).__init__()
    
            # Instantiate BERT model
            # Specify hidden size of BERT, hidden size of our classifier, and number of labels
            self.bert = BertAdapterModel.from_pretrained(PRETRAINED_MODEL)
            self.D_in = 1024 
            self.H = 512
            self.D_out = 2
            
    
            # Add a new adapter
            self.bert.add_adapter("thermo_cl",set_active=True)
            self.bert.train_adapter(["thermo_cl"])
    
     
            # Instantiate the classifier head with some one-layer feed-forward classifier
            self.classifier = nn.Sequential(
                nn.Linear(self.D_in, 512),
                nn.Tanh(),
                nn.Linear(512, self.D_out),
                nn.Tanh()
            )
     
            # Freeze the BERT model
            if freeze_bert:
                for param in self.bert.parameters():
                    param.requires_grad = False
    
    
        def forward(self, input_ids, attention_mask):
            ''' Feed input to BERT and the classifier to compute logits.
             @param    input_ids (torch.Tensor): an input tensor with shape (batch_size,
                           max_length)
             @param    attention_mask (torch.Tensor): a tensor that hold attention mask
                           information with shape (batch_size, max_length)
             @return   logits (torch.Tensor): an output tensor with shape (batch_size,
                           num_labels) '''
             # Feed input to BERT
            outputs = self.bert(input_ids=input_ids,
                                 attention_mask=attention_mask)
             
             # Extract the last hidden state of the token `[CLS]` for classification task
            last_hidden_state_cls = outputs[0][:, 0, :]
     
             # Feed input to classifier to compute logits
            logits = self.classifier(last_hidden_state_cls)
     
            return logits
    

    The training loop is the following:

    def initialize_model(epochs):
        """ Initialize the Bert Classifier, the optimizer and the learning rate scheduler."""
        # Instantiate Bert Classifier
        bert_classifier = BertClassifier(freeze_bert=False)  # False = do not freeze the BERT backbone
    
        # Tell PyTorch to run the model on GPU
        bert_classifier = bert_classifier.to(device)
    
        # Create the optimizer
        optimizer = AdamW(bert_classifier.parameters(),
                          lr=lr,    # Default learning rate
                          eps=1e-8    # Default epsilon value
                          )
    
        # Total number of training steps
        total_steps = len(train_dataloader) * epochs
    
        # Set up the learning rate scheduler
        scheduler = get_linear_schedule_with_warmup(optimizer,
                                                    num_warmup_steps=0, # Default value
                                                    num_training_steps=total_steps)
    
        return bert_classifier, optimizer, scheduler
    
    def train(model, train_dataloader, val_dataloader, valid_loss_min_input, checkpoint_path, best_model_path, start_epochs, epochs, evaluation=True):
    
        """Train the BertClassifier model."""
        # Start training loop
        logging.info("--Start training...\n")
    
        # Initialize tracker for minimum validation loss
        valid_loss_min = valid_loss_min_input 
    
    
        for epoch_i in range(start_epochs, epochs):
    
            ..............................

            if evaluation == True:
                # After the completion of each training epoch, measure the model's performance
                # on our validation set.
                val_loss, val_accuracy = evaluate(model, val_dataloader)
    
                # Print performance over the entire training data
                time_elapsed = time.time() - t0_epoch
                
                logging.info(f"{epoch_i + 1:^7} | {'-':^7} | {avg_train_loss:^12.6f} | {val_loss:^10.6f} | {val_accuracy:^10.6f} | {time_elapsed:^9.2f}")
    
                logging.info("-"*70)
            logging.info("\n")
    
             # create checkpoint variable and add important data
            checkpoint = {
                'epoch': epoch_i + 1,
                'valid_loss_min': val_loss,
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
            }
            
            # save checkpoint
            save_ckp(checkpoint, False, checkpoint_path, best_model_path)
            
            ## TODO: save the model if validation loss has decreased
            if val_loss <= valid_loss_min:
                print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(valid_loss_min,val_loss))
                # save checkpoint as best model
                save_ckp(checkpoint, True, checkpoint_path, best_model_path)
                valid_loss_min = val_loss
    
    
        model.save_adapter("./final_adapter", "thermo_cl")
        logging.info("-----------------Training complete--------------------------")
    
    
    bert_classifier, optimizer, scheduler = initialize_model(epochs=n_epochs)
    train(model = bert_classifier....)
    

    As you can see, I have my own personalized classification head, so I don't want to use the .add_classification_head() method. Is it correct to train and activate the adapter in this way? I would like to know whether I'm using the adapter properly, and also how to save the checkpoint and my model weights, because at the end of training (where I am supposed to save the adapter) I receive this error:

    AttributeError: 'BertClassifier' object has no attribute 'save_adapter'
    

    Thanks for the help!
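
    For reference, a hedged sketch of how saving might look with this kind of wrapper (the adapter lives on the wrapped BertAdapterModel, not on the custom nn.Module, so save_adapter has to be called on the inner model; the head file name is illustrative):

    import torch

    # save_adapter() is defined on the adapter model, i.e. bert_classifier.bert
    bert_classifier.bert.save_adapter("./final_adapter", "thermo_cl")

    # The custom classifier head is ordinary PyTorch state and is saved separately
    torch.save(bert_classifier.classifier.state_dict(), "./classifier_head.pt")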

    question Stale 
    opened by Ch-rode 11
  • Merge with original transformers library

    Merge with original transformers library

    🚀 Feature request

    Merge this into the original transformers library.

    Motivation

    This library is awesome, so thanks a lot, but it would be much more convenient to have it merged into the original transformers library. The Huggingface team seems focused on adding lightweight options for their models, and adapters are huge time- and memory-savers for multitask use cases, so they would be a great addition to the transformers library.

    Your contribution

    You've done the integration here already so it should be straightforward but happy to help. I've posted an issue on huggingface's end as well.

    discussion Stale 
    opened by salimmj 11
  • Unintuitive slowdown in data loading and model updating on using adapters

    Unintuitive slowdown in data loading and model updating on using adapters

    Environment info

    • transformers version: 1.0.1
    • Platform: Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-glibc2.10
    • Python version: 3.8.5
    • PyTorch version (GPU?): 1.7.0 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: Yes

    Who can help: @LysandreJik @patrickvonplaten

    Model I am using: Bert

    Language I am using the model on:English

    Adapter setup I am using (if any): HoulsbyConfig

    The problem arises when using: my own modified scripts. I want to use adapters for a project of mine, which will require fine-tuning BERT multiple times. In order to understand how much speedup I will get from using adapters, I profiled the various steps in the training loop of BERT, both with and without adapters. The task I am working on is: Stanford Natural Language Inference (SNLI).

    To reproduce

    Steps to reproduce the behavior: the following function is executed for a period of 4 hours on identical GPUs (via an LSF batch system), once with UseAdapter set to True and once with it set to False. The path contains a preloaded and tokenized version of the SNLI training set (as well as the test and dev sets, dropped here via underscores).

    # imports added for completeness (assuming an adapter-transformers 1.x environment
    # that exposes these names at the top level of `transformers`)
    from pickle import load
    from time import time

    import torch
    from torch.utils.data import DataLoader, RandomSampler, TensorDataset
    from transformers import (AdamW, AdapterType, BertForSequenceClassification,
                              HoulsbyConfig, get_linear_schedule_with_warmup)

    def load_and_train(path, UseAdapter):
        x_train,y_train,a_train,t_train,_,_,_,_,_,_,_,_=load(open(path,"rb"))
        train_inst=torch.tensor(x_train)
        train_att=torch.tensor(a_train)
        train_types=torch.tensor(t_train)
        train_targ=torch.tensor(y_train)
        train_data = TensorDataset(train_inst, train_att, train_types,train_targ)
        train_sampler = RandomSampler(train_data)
        train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=32)
        model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
        if UseAdapter:
            model.add_adapter("SNLI",AdapterType.text_task,HoulsbyConfig().__dict__)
            model.train_adapter(["SNLI"])
            model.set_active_adapters(["SNLI"])
        model.cuda()
        optimizer=AdamW(model.parameters(),lr=1e-4)
        scheduler=get_linear_schedule_with_warmup(optimizer,0,len(train_dataloader)*EPOCHS)
        iter=0
        time_load=0
        time_cler=0
        time_forw=0
        time_back=0
        time_updt=0
        for e in range(15):
            model.train()
            for batch in train_dataloader:
                last=time()
                x=batch[0].cuda()
                a=batch[1].cuda()
                t=batch[2].cuda()
                y=batch[3].cuda()
                time_load+=time()-last
                last=time()
                model.zero_grad()
                time_cler+=time()-last
                last=time()
                outputs = model(x, token_type_ids=t, attention_mask=a, labels=y)
                time_forw+=time()-last
                last=time()
                loss=outputs[0]
                loss.backward()
                time_back+=time()-last
                last=time()
                optimizer.step()
                scheduler.step()
                time_updt+=time()-last
                iter+=1
                print(time_load,time_cler,time_forw,time_back,time_updt)
    

    Expected behavior

    1. With Adapters the trainer is able to run through more batches than without by the time the job gets timed out
    2. Per Batch time_load is identical for both cases
    3. Per Batch time_cler is slightly lower with adapters due to the presence of fewer gradients
    4. Per Batch time_forw is slightly higher with adapters due to extra layers that are introduced
    5. Per Batch time_back is significantly lower with adapters since it needs to save fewer gradients
    6. Per Batch time_updt is lower with adapters due to having fewer parameters to update

    Observed Behaviour

    Overall times(seconds):

    | Adapter | Load Time | Clear Time | Forward Prop | Backward Prop | Update | Total | No of Batches |
    | -- | -- | -- | -- | -- | -- | -- | -- |
    | No | 9.141064644 | 349.405822 | 873.8870151 | 11770.82554 | 1159.772 | 14163.03 | 69022 |
    | Yes | 2721.683394 | 394.4980106 | 1652.686945 | 3192.402303 | 6304.335 | 14265.61 | 95981 |

    Per Batch Times(seconds):

    | Adapter | Load Time | Clear Time | Forward Prop | Backward Prop | Update |
    | -- | -- | -- | -- | -- | -- |
    | No | 0.000132437 | 0.005062238 | 0.012660992 | 0.1705373 | 0.016803 |
    | Yes | 0.028356481 | 0.004110168 | 0.017218897 | 0.033260774 | 0.065683 |

    As is evident from the tables, points 2 and 6 above are not satisfied in this output. Note that similar observations were made in two reruns of the experiment. It is unclear to me whether there is an explanation I am missing or whether this is an implementation issue.

    bug 
    opened by cs1160701 9
  • Loading custom adapters and 'output_attentions' for AdapterFusion

    Loading custom adapters and 'output_attentions' for AdapterFusion

    Question

    Information

    Model I am using (Bert, XLNet ...): XLM-RoBERTa-base

    Language I am using the model on (English, Chinese ...): Korean

    Adapter setup I am using (if any):

    The problem arises when using:

    • [X] the official example scripts: (give details below)
    • [ ] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [X] my own task or dataset: (give details below)
    • Datasets: KorNLI and KorSTS (Machine translated Korean MNLI & STS-B dataset)
    • Its format and size are the same as the original datasets (MNLI & STS-B)

    Background

    What I'm doing is that:

    1. train Task-Adapters for KorNLI and KorSTS on the XLM-RoBERTa-base model (to train on Korean datasets) using the official code, 'run_glue_alt.py'
    2. fusion both adapters with a fusion layer using 'run_fusion_glue.py'

    Questions

    Sorry that I'm not familiar with the adapter-transformers codebase. Here are some questions about the AdapterFusion framework.

    1. Is it possible to load my own pre-trained adapters using the 'model.load_adapter' function in the current framework? (I'm using the latest version of adapter-transformers.)
    2. The performance on the target task (KorSTS) when fusing the KorSTS and KorNLI single-task adapters is markedly lower than that of the single-task adapter trained on the KorSTS dataset. Even with a search over various hyperparameters (batch size, epochs, learning rate, fusion config, ...), the performance does not seem to improve. Is there any way to check whether the fusion layer is trained properly?
    3. Connected with the questions above, is it possible to investigate the attention distribution of the trained fusion layer? I've checked that there is an 'output_attentions' option defined in the BertModel class, but I could not find a way to output the attention weights of the fusion layers (rather than the self-attention layers of the original pre-trained model).
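
    For reference: later releases of adapter-transformers (3.1.0+, see the release notes below) add an output_adapter_fusion_attentions argument. A hedged sketch of how the fusion attentions might be inspected with it (the exact output layout is an assumption based on the 3.1.0 documentation):

    outputs = model(input_ids, attention_mask=attention_mask,
                    output_adapter_fusion_attentions=True)
    # Assumed layout: a nested dict keyed by fusion name, layer index and module
    print(outputs.adapter_fusion_attentions)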

    Environment info

    • transformers version:
    • Platform:
    • Python version: 3.6.3
    • PyTorch version (GPU?): 1.4
    • Tensorflow version (GPU?):
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: No, I'm using a single GPU
    bug question 
    opened by bigkunzi 9
  • TypeError: unhashable type: 'Stack' error raised when using Parallel adapter heads

    TypeError: unhashable type: 'Stack' error raised when using Parallel adapter heads

    Environment info

    • adapter-transformers version:
    • Platform: Linux
    • Python version: 3.6.8
    • PyTorch version (GPU?): GPU / 1.7
    • Tensorflow version (GPU?): NA
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: Using nn.DataParallel

    Information

    Model I am using (Bert, XLNet ...): BERT pretrained model with 3 custom adapters + heads are used.

    Language I am using the model on (English, Chinese ...): EN

    Adapter setup I am using (if any): 3 Adapters (with default configuration) and 3 Classification Head.

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [ ] my own modified scripts: (give details below)

    The tasks I am working on is: Multi-task finetuning using AdapterHub

    Error below :

     (from logs) active head : [<bound method AdapterCompositionBlock.last of Stack[combined, resource_type, action]>]
    
    Traceback (most recent call last):
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py", line 1092, in forward
        head_inputs, head_name=head, attention_mask=attention_mask, return_dict=return_dict, **kwargs
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/transformers/adapters/heads.py", line 509, in forward_head
        if head not in self.heads:
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 304, in __contains__
        return key in self._modules
    TypeError: unhashable type: 'Stack'
    

    Modified code below

    
    model = AutoModelWithHeads.from_pretrained('bert-base-uncased')
    
    # 3 adapters and classification heads are added.
    model.add_adapter('name_a')
    model.add_classification_head('name_a',  {'num_labels' : 100})
    
    model.add_adapter('name_b')
    model.add_classification_head('name_b')
    
    model.add_adapter('name_c')
    model.add_classification_head('name_c',  {'num_labels' : 5})
    
    
    # Use `Parallel` to enable multiple active heads.
    adapter_names  = ['name_a', 'name_b', 'name_c']
    model.active_heads =  ac.Parallel(adapter_names)
    
    for name in adapter_names:
        model.train_adapter(name)
        
    # Invoke forward pass. This will trigger the error. 
    model(inputs)
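
    For reference, a hedged sketch of how such a parallel multi-head setup is usually expressed (assuming adapter-transformers 2.x/3.x, where Parallel takes the names as separate arguments and set_active_adapters also resolves the matching heads; treat the head resolution as an assumption):

    import transformers.adapters.composition as ac

    # Parallel takes the child names as separate arguments, not a single list
    model.set_active_adapters(ac.Parallel('name_a', 'name_b', 'name_c'))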
    
    

    Expected behavior

    Model forward pass should work.

    bug 
    opened by hchoi-moveworks 8
  • Hinglish Sentiment Adapter

    Hinglish Sentiment Adapter

    🌟 New Adapter setup

    Model and Data Description

    Hinglish is a Romanized version of Hindi. It is immensely popular in India, where Hindi is spoken by millions of people but quite often typed in Roman script.

    Dataset: SemEval 2020 Task 9 Sentiment Analysis: 3 classes, +ve, -ve and neutral

    Open source status

    • [x] Code Implementation for the Adapter: https://colab.research.google.com/drive/19lofRd9n142xJCtUteZb5L_r7spGcGLL?usp=sharing
    • [x] Past Work: Accepted Paper, Code and Model Weights
    • [x] Who are the authors: @NirantK and @meghanabhange

    What I need help with

    • [x] Because there were no examples other than the GLUE datasets, I ended up implementing a new HinglishDataset class and other skeleton code -- I'd appreciate a review in case I got something wrong

    Next Steps

    If all is well in the code above, I'd like to continue along and contribute an adapter for Hinglish under the Sentiment task.

    enhancement 
    opened by NirantK 8
  • Train adapters without Hugging Face Trainer scripts

    Train adapters without Hugging Face Trainer scripts

    Hi, I was looking into the example scripts for AdapterHub and almost none of the *_no_trainer.py scripts use adapters at all. Are you planning to add those scripts soon? I can also help in porting trainer scripts to no_trainer scripts if someone can guide me on what changes will be required for that. Thank you!

    cc: @calpt
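
    For reference, a hedged sketch of what adapter training in a plain PyTorch loop (without the Trainer) might look like; the model, head, dataloader and hyperparameters are illustrative:

    import torch
    from transformers import AutoAdapterModel

    model = AutoAdapterModel.from_pretrained("bert-base-uncased")
    model.add_adapter("task")
    model.add_classification_head("task", num_labels=2)
    model.train_adapter("task")  # freezes the base model, unfreezes & activates the adapter

    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4
    )
    for batch in train_dataloader:  # assumed to yield dicts of tokenized tensors with labels
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()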

    question Stale 
    opened by bhavitvyamalik 7
  • T5: Missing tied weights crash `accelerate`

    T5: Missing tied weights crash `accelerate`

    First opened at https://github.com/huggingface/accelerate/issues/958 . When huggingface accelerate is used via device_map='auto', there is a weight tied to the missing lm_head that triggers a crash inside the device-map planning code. It would be nice if there were a clear way to retain the head and tied weight during loading.

    Environment info

    • adapter-transformers version: 3.1.0
    • Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
    • Python version: 3.9.16+
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.13.1+cu117 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes, device_map='auto'
    • Using distributed or parallel set-up in script?: no

    Information

    Model I am using (Bert, XLNet ...): google/flan-t5-base

    Language I am using the model on (English, Chinese ...): n/a

    Adapter setup I am using (if any): AutoAdapterModel.from_pretrained

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [x] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [x] my own task or dataset: (give details below)

    To reproduce

    Steps to reproduce the behavior:

    import transformers
    model = transformers.AutoAdapterModel.from_pretrained('google/flan-t5-base', device_map='auto')
    

    Result:

    ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
    │ /home/user/scratch/test-2023-01-07.py:2 in <module>                                              │
    │                                                                                                  │
    │   1 import transformers                                                                          │
    │ ❱ 2 model = transformers.AutoAdapterModel.from_pretrained('google/flan-t5-base', device_map=     │
    │   3                                                                                              │
    │                                                                                                  │
    │ /home/user/.local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:446 in    │
    │ from_pretrained                                                                                  │
    │                                                                                                  │
    │   443 │   │   │   return model_class.from_pretrained(pretrained_model_name_or_path, *model_arg   │
    │   444 │   │   elif type(config) in cls._model_mapping.keys():                                    │
    │   445 │   │   │   model_class = _get_model_class(config, cls._model_mapping)                     │
    │ ❱ 446 │   │   │   return model_class.from_pretrained(pretrained_model_name_or_path, *model_arg   │
    │   447 │   │   raise ValueError(                                                                  │
    │   448 │   │   │   f"Unrecognized configuration class {config.__class__} for this kind of AutoM   │
    │   449 │   │   │   f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapp   │
    │                                                                                                  │
    │ /home/user/.local/lib/python3.9/site-packages/transformers/modeling_utils.py:2121 in             │
    │ from_pretrained                                                                                  │
    │                                                                                                  │
    │   2118 │   │   │   no_split_modules = model._no_split_modules                                    │
    │   2119 │   │   │   # Make sure tied weights are tied before creating the device map.             │
    │   2120 │   │   │   model.tie_weights()                                                           │
    │ ❱ 2121 │   │   │   device_map = infer_auto_device_map(                                           │
    │   2122 │   │   │   │   model, no_split_module_classes=no_split_modules, dtype=torch_dtype, max_  │
    │   2123 │   │   │   )                                                                             │
    │   2124                                                                                           │
    │                                                                                                  │
    │ /shared/src/accelerate/src/accelerate/utils/modeling.py:545 in infer_auto_device_map             │
    │                                                                                                  │
    │   542 │   │   elif tied_param is not None:                                                       │
    │   543 │   │   │   # Determine the sized occupied by this module + the module containing the ti   │
    │   544 │   │   │   tied_module_size = module_size                                                 │
    │ ❱ 545 │   │   │   tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if n in    │
    │   546 │   │   │   tied_module_name, tied_module = modules_to_treat[tied_module_index]            │
    │   547 │   │   │   tied_module_size += module_sizes[tied_module_name] - module_sizes[tied_param   │
    │   548 │   │   │   if current_max_size is not None and current_memory_used + tied_module_size >   │
    ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
    IndexError: list index out of range
    

    Expected behavior

    No crash. Ability to tie weights with seq2seq lm_head.

    bug 
    opened by xloem 0
  • Fusing task-specific and task-agnostic adapters

    Fusing task-specific and task-agnostic adapters

    Environment info

    • adapter-transformers version: 3.1.0
    • Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
    • Python version: 3.8.11
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.12.1 (False)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: no

    Details

    Hi, I am trying to combine task-specific and task-agnostic adapters. Assume I have three tasks: Task-A, Task-B, and Task-C. I will add task-specific adapters and a task-agnostic adapter as follows

    import transformers.adapters.composition as ac
    
    model.add_adapter("TASK-A")
    model.add_adapter("TASK-B")
    model.add_adapter("TASK-C")
    
    model.add_adapter("TASK-Agnostic")
    

    Now I want to fuse the task-specific adapter and the task-agnostic adapter dynamically, i.e., depending on what the task is.

    Should I fuse the adapters as follows?

    model.add_adapter_fusion(["TASK-A", "TASK-Agnostic"])
    model.add_adapter_fusion(["TASK-B", "TASK-Agnostic"])
    model.add_adapter_fusion(["TASK-C", "TASK-Agnostic"])
    

    Inside the forward_pass of Trainer, I will set the active adapters as follows

    task_name = get_task_name()
    model.active_adapters = ac.Fuse(task_name, "TASK-Agnostic")
    

    Is this the right way to implement this?
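
    For reference, a hedged sketch of how the fusion layers might then be trained per task (assuming the add_adapter_fusion calls above; train_adapter_fusion freezes everything except the given fusion layer):

    # Make only the fusion of the current task with the shared adapter trainable
    model.train_adapter_fusion(ac.Fuse("TASK-A", "TASK-Agnostic"))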

    Thanks

    question 
    opened by murthyrudra 0
  • Stacking two parallel composition blocks

    Stacking two parallel composition blocks

    Hi,

    Can I stack two Parallel composition blocks like this? ac.Stack(ac.Parallel('a', 'b'), ac.Parallel('c', 'd'))

    I found that the inputs are only replicated once, but they should be replicated twice. Could you help me fix it?

    Thanks!

    question 
    opened by HZQ950419 0
  • Add adapter to AutoModelForSequenceClassification model

    Add adapter to AutoModelForSequenceClassification model

    Environment info

    • adapter-transformers version: newest
    • Platform: Azure ML
    • Python version: 3.8
    • PyTorch version (GPU?):

    Details

    I am trying to use an AutoModelForSequenceClassification model (using BART). The documentation is not so clear, so I just load the model directly and add an adapter (LoRA) to it. When I run the trainer, I get the following error:

    RestException: INVALID_PARAMETER_VALUE: Response: {'Error': {'Code': 'ValidationError', 'Severity': None, 'Message': 'No more than 255 characters per params Value. Request contains 1 of greater length.', 'MessageFormat': None, 'MessageParameters': None, 'ReferenceCode': None, 'DetailsUri': None, 'Target': None, 'Details': [], 'InnerError': None, 'DebugInfo': None, 'AdditionalInfo': None}, 'Correlation': {'operation': '04d45ce3752c5e51c54e71f3950411ca', 'request': '6d216d8faea19d26'}, 'Environment': 'westus', 'Location': 'westus', 'Time': '2023-01-04T17:45:03.5650777+00:00', 'ComponentName': 'mlflow', 'error_code': 'INVALID_PARAMETER_VALUE'}

    Any ideas on how to solve it?
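
    For context, a hedged sketch of the kind of setup described above (model name, adapter name and LoRA hyperparameters are illustrative; LoRAConfig is available in adapter-transformers 3.x):

    from transformers import AutoModelForSequenceClassification
    from transformers.adapters import LoRAConfig

    model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base", num_labels=2)
    model.add_adapter("lora_adapter", config=LoRAConfig(r=8, alpha=16))
    model.train_adapter("lora_adapter")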

    question 
    opened by andyzengmath 0
  • Support for openai Whisper

    Support for openai Whisper

    🌟 New adapter setup

    Support for openai Whisper

    Add adapter integration for whisper.

    Open source status

    • [x] the model implementation is available: official code hf
    • [x] the model weights are available: hf
    • [x] who are the authors: @jongwook @ArthurZucker @sgugger
    enhancement 
    opened by karynaur 0
  • Add adapter configuration strings & restructure adapter method docs

    Add adapter configuration strings & restructure adapter method docs

    Configuration strings

    This PR adds the possibility to use flexible adapter configuration strings which allow specifying custom config attributes. Examples:

    • Set config attributes: model.add_adapter("name", config="parallel[reduction_factor=2]")
    • Config union: model.add_adapter("name", config="prefix_tuning|parallel")
    • more examples: https://github.com/calpt/adapter-transformers/blob/8df62b9de2a8ab51115b191aca35b2fb53c96539/tests_adapters/test_adapter_config.py#L95-L102

    Documentation: https://github.com/calpt/adapter-transformers/blob/8df62b9de2a8ab51115b191aca35b2fb53c96539/adapter_docs/overview.md

    Configuration strings make it possible to pass complex configurations, e.g. via the command line.

    Documentation restructuring

    The adapter method documentation is now split into three pages:

    • Overview and Configuration: introduction, table, configuration
    • Adapter Methods
    • Method Combinations
    opened by calpt 0
Releases (adapters3.1.0)
  • adapters3.1.0(Sep 15, 2022)

    Based on transformers v4.21.3

    New

    New adapter methods

    New model integrations

    • Add Deberta and DebertaV2 integration(@hSterz via #340)
    • Add Vision Transformer integration (@calpt via #363)

    Misc

    • Add adapter_summary() method (@calpt via #371): More info
    • Return AdapterFusion attentions using output_adapter_fusion_attentions argument (@calpt via #417): Documentation

    Changed

    • Upgrade of underlying transformers version (@calpt via #344, #368, #404)

    Fixed

    • Infer label names for training for flex head models (@calpt via #367)
    • Ensure root dir exists when saving all adapters/heads/fusions (@calpt via #375)
    • Avoid attempting to set prediction head if non-existent (@calpt via #377)
    • Fix T5EncoderModel adapter integration (@calpt via #376)
    • Fix loading adapters together with full model (@calpt via #378)
    • Multi-gpu support for prefix-tuning (@alexanderhanboli via #359)
    • Fix issues with embedding training (@calpt via #386)
    • Fix initialization of added embeddings (@calpt via #402)
    • Fix model serialization using torch.save() & torch.load() (@calpt via #406)
  • adapters3.0.1(May 18, 2022)

    Based on transformers v4.17.0

    New

    • Support float reduction factors in bottleneck adapter configs (@calpt via #339)

    Fixed

    • [AdapterTrainer] add missing preprocess_logits_for_metrics argument (@stefan-it via #317)
    • Fix save_all_adapters such that with_head is not ignored (@hSterz via #325)
    • Fix inferring batch size for prefix tuning (@calpt via #335)
    • Fix bug when using compacters with AdapterSetup context (@calpt via #328)
    • [Trainer] Fix issue with AdapterFusion and load_best_model_at_end (@calpt via #341)
    • Fix generation with GPT-2, T5 and Prefix Tuning (@calpt via #343)
  • adapters3.0.0(Mar 23, 2022)

    Based on transformers v4.17.0

    New

    Efficient Fine-Tuning Methods

    • Add Prefix Tuning (@calpt via #292)
    • Add Parallel adapters & Mix-and-Match adapter (@calpt via #292)
    • Add Compacter (@hSterz via #297)

    Misc

    • Introduce XAdapterModel classes as central & recommended model classes (@calpt via #289)
    • Introduce ConfigUnion class for flexible combination of adapter configs (@calpt via #292)
    • Add AdapterSetup context manager to replace adapter_names parameter (@calpt via #257)
    • Add ForwardContext to wrap model forward pass with adapters (@calpt via #267, #295)
    • Search all remote sources when passing source=None (new default) to load_adapter() (@calpt via #309)

    Changed

    • Deprecate XModelWithHeads in favor of XAdapterModel (@calpt via #289)
    • Refactored adapter integration into model classes and model configs (@calpt via #263, #304)
    • Rename activation functions to match Transformers' names (@hSterz via #298)
    • Upgrade of underlying transformers version (@calpt via #311)

    Fixed

    • Fix seq2seq generation with flexible heads classes (@calpt via #275, @hSterz via #285)
    • Parallel composition for XLM-Roberta (@calpt via #305)
  • adapters2.3.0(Feb 9, 2022)

    Based on transformers v4.12.5

    New

    • Allow adding, loading & training of model embeddings (@hSterz via #245). See https://docs.adapterhub.ml/embeddings.html.

    Changed

    • Unify built-in & custom head implementation (@hSterz via #252)
    • Upgrade of underlying transformers version (@calpt via #255)

    Fixed

    • Fix documentation and consistency issues for AdapterFusion methods (@calpt via #259)
    • Fix serialization/ deserialization issues with custom adapter config classes (@calpt via #253)
  • adapters2.2.0(Oct 14, 2021)

    Based on transformers v4.11.3

    New

    Model support

    • T5 adapter implementation (@AmirAktify & @hSterz via #182)
    • EncoderDecoderModel adapter implementation (@calpt via #222)

    Prediction heads

    • AutoModelWithHeads prediction heads for language modeling (@calpt via #210)
    • AutoModelWithHeads prediction head & training example for dependency parsing (@calpt via #208)

    Training

    • Add a new AdapterTrainer for training adapters (@hSterz via #218, #241 )
    • Enable training of Parallel block (@hSterz via #226)

    Misc

    • Add get_adapter_info() method (@calpt via #220)
    • Add set_active argument to add & load adapter/fusion/head methods (@calpt via #214)
    • Minor improvements for adapter card creation for HF Hub upload (@calpt via #225)

    Changed

    • Upgrade of underlying transformers version (@calpt via #232, #234, #239 )
    • Allow multiple AdapterFusion configs per model; remove set_adapter_fusion_config() (@calpt via #216)

    Fixed

    • Incorrect referencing between adapter layer and layer norm for DataParallel (@calpt via #228)
  • adapters2.1.0(Jul 8, 2021)

    Based on transformers v4.8.2

    New

    Integration into HuggingFace's Model Hub

    • Add support for loading adapters from HuggingFace Model Hub (@calpt via #162)
    • Add method to push adapters to HuggingFace Model Hub (@calpt via #197)
    • Learn more

    BatchSplit adapter composition

    • BatchSplit composition block for adapters and heads (@hSterz via #177)
    • Learn more

    Various new features

    • Add automatic conversion of static heads when loaded via XModelWithHeads (@calpt via #181) Learn more
    • Add list_adapters() method to search for adapters (@calpt via #193) Learn more
    • Add delete_adapter(), delete_adapter_fusion() and delete_head() methods (@calpt via #189)
    • MAD-X 2.0 WikiAnn NER notebook (@hSterz via #187)
    • Upgrade of underlying transformers version (@hSterz via #183, @calpt via #194 & #200)

    Changed

    • Deprecate add_fusion() and train_fusion() in favor of add_adapter_fusion() and train_adapter_fusion() (@calpt via #190)

    Fixed

    • Suppress no-adapter warning when adapter_names is given (@calpt via #186)
    • leave_out in load_adapter() when loading language adapters from Hub (@hSterz via #177)
  • adapters2.0.1(May 28, 2021)

    Based on transformers v4.5.1

    New

    • Allow different reduction factors for different adapter layers (@hSterz via #161)
    • Allow dynamic dropping of adapter layers in load_adapter() (@calpt via #172)
    • Add method get_adapter() to retrieve weights of an adapter (@hSterz via #169)

    Changed

    • Re-add adapter_names argument to model forward() methods (@calpt via #176)

    Fixed

    • Fix resolving of adapter from Hub when multiple options available (@Aaronsom via #164)
    • Fix & improve adapter saving/ loading using Trainer class (@calpt via #178)
  • adapters2.0.0(Apr 29, 2021)

    Based on transformers v4.5.1

    All major new features & changes are described at https://docs.adapterhub.ml/v2_transition.

    • all changes merged via #105

    Additional changes & Fixes

    • Support loading adapters with load_best_model_at_end in Trainer (@calpt via #122)
    • Add setter for active_adapters property (@calpt via #132)
    • New notebooks for NER, text generation & AdapterDrop (@hSterz via #135)
    • Enable trainer to load adapters from checkpoints (@hSterz via #138)
    • Update & clean up example scripts (@hSterz via #154 & @calpt via #141, #155)
    • Add unfreeze_adapters param to train_fusion() (@calpt via #156)
    • Ensure eval/ train mode is correct for AdapterFusion (@calpt via #157)
  • adapters1.1.1(Jan 14, 2021)

    Based on transformers v3.5.1

    New

    • Modular & custom prediction heads for flex head models (@hSterz via #88)

    Fixed

    • Fixes for DistilBERT layer norm and AdapterFusion (@calpt via #102)
    • Fix for reloading full models with AdapterFusion (@calpt via #110)
    • Fix attention and logits output for flex head models (@calpt via #103 & #111)
    • Fix loss output of flex model with QA head (@hSterz via #88)
  • adapters1.1.0(Nov 30, 2020)

    Based on transformers v3.5.1

    New

    • New model with adapter support: DistilBERT (@calpt via #67)
    • Save label->id mapping of the task together with the adapter prediction head (@hSterz via #75)
    • Automatically set matching label->id mapping together with active prediction head (@hSterz via #81)
    • Upgraded underlying transformers version (@calpt via #55, #72 and #85)
    • Colab notebook tutorials showcasing all AdapterHub concepts (@calpt via #89)

    Fixed

    • Support for models with flexible heads in pipelines (@calpt via #80)
    • Adapt input to models with flexible heads to static prediction heads input (@calpt via #90)
  • adapters1.0.1(Oct 6, 2020)

    Based on transformers v2.11.0

    New

    • Adds squad-style QA prediction head to flex-head models

    Bug fixes

    • Fixes loading and saving of adapter config in model.save_pretrained()
    • Fixes parsing of adapter names in fusion setup
  • adapters1.0(Sep 9, 2020)
