
Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Prerequisites

This repo is built upon a local copy of transformers==2.1.1. It has been tested with torch==1.4.0, Python 3.7, and CUDA 10.1.

To start, create a new environment and install:

conda create -n grad2task python=3.7
conda activate grad2task
cd Grad2Task
pip install -e .
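As an optional sanity check (not part of the original setup), you can confirm that the expected versions are installed:

python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"

This should print 1.4.0 and 2.1.1 respectively.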

We use wandb for logging. Please set it up following the wandb documentation and specify your wandb project name in run_meta_training.sh:

export WANDB=[YOUR PROJECT NAME]
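If you have not authenticated wandb on this machine yet, log in first (standard wandb CLI, not specific to this repo):

wandb login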

Download the dataset and unzip it under the main folder: https://drive.google.com/file/d/1uAdgZFYv9epk6tQVQ3SwboxFpSlkC_ZW/view?usp=sharing

If you need to place it somewhere else, specify its path in path.sh.
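For example, path.sh could export the dataset location as an environment variable (the variable name below is illustrative; use whatever path.sh actually defines):

export DATA_DIR=/path/to/dataset  # illustrative name, adjust to the variable used in path.sh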

Fine-tuning Baseline

Training & Evaluation

To train/evaluate models:

bash meta_learn.sh [MODEL_NAME] [MODE] [EXP_ID]

where [MODEL_NAME] is the model name, [MODE] is the experiment mode, and [EXP_ID] is an optional experiment ID used to distinguish different runs of the same model. The options for [MODE] and [MODEL_NAME] are listed in the tables below, followed by example invocations:

| [MODE] | Description |
| --- | --- |
| train | Train models. |
| test_best | Test the checkpoint with the best validation performance. |
| test_latest | Test the latest checkpoint. |
| test | Test the model without meta-training. Only applicable to the fine-tune-baseline model. |
| [MODEL_NAME] | Description |
| --- | --- |
| fine-tune-baseline | Fine-tuning BERT for each task separately. |
| bert-protonet-euc | ProtoNet with BERT as encoder, using Euclidean distance as the distance metric. |
| bert-protonet-euc-bn | ProtoNet with BERT + bottleneck adapters as encoder, using Euclidean distance as the distance metric. |
| bert-protonet | ProtoNet with BERT as encoder, using cosine distance as the distance metric. |
| bert-protonet-bn | ProtoNet with BERT + bottleneck adapters as encoder, using cosine distance as the distance metric. |
| bert-leopard | Leopard with pretrained BERT [1]. |
| bert-leopard-fixlr | Leopard, but with fixed learning rates. |
| bert-cnap-bn-euc-context-cls-shift-scale-ar | Our proposed approach, using gradients as the task representation. |
| bert-cnap-bn-euc-context-cls-shift-scale-ar-X | Our proposed approach, using the average input encoding as the task representation. |
| bert-cnap-bn-euc-context-cls-shift-scale-ar-XGrad | Our proposed approach, using both gradients and input encoding as the task representation. |
| bert-cnap-bn-euc-context-cls-shift-scale-ar-XY | Our proposed approach, using input and textual label encoding as the task representation. |
| bert-cnap-bn-euc-context-shift-scale-ar | Same as our proposed approach, except it adapts all tokens instead of only the [CLS] token. |
| bert-cnap-bn-pretrained-taskemb | Our proposed approach with a pretrained task embedding model. |
| bert-cnap-bn-hyper | A hypernetwork-based approach. |
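For example, to meta-train the ProtoNet baseline with Euclidean distance and then evaluate its best checkpoint (following the usage pattern above):

bash meta_learn.sh bert-protonet-euc train
bash meta_learn.sh bert-protonet-euc test_best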

To run a model with different hyperparameters, first name the run with an [EXP_ID] and then specify the new hyperparameters in run/meta_learn.sh. For example, to run bert-protonet-bn with a smaller learning rate, modify run/meta_learn.sh as:

...
elif [ $1 == "bert-protonet-bn" ]; then # ProtoNet with cosine distance
    export LEARNING_RATE=2e-5
    export CHECKPOINT_FREQ=1000
    if [[ ${EXP_ID} == *"lr1e-5" ]]; then
        export LEARNING_RATE=1e-5
        export CHECKPOINT_FREQ=2000
        # modify other hyperparameters here
    fi
...

and then run:

bash meta_learn.sh bert-protonet-bn train lr1e-5
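To evaluate that run afterwards, pass the same [EXP_ID] (this assumes meta_learn.sh uses the run ID to locate the matching checkpoints; we have not verified this against the script):

bash meta_learn.sh bert-protonet-bn test_best lr1e-5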

Reference

[1] T. Bansal, R. Jha, and A. McCallum. Learning to few-shot learn across diverse natural language classification tasks. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5108–5123, 2020.
