An original implementation of "MetaICL Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi

MetaICL: Learning to Learn In Context

This includes an original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi.

Check out our demo at qa.cs.washington.edu:2021!

This README mainly explains how to reproduce MetaICL and Channel MetaICL from the paper, but it also describes how to reproduce our baselines, including Multi-task zero-shot and various raw LM methods. All methods used in the paper are available in this repo (please see the table below).

For any questions about the paper or the code, please contact the first author (email) or open an issue.

If you find our code or paper useful, please cite the paper:

@article{ min2021metaicl,
    title={ Meta{ICL}: Learning to Learn In Context },
    author={ Min, Sewon and Lewis, Mike and Zettlemoyer, Luke and Hajishirzi, Hannaneh },
    journal={ arXiv preprint },
    year={ 2021 }
}

Content

  1. Installation
  2. Quick Start
  3. Data
  4. Training
  5. Inference
  6. Downloading Checkpoints

Installation

These installation guidelines are mainly for running the baselines. Requirements for data preprocessing are listed in the "How to Download and Preprocess" section. All code is tested with Python 3.8.

pip install torch==1.9.0
pip install git+https://github.com/huggingface/[email protected]

To train the model, we use an 8-bit optimizer and mixed precision, which significantly reduce memory usage. To use them, run the following commands (skip this step if you will only run inference with the released checkpoints):

# For 8-bit optimization: see https://github.com/facebookresearch/bitsandbytes for more details
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda102 # modify based on your CUDA version

# For mixed precision training: see https://github.com/NVIDIA/apex for more details
# make sure your nvcc is working (e.g. `nvcc --version`)
cd .. # move outside of this project directory
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ../MetaICL # come back to this project directory
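
Before launching training, it can help to confirm that the two optional packages installed above are visible to the current Python environment. The following is a minimal sketch that only checks importability with the standard library; the helper name `missing_packages` is hypothetical and not part of this repo.

```python
import importlib.util

# Import names of the optional, training-only packages mentioned above.
OPTIONAL_TRAINING_PACKAGES = ("bitsandbytes", "apex")


def missing_packages(names=OPTIONAL_TRAINING_PACKAGES):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed (needed for training only):", ", ".join(missing))
    else:
        print("All optional training packages were found.")
```

An empty result only means the packages can be located; it does not verify that they were built against the right CUDA version.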

Quick Start

This example uses the financial_phrasebank dataset.

First, prepare a list of training examples:

train_data = [{"input": INPUT_1, "output": OUTPUT_1},
              {"input": INPUT_2, "output": OUTPUT_2},
              ...
              {"input": INPUT_K, "output": OUTPUT_K}]

If you prefer, you can download our training data by running the command python -m utils.download_data --demo_data and then loading the downloaded file as follows.

import json

# Each line of the JSONL file is one training example
with open("data/financial_phrasebank/financial_phrasebank_16_100_train.jsonl", "r") as f:
    train_data = []
    for line in f:
        train_data.append(json.loads(line))
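
If you build the training file yourself, a small amount of validation can catch malformed records early. The sketch below wraps the same JSONL loading in a function and checks that every record has the "input" and "output" keys used in the list format above; the function name `load_examples` is hypothetical, and extra keys in a record are deliberately left untouched.

```python
import json


def load_examples(path):
    """Read a JSONL file and return a list of dicts with "input" and "output"."""
    examples = []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = {"input", "output"} - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            examples.append(record)
    return examples
```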

Then, you can use our model as follows.

from metaicl.data import MetaICLData
from metaicl.model import MetaICLModel

# Load the model
data = MetaICLData(method="channel", max_length=1024, max_length_per_example=256)
model = MetaICLModel()
model.load("channel-metaicl")
model.cuda()
model.eval()

# Make a prediction for `input1`
input1 = "Both operating profit and net sales for the six-month period increased as compared to the corresponding period in 2007."
data.tensorize(train_data, [input1], options=["positive", "neutral", "negative"])
prediction = model.do_predict(data)[0]
print(prediction)  # positive

# Make another prediction for `input2`
input2 = "The deal will have no significant effect on the acquiring company's equity ratio."
data.tensorize(train_data, [input2], options=["positive", "neutral", "negative"])
prediction = model.do_predict(data)[0]
print(prediction)  # neutral

Data

As described in the paper, we use a collection of 142 tasks taken from CrossFit and UnifiedQA. We experiment with seven different settings, in which the meta-training tasks and the target tasks do not overlap. Download and preprocessing guidelines are in the "How to Download and Preprocess" section.

| Setting name | alias (for command) | # meta-train tasks | # meta-train examples | # target tasks |
|---|---|---|---|---|
| High Resource → Low Resource | hr_to_lr | 61 | 819,200 | 26 |
| Classification → Classification | class_to_class | 43 | 384,022 | 20 |
| Non-Classification → Classification | non_class_to_class | 37 | 368,768 | 20 |
| QA → QA | qa_to_qa | 37 | 486,143 | 22 |
| Non-QA → QA | non_qa_to_qa | 33 | 521,342 | 22 |
| Non-NLI → NLI | non_nli_to_nli | 55 | 463,579 | 8 |
| Non-Paraphrase Detection → Paraphrase Detection | non_paraphrase_to_paraphrase | 59 | 496,106 | 4 |

To run experiments for each setting, use "alias (for command)" for commands in the Training section and the Inference section.
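
When scripting runs over several settings, it can be convenient to keep the aliases and counts from the table above in one place and to reject a mistyped alias before a long job starts. This is a minimal sketch: the numbers are copied from the table, while the names `SETTINGS` and `check_alias` are hypothetical and not part of this repo.

```python
# Values copied from the settings table above:
# alias -> (# meta-train tasks, # meta-train examples, # target tasks)
SETTINGS = {
    "hr_to_lr": (61, 819_200, 26),
    "class_to_class": (43, 384_022, 20),
    "non_class_to_class": (37, 368_768, 20),
    "qa_to_qa": (37, 486_143, 22),
    "non_qa_to_qa": (33, 521_342, 22),
    "non_nli_to_nli": (55, 463_579, 8),
    "non_paraphrase_to_paraphrase": (59, 496_106, 4),
}


def check_alias(alias):
    """Return the alias unchanged if it is a known setting, else raise."""
    if alias not in SETTINGS:
        known = ", ".join(sorted(SETTINGS))
        raise ValueError(f"unknown setting {alias!r}; expected one of: {known}")
    return alias
```

For example, `check_alias("hr_to_lr")` returns the alias, and the result can then be passed as the value of --task.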

None of the settings above use templates or instructions. If you want to use the instruction version, as in the ablations in the paper, use the settings in the following table.

| Setting name | alias (for command) | # instructions / meta-train task | # meta-train tasks | # meta-train examples | # target tasks |
|---|---|---|---|---|---|
| High Resource → Low Resource without instructions | hr_to_lr_noinst | 0 | 32 | 492,655 | 12 |
| High Resource → Low Resource with instructions (1 per task) | hr_to_lr_inst | 1 | 32 | 492,655 | 12 |
| High Resource → Low Resource with instructions (all) | hr_to_lr_inst_all | 8.3 | 32 | 492,655 | 12 |

If you use these data resources, please make sure to cite CrossFit and UnifiedQA.

@inproceedings{ ye2021crossfit,
    title={ {C}ross{F}it: A Few-shot Learning Challenge for Cross-task Generalization in NLP },
    author={ Ye, Qinyuan and Lin, Bill Yuchen and Ren, Xiang },
    booktitle={ EMNLP },
    year={ 2021 }
}
@inproceedings{ khashabi2020unifiedqa,
    title={ {U}nified{QA}: Crossing Format Boundaries With a Single QA System },
    author={ Khashabi, Daniel and Min, Sewon and Khot, Tushar and Sabharwal, Ashish and Tafjord, Oyvind and Clark, Peter and Hajishirzi, Hannaneh },
    booktitle={ Findings of EMNLP },
    year={ 2020 }
}

If you use the instruction version, please make sure to cite the T0 paper.

@article{ sanh2021multitask,
    title={ Multitask Prompted Training Enables Zero-Shot Task Generalization },
    author={ Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Stella Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush },
    journal={ arXiv preprint arXiv:2110.08207 },
    year={ 2021 }
}

How to Download and Preprocess

The code is modified from the original CrossFit repo. First, install requirements:

pip install datasets==1.4.0 wget

Warning: we found that datasets==1.4.0 is not compatible with the Transformers version we use for training and inference. Please use a separate environment for data preprocessing and for model training/inference.

cd preprocess
# preprocess from crossfit
python _build_gym.py --build --n_proc=40 --do_test
python _build_gym.py --build --n_proc=40 --do_train # skip if you won't run training yourself
# preprocess from unifiedqa
python unifiedqa.py --do_train --do_test # skip `--do_train` if you won't run training yourself

By default, preprocessed data is saved at data/.
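
The Quick Start loads a file named data/financial_phrasebank/financial_phrasebank_16_100_train.jsonl. Assuming that this single path reflects a general pattern of data/{task}/{task}_{k}_{seed}_{split}.jsonl (an inference from that one example, not something this README states), a path helper could look like the sketch below. The function name `example_path` is hypothetical.

```python
from pathlib import PurePosixPath


def example_path(task, k, seed, split, data_dir="data"):
    """Build a data file path, ASSUMING the naming pattern seen in Quick Start.

    Assumed pattern: {data_dir}/{task}/{task}_{k}_{seed}_{split}.jsonl
    """
    return str(PurePosixPath(data_dir) / task / f"{task}_{k}_{seed}_{split}.jsonl")


print(example_path("financial_phrasebank", 16, 100, "train"))
```

Check the contents of data/ after preprocessing before relying on this pattern for other tasks or splits.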

Process instruction version

The instruction version is for settings using instructions. We use instructions from BigScience PromptSource. First, fetch instructions (prompts) from PromptSource by doing the following.

# assuming you are still inside `preprocess` directory
cd ../.. # go outside of your project directory
git clone https://github.com/bigscience-workshop/promptsource.git
cd promptsource
git checkout 4e67a38d9642bde222cb90e36e8a66fd6e4a861a
mv promptsource ../MetaICL/preprocess/ # move promptsource directory under `preprocess` directory
cd ../MetaICL/preprocess # come back to `preprocess` directory
pip install pandas jinja2 "pyyaml>=5"

Note that this workaround avoids installing the promptsource package with pip, because the package requires python<=3.7 while all other code in this repo uses python 3.8. If promptsource starts supporting python 3.8, please install the package by following the guidelines in the original repo.

Then, download the data via:

python _build_gym.py --build --n_proc=20 --do_test --inst
python _build_gym.py --build --n_proc=20 --do_train --inst # skip if you won't run training yourself

Training

First, run the following command to tensorize the text data and save it.

python train.py \
  --task $task --k 16384 --test_k 16 --seed 100 --use_demonstrations --method channel \
  --do_tensorize --n_gpu 8 --n_process 40
  • --task: name of the setting, like hr_to_lr, class_to_class, non_class_to_class, etc
  • --k: # of examples per meta-training task
  • --test_k: # of examples to be used at inference
  • --seed: data seed for training data
  • --method: direct / channel
  • --n_gpu: the number of gpus you will use for training
  • --n_process: the number of processes for preprocessing
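
To make the --method choice more concrete, the sketch below shows how the two methods differ in what is conditioned on and what is scored, following the formulation in the paper: direct scores each candidate output given the demonstrations and the test input, while channel scores the test input given the flipped demonstrations and each candidate output. It is an illustration only; the function name `build_sequences` is hypothetical, plain space-joined strings stand in for tokenization, and the actual tensorization code in this repo (separators, truncation, length limits) may differ.

```python
def build_sequences(train_data, test_input, options, method):
    """Return one (context, target) pair per candidate option.

    A language model would score `target` conditioned on `context`;
    the option whose pair scores highest is the prediction.
    """
    if method not in ("direct", "channel"):
        raise ValueError("method must be 'direct' or 'channel'")
    pairs = []
    for option in options:
        if method == "direct":
            # Condition on x1 y1 ... xk yk x, score the candidate output y.
            demos = [f"{ex['input']} {ex['output']}" for ex in train_data]
            context, target = " ".join(demos + [test_input]), option
        else:
            # Condition on y1 x1 ... yk xk y, score the test input x.
            demos = [f"{ex['output']} {ex['input']}" for ex in train_data]
            context, target = " ".join(demos + [option]), test_input
        pairs.append((context, target))
    return pairs
```

In the direct case the context is identical for every option and only the target changes; in the channel case the target is always the test input and the context changes with the option.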

Then, run the following command to train the model.

python -m torch.distributed.launch --nproc_per_node=8 train.py \
  --task $task --k 16384 --test_k 16 --seed 100 --train_seed 1 --use_demonstrations --method channel --n_gpu 8 \
  --batch_size 1 --lr 1e-05 --fp16 --optimization 8bit-adam --out_dir checkpoints/channel-metaicl/$task
  • --fp16: for mixed precision training
  • --optimization 8bit-adam: for 8-bit approximations for Adam optimizer
  • --batch_size: batch size per GPU; we use 1, so that the global batch size is 8
  • --num_training_steps: number of training steps; 30000 by default
  • --log_file: you can optionally specify this to save logs as a text file
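
The batch-size arithmetic in the list above can be written out as follows. This is a small worked example, assuming that each training step consumes exactly one global batch (no gradient accumulation, which this README does not mention); the function names are hypothetical.

```python
def global_batch_size(per_gpu_batch_size, n_gpu):
    """Global batch size when every GPU processes its own per-GPU batch."""
    return per_gpu_batch_size * n_gpu


def sequences_seen(per_gpu_batch_size, n_gpu, num_training_steps):
    """Total training sequences processed, ASSUMING one global batch per step."""
    return global_batch_size(per_gpu_batch_size, n_gpu) * num_training_steps


# Values from the training command above: --batch_size 1, 8 GPUs, 30000 steps.
print(global_batch_size(1, 8))
print(sequences_seen(1, 8, 30000))
```

With the defaults above this gives a global batch size of 8 and 240,000 sequences over the default 30000 steps.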

Training takes around 4.5 hours.

If you want to train the Multi-task zero-shot model, which is one of the baselines in the paper, you can use similar commands for both tensorizing and training, but without --use_demonstrations and --test_k. Training takes around 3 hours.

Inference

python test.py --task $task --k 16 --split test --seed 100 --test_batch_size 16 \
    --method {channel|direct} --use_demonstrations \
    --out_dir checkpoints/metaicl/$task \
    --global_step 30000

Instead of specifying --global_step, you can specify --checkpoint with the path to a checkpoint stored somewhere else (for example, if you have downloaded the released checkpoints and want to use them). You must specify either --checkpoint or --global_step.
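
If you wrap the inference script in your own launcher, the rule above can be checked before the job starts. The sketch below is a hypothetical helper, not the logic in test.py. This README does not say what happens when both options are given, so the sketch arbitrarily rejects that case as well; it also does not cover the raw LM baselines, which need neither option.

```python
def select_checkpoint_source(checkpoint=None, global_step=None):
    """Return ("checkpoint", path) or ("global_step", step).

    Raises ValueError if neither is given. Rejecting the case where both
    are given is an ASSUMPTION of this sketch, not documented behavior.
    """
    if checkpoint is None and global_step is None:
        raise ValueError("specify either --checkpoint or --global_step")
    if checkpoint is not None and global_step is not None:
        raise ValueError("specify only one of --checkpoint and --global_step")
    if checkpoint is not None:
        return ("checkpoint", checkpoint)
    return ("global_step", int(global_step))
```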

  • --seed: seed for training data you will use at inference
  • --test_batch_size: batch size for inference; you can use 16 with a 32GB GPU
  • --unseen_domain_only: specify if you would like to run inference on unseen domain only
  • --log_file: as in training, specify the path to the file where you want to save logs

If you want to run inference for Multi-task zero-shot baseline, you can use a similar command but without --use_demonstrations and --k. For this baseline, you can use --test_batch_size 64 with a 32GB GPU.

If you want to run raw LM baselines in the paper, you do not need to specify --checkpoint or --global_step. Instead, specify --do_zeroshot, and then:

  • For 0-shot, run the command using --method direct
  • For PMI 0-shot, run the command using --is_null, and then run the command using --use_calibration (for both, with --method direct)
  • For Channel 0-shot, run the command using --method channel
  • For In-context/PMI In-context/Channel In-context, do the same as above except always adding --use_demonstrations
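
The flag combinations in the list above can be summarized in code. The sketch below returns, for each raw LM baseline, the extra flags for each run in order (the PMI variants need two runs). It uses only flags named in this README; the names `ZERO_SHOT_RUNS`, `IN_CONTEXT_TO_ZERO_SHOT` and `raw_lm_flag_sets` are hypothetical, and the remaining arguments of the inference command (task, seed, out_dir and so on) are intentionally left out.

```python
# Extra flags per run for each zero-shot raw LM baseline, as listed above.
ZERO_SHOT_RUNS = {
    "0-shot": [["--method", "direct"]],
    "PMI 0-shot": [
        ["--method", "direct", "--is_null"],
        ["--method", "direct", "--use_calibration"],
    ],
    "Channel 0-shot": [["--method", "channel"]],
}

# Each in-context baseline reuses a zero-shot recipe plus --use_demonstrations.
IN_CONTEXT_TO_ZERO_SHOT = {
    "In-context": "0-shot",
    "PMI In-context": "PMI 0-shot",
    "Channel In-context": "Channel 0-shot",
}


def raw_lm_flag_sets(baseline):
    """Return the list of runs (each a list of flags) for a raw LM baseline."""
    use_demos = baseline in IN_CONTEXT_TO_ZERO_SHOT
    base = IN_CONTEXT_TO_ZERO_SHOT.get(baseline, baseline)
    if base not in ZERO_SHOT_RUNS:
        raise ValueError(f"unknown raw LM baseline: {baseline!r}")
    runs = []
    for flags in ZERO_SHOT_RUNS[base]:
        run = ["--do_zeroshot"] + flags
        if use_demos:
            run.append("--use_demonstrations")
        runs.append(run)
    return runs
```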

You can use the same out_dir for all raw LM baselines if you are using the same GPT2 model, e.g., checkpoints/raw-gpt2-large.

Downloading Checkpoints

You can run the inference script by specifying --checkpoint {model_name}, and the script will automatically download the corresponding checkpoint under the checkpoints/ directory. {model_name} can be one of:

  • {metaicl|channel-metaicl|multitask-zero|channel-multitask-zero}: corresponding method trained in the hr_to_lr setting
  • {metaicl|channel-metaicl|multitask-zero|channel-multitask-zero}-instruction: corresponding method trained in the hr_to_lr_inst_all setting
  • {metaicl|channel-metaicl|multitask-zero|channel-multitask-zero}/{setting_name}: corresponding method trained in the corresponding setting (for setting_name, see the Table in the data section)
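
The three naming forms above can be read as a mapping from a model name to a (method, setting) pair. The sketch below implements that reading for illustration; the function name `resolve_model_name` is hypothetical, it is not the download logic used by the inference script, and it does not check that the setting alias exists.

```python
METHODS = ("metaicl", "channel-metaicl", "multitask-zero", "channel-multitask-zero")


def resolve_model_name(model_name):
    """Map a released model name to (method, setting alias)."""
    if "/" in model_name:
        # {method}/{setting_name}
        method, setting = model_name.split("/", 1)
    elif model_name.endswith("-instruction"):
        # {method}-instruction was trained in the hr_to_lr_inst_all setting
        method, setting = model_name[: -len("-instruction")], "hr_to_lr_inst_all"
    else:
        # bare {method} was trained in the hr_to_lr setting
        method, setting = model_name, "hr_to_lr"
    if method not in METHODS:
        raise ValueError(f"unknown method in model name: {model_name!r}")
    return method, setting
```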

Alternatively, you can download all checkpoints via:

python -m utils.download --checkpoints --setting all --method all

To download a single setting only, specify --setting {setting_name} (using the "alias (for command)" value in the setting table above). To download a single method only, specify --method {method_name}, where method_name is one of metaicl, channel-metaicl, multitask-zero, channel-multitask-zero.

Simply reproducing all results in the paper

You can use the following commands (based on a 32GB GPU):

# raw LM zero-shot baselines (0-shot, PMI 0-shot, Channel 0-shot)
bash reproduce.sh {setting_name} {zero|pmi-zero|channel-zero} 100 64

# raw LM in-context baselines (in-context, PMI in-context, Channel in-context)
bash reproduce.sh {setting_name} {ic|pmi-ic|channel-ic} 100,13,21,42,87 16

# Multi-task 0-shot baselines
bash reproduce.sh {setting_name} {multitask-zero|channel-multitask-zero} 100 64

# MetaICL
bash reproduce.sh {setting_name} {metaicl|channel-metaicl} 100,13,21,42,87 16
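
To run several of the commands above in sequence, the command strings can be generated from a small table. The sketch below copies the third and fourth arguments exactly as they appear in the commands above; reading them as a comma-separated seed list and a test batch size is an interpretation, not something this README states. The names `METHOD_ARGS` and `reproduce_command` are hypothetical.

```python
ONE_SEED = "100"
FIVE_SEEDS = "100,13,21,42,87"

# method -> (third argument, fourth argument), copied from the commands above
METHOD_ARGS = {
    "zero": (ONE_SEED, 64),
    "pmi-zero": (ONE_SEED, 64),
    "channel-zero": (ONE_SEED, 64),
    "ic": (FIVE_SEEDS, 16),
    "pmi-ic": (FIVE_SEEDS, 16),
    "channel-ic": (FIVE_SEEDS, 16),
    "multitask-zero": (ONE_SEED, 64),
    "channel-multitask-zero": (ONE_SEED, 64),
    "metaicl": (FIVE_SEEDS, 16),
    "channel-metaicl": (FIVE_SEEDS, 16),
}


def reproduce_command(setting_name, method):
    """Return the reproduce.sh command line for one setting and method."""
    if method not in METHOD_ARGS:
        raise ValueError(f"unknown method: {method!r}")
    third, fourth = METHOD_ARGS[method]
    return f"bash reproduce.sh {setting_name} {method} {third} {fourth}"


for method in METHOD_ARGS:
    print(reproduce_command("hr_to_lr", method))
```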

License

MetaICL is CC-BY-NC 4.0 licensed.
