CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

Related tags

Deep LearningCLUES
Overview

License: MIT

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

This repo contains the data and source code for baseline models in the NeurIPS 2021 benchmark paper for Constrained Language Understanding Evaluation Standard (CLUES) under MIT License.

Overview

The benchmark data is located in the data directory. We also release source codes for two fine-tuning strategies on CLUES, one with classic fine-tuning and the other with prompt-based fine-tuning.

Classic finetuning

Setup Environment

  1. > git clone [email protected]:microsoft/CLUES.git
  2. > git clone [email protected]:namisan/mt-dnn.git
  3. > cp -rf CLUES/classic_finetuning/ mt-dnn/
  4. > cd mt-dnn/

Run Experiments

  1. Preprocess data
    > bash run_clues_data_process.sh

  2. Train/test Models
    > bash run_clues_batch.sh

Prompt fine-tuning

Setup

  1. cd prompt_finetuning
  2. Run sh setup.sh to automatically fetch dependency codebase and apply our patch for CLUES

Run Experiments

All prompt-based funetuning baselines run commands are in experiments.sh, simple run by sh experiments.sh

Leaderboard

Here we maintain a leaderboard, allowing researchers to submit their results as entries.

Submission Instructions

  • Each submission must be submitted as a pull request modifying the markdown file underlying the leaderboard.
  • The submission must attach an accompanying public paper and public source code for reproducing their results on our dataset.
  • A submission can be toward any subset of tasks in our benchmark, or toward the aggregate leaderboard.
  • For any task targeted by the submission, we require evaluation on (1) 10, 20, and 30 shots, and (2) all 5 splits of the corresponding dataset and a report of their mean and standard deviation.
  • Each leaderboard will be sorted by the 30-shot mean S1 score (where S1 score is a variant of F1 score defined in our paper).
  • The submission should not use data from the 4 other splits during few-shot finetuning of any 1 split, either as extra training set or as validation set for hyperparameter tuning.
  • However, we allow external data, labeled or unlabeled, to be used for such purposes. Each submission using external data must mark the corresponding columns "external labeled" and/or "external unlabeled". Note, in this context, "external data" refers to data used after pretraining (e.g., for task-specific tuning); in particular, methods using existing pretrained models only, without extra data, should not mark either column. For obvious reasons, models cannot be trained on the original labeled datasets from where we sampled the few-shot CLUES data.
  • In the table entry, the submission should include a method name and a citation, hyperlinking to their publicly released source code reproducing the results. See the last entry of the table below for an example.

Abbreviations

  • FT = (classic) finetuning
  • PT = prompt based tuning
  • ICL = in-context learning, in the style of GPT-3
  • μ±σ = mean μ and standard deviation σ across our 5 splits. Aggregate standard deviation is calculated using the sum-of-variance formula from individual tasks' standard deviations.

Benchmarking CLUES for Aggregate 30-shot Evaluation

Shots (K=30) external labeled external unlabeled Average ▼ SST-2 MNLI CoNLL03 WikiANN SQuAD-v2 ReCoRD
Human N N 81.4 83.7 69.4 87.4 82.6 73.5 91.9
T5-Large-770M-FT N N 43.1±6.7 52.3±2.9 36.8±3.8 51.2±0.1 62.4±0.6 43.7±2.7 12±3.8
BERT-Large-336M-FT N N 42.1±7.8 55.4±2.5 33.3±1.4 51.3±0 62.5±0.6 35.3±6.4 14.9±3.4
BERT-Base-110M-FT N N 41.5±9.2 53.6±5.5 35.4±3.2 51.3±0 62.8±0 32.6±5.8 13.1±3.3
DeBERTa-Large-400M-FT N N 40.1±17.8 47.7±9.0 26.7±11 48.2±2.9 58.3±6.2 38.7±7.4 21.1±3.6
RoBERTa-Large-355M-FT N N 40.0±10.6 53.2±5.6 34.0±1.1 44.7±2.6 48.4±6.7 43.5±4.4 16±2.8
RoBERTa-Large-355M-PT N N 90.2±1.8 61.6±3.5
DeBERTa-Large-400M-PT N N 88.4±3.3 62.9±3.1
BERT-Large-336M-PT N N 82.7±4.1 45.3±2.0
GPT3-175B-ICL N N 91.0±1.6 33.2±0.2
BERT-Base-110M-PT N N 79.4±5.6 42.5±3.2
LiST (Wang et al.) N Y 91.3 ±0.7 67.9±3.0
Example (lastname et al.) Y/N Y/N 0±0 0±0 0±0 0±0 0±0 0±0 0±0

Individual Task Performance over Multiple Shots

SST-2

Shots (K) external labeled external unlabeled 10 20 30 ▼ All
GPT-3 (175B) ICL N N 85.9±3.7 92.0±0.7 91.0±1.6 -
RoBERTa-Large PT N N 88.8±3.9 89.0±1.1 90.2±1.8 93.8
DeBERTa-Large PT N N 83.4±5.3 87.8±3.5 88.4±3.3 91.9
Human N N 79.8 83 83.7 -
BERT-Large PT N N 63.2±11.3 78.2±9.9 82.7±4.1 91
BERT-Base PT N N 63.9±10.0 76.7±6.6 79.4±5.6 91.9
BERT-Large FT N N 46.3±5.5 55.5±3.4 55.4±2.5 99.1
BERT-Base FT N N 46.2±5.6 54.0±2.8 53.6±5.5 98.1
RoBERTa-Large FT N N 38.4±21.7 52.3±5.6 53.2±5.6 98.6
T5-Large FT N N 51.2±1.8 53.4±3.2 52.3±2.9 97.6
DeBERTa-Large FT N N 43.0±11.9 40.8±22.6 47.7±9.0 100
Example (lastname et al.) Y/N Y/N 0±0 0±0 0±0 -

MNLI

Shots (K) external labeled external unlabeled 10 20 30 ▼ All
Human N Y 78.1 78.6 69.4 -
LiST (wang et al.) N N 60.5±8.3 67.2±4.5 67.9±3.0 -
DeBERTa-Large PT N N 44.5±8.2 60.7±5.3 62.9±3.1 88.1
RoBERTa-Large PT N N 57.7±3.6 58.6±2.9 61.6±3.5 87.1
BERT-Large PT N N 41.7±1.0 43.7±2.1 45.3±2.0 81.9
BERT-Base PT N N 40.4±1.8 42.1±4.4 42.5±3.2 81
T5-Large FT N N 39.8±3.3 37.9±4.3 36.8±3.8 85.9
BERT-Base FT N N 37.0±5.2 35.2±2.7 35.4±3.2 81.6
RoBERTa-Large FT N N 34.3±2.8 33.4±0.9 34.0±1.1 85.5
BERT-Large FT N N 33.7±0.4 28.2±14.8 33.3±1.4 80.9
GPT-3 (175B) ICL N N 33.5±0.7 33.1±0.3 33.2±0.2 -
DeBERTa-Large FT N N 27.4±14.1 33.6±2.5 26.7±11.0 87.6

CoNLL03

Shots (K) external labeled external unlabeled 10 20 30 ▼ All
Human N N 87.7 89.7 87.4 -
BERT-Base FT N N 51.3±0 51.3±0 51.3±0 -
BERT-Large FT N N 51.3±0 51.3±0 51.3±0 89.3
T5-Large FT N N 46.3±6.9 50.0±0.7 51.2±0.1 92.2
DeBERTa-Large FT N N 50.1±1.2 47.8±2.5 48.2±2.9 93.6
RoBERTa-Large FT N N 50.8±0.5 44.6±5.1 44.7±2.6 93.2

WikiANN

Shots (K) external labeled external unlabeled 10 20 30 ▼ All
Human N N 81.4 83.5 82.6 -
BERT-Base FT N N 62.8±0 62.8±0 62.8±0 88.8
BERT-Large FT N N 62.8±0 62.6±0.4 62.5±0.6 91
T5-Large FT N N 61.7±0.7 62.1±0.2 62.4±0.6 87.4
DeBERTa-Large FT N N 58.5±3.3 57.9±5.8 58.3±6.2 91.1
RoBERTa-Large FT N N 58.5±8.8 56.9±3.4 48.4±6.7 91.2

SQuAD v2

Shots (K) external labeled external unlabeled 10 20 30 ▼ All
Human N N 71.9 76.4 73.5 -
T5-Large FT N N 43.6±3.5 28.7±13.0 43.7±2.7 87.2
RoBERTa-Large FT N N 38.1±7.2 40.1±6.4 43.5±4.4 89.4
DeBERTa-Large FT N N 41.4±7.3 44.4±4.5 38.7±7.4 90
BERT-Large FT N N 42.3±5.6 35.8±9.7 35.3±6.4 81.8
BERT-Base FT N N 46.0±2.4 34.9±9.0 32.6±5.8 76.3

ReCoRD

Shots (K) external labeled external unlabeled 10 20 30 ▼ All
Human N N 94.1 94.2 91.9 -
DeBERTa-Large FT N N 15.7±5.0 16.8±5.7 21.1±3.6 80.7
RoBERTa-Large FT N N 12.0±1.9 9.9±6.2 16.0±2.8 80.3
BERT-Large FT N N 9.9±5.2 11.8±4.9 14.9±3.4 66
BERT-Base FT N N 10.3±1.8 11.7±2.4 13.1±3.3 54.4
T5-Large FT N N 11.9±2.7 11.7±1.5 12.0±3.8 77.3

How do I cite CLUES?

@article{cluesteam2021,
  title={Few-Shot Learning Evaluation in Natural Language Understanding},
  author={Mukherjee, Subhabrata and Liu, Xiaodong and Zheng, Guoqing and Hosseini, Saghar and Cheng, Hao and Yang, Greg and Meek, Christopher and Awadallah, Ahmed Hassan and Gao, Jianfeng},
  year={2021}
}

Acknowledgments

MT-DNN: https://github.com/namisan/mt-dnn
LM-BFF: https://github.com/princeton-nlp/LM-BFF

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Layer 7 DDoS Panel with Cloudflare Bypass ( UAM, CAPTCHA, BFM, etc.. )

Blood Deluxe DDoS DDoS Attack Panel includes CloudFlare Bypass (UAM, CAPTCHA, BFM, etc..)(It works intermittently. Working on it) Don't attack any web

272 Nov 01, 2022
pytorch implementation of the ICCV'21 paper "MVTN: Multi-View Transformation Network for 3D Shape Recognition"

MVTN: Multi-View Transformation Network for 3D Shape Recognition (ICCV 2021) By Abdullah Hamdi, Silvio Giancola, Bernard Ghanem Paper | Video | Tutori

Abdullah Hamdi 64 Jan 03, 2023
MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research

MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research.The pipeline is based on nn-UNet an

QIMP team 30 Jan 01, 2023
naked is a Python tool which allows you to strip a model and only keep what matters for making predictions.

naked is a Python tool which allows you to strip a model and only keep what matters for making predictions. The result is a pure Python function with no third-party dependencies that you can simply c

Max Halford 24 Dec 20, 2022
Code of our paper "Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning"

CCOP Code of our paper Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning Requirement Install OpenSelfSup Install Detectron2

Chenhongyi Yang 21 Dec 13, 2022
A collection of Jupyter notebooks to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

StyleGAN3 CLIP-based guidance StyleGAN3 + CLIP StyleGAN3 + inversion + CLIP This repo is a collection of Jupyter notebooks made to easily play with St

Eugenio Herrera 176 Dec 30, 2022
Source Code for ICSE 2022 Paper - ``Can We Achieve Fairness Using Semi-Supervised Learning?''

Fair-SSL Source Code for ICSE 2022 Paper - Can We Achieve Fairness Using Semi-Supervised Learning? Ethical bias in machine learning models has become

1 Dec 18, 2021
Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation

Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation. Generally, MAS methods register multiple atlases, i.e., medical images with corresponding labels, to a target i

NanYoMy 13 Oct 09, 2022
Official repository for the ICCV 2021 paper: UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model.

UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model Official repository for the ICCV 2021 paper: UltraPose: Syn

MomoAILab 92 Dec 21, 2022
Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch.

openpifpaf Continuously tested on Linux, MacOS and Windows: New 2021 paper: OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Te

VITA lab at EPFL 50 Dec 29, 2022
Demo notebooks for Qiskit application modules demo sessions (Oct 8 & 15):

qiskit-application-modules-demo-sessions This repo hosts demo notebooks for the Qiskit application modules demo sessions hosted on Qiskit YouTube. Par

Qiskit Community 46 Nov 24, 2022
Real life contra a deep learning project built using mediapipe and openc

real-life-contra Description A python script that translates the body movement into in game control. Welcome to all new real life contra a deep learni

Programminghut 7 Jan 26, 2022
Speedy Implementation of Instance-based Learning (IBL) agents in Python

A Python library to create single or multi Instance-based Learning (IBL) agents that are built based on Instance Based Learning Theory (IBLT) 1 Instal

0 Nov 18, 2021
A PaddlePaddle implementation of STGCN with a few modifications in the model architecture in order to forecast traffic jam.

About This repository contains the code of a PaddlePaddle implementation of STGCN based on the paper Spatio-Temporal Graph Convolutional Networks: A D

Tianjian Li 1 Jan 11, 2022
On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization This repository contains the evaluation code and alternative pseudo ground truth

Torsten Sattler 36 Dec 22, 2022
Monitora la qualità della ricezione dei segnali radio nelle province siciliane.

FMap-server Monitora la qualità della ricezione dei segnali radio nelle province siciliane. Conversion data Frequency - StationName maps are stored in

Triglie 5 May 24, 2021
Corgis are the cutest creatures; have 30K of them!

corgi-net This is a dataset of corgi images scraped from the corgi subreddit. After filtering using an ImageNet classifier, the training set consists

Alex Nichol 6 Dec 24, 2022
Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision Training Efficiency We show the training efficiency of our DSLP model b

Chenyang Huang 36 Oct 31, 2022
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs, ICCV 2021

Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs, ICCV 2021 Global Pooling, More than Meets the Eye: Posi

Md Amirul Islam 32 Apr 24, 2022
alfred-py: A deep learning utility library for **human**

Alfred Alfred is command line tool for deep-learning usage. if you want split an video into image frames or combine frames into a single video, then a

JinTian 800 Jan 03, 2023