Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

Overview

Disfl-QA is a targeted dataset for contextual disfluencies in an information-seeking setting, namely question answering over Wikipedia passages. Disfl-QA builds upon the SQuAD-v2 (Rajpurkar et al., 2018) dataset, where each question in the dev set is annotated with a contextual disfluency, using the paragraph as a source of distractors.

The final dataset consists of ~12k (disfluent question, answer) pairs. Over 90% of the disfluencies are corrections or restarts, making it a much harder test set for disfluency correction. Disfl-QA aims to fill a major gap between the speech and NLP research communities. We hope the dataset can serve as a benchmark for testing the robustness of models against disfluent inputs.

Our experiments reveal that state-of-the-art models are brittle when subjected to disfluent inputs from Disfl-QA. Detailed experiments and analyses can be found in our paper.

Dataset Description

Disfl-QA consists of ~12k disfluent questions with the following train/dev/test splits:

File        Questions
train.json  7182
dev.json    1000
test.json   3643

Each JSON file maps a SQuAD-v2 question id to the original question (from SQuAD-v2) and its disfluent counterpart (from Disfl-QA), in the following format:

{
  "squad_v2_id": {
    "original": "<original question from SQuAD-v2>",
    "disfluent": "<disfluent question from Disfl-QA>"
  },
  ...
}

Note: The squad_v2_id corresponds to the unique data.paragraphs.qas.id in the SQuAD-v2 development set.
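
For reference, here is a minimal Python sketch for loading a split, assuming the JSON files sit in the working directory:

  # Load one split and iterate over (id, original, disfluent) triples.
  import json

  with open("train.json", encoding="utf-8") as f:
      train = json.load(f)  # dict: squad_v2_id -> {"original": ..., "disfluent": ...}

  print(len(train), "questions")  # 7182 for the train split
  for squad_v2_id, pair in list(train.items())[:3]:
      print(squad_v2_id, "|", pair["original"], "->", pair["disfluent"])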

Here's an example from the dataset:

{
  "56ddde6b9a695914005b9628": {
    "original": "In what country is Normandy located?",
    "disfluent": "In what country is Norse found no wait Normandy not Norse?"
  },
  "56ddde6b9a695914005b9629": {
    "original": "When were the Normans in Normandy?",
    "disfluent": "From which countries no tell me when were the Normans in Normandy?"
  },
  "56ddde6b9a695914005b962a": {
    "original": "From which countries did the Norse originate?",
    "disfluent": "From which Norse leader I mean countries did the Norse originate?"
  },
  "56ddde6b9a695914005b962b": {
    "original": "Who was the Norse leader?",
    "disfluent": "When I mean Who was the Norse leader?"
  },
  "56ddde6b9a695914005b962c": {
    "original": "What century did the Normans first gain their separate identity?",
    "disfluent": "When no what century did the Normans first gain their separate identity?"
  }
}
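
Since every squad_v2_id points back into the SQuAD-v2 dev set, the original passage and answers can be recovered by joining on that id. Below is a hedged sketch, assuming a local copy of the SQuAD-v2 dev file (referred to here as dev-v2.0.json, downloaded separately from the SQuAD website):

  # Join Disfl-QA entries with SQuAD-v2 to recover contexts and answers.
  import json

  with open("dev.json", encoding="utf-8") as f:
      disfl = json.load(f)

  with open("dev-v2.0.json", encoding="utf-8") as f:  # assumed local SQuAD-v2 dev file
      squad = json.load(f)

  # Index SQuAD-v2 questions by their unique data.paragraphs.qas.id.
  squad_by_id = {
      qa["id"]: (para["context"], qa)
      for article in squad["data"]
      for para in article["paragraphs"]
      for qa in para["qas"]
  }

  for squad_v2_id, pair in list(disfl.items())[:3]:
      context, qa = squad_by_id[squad_v2_id]
      answers = [a["text"] for a in qa["answers"]] or ["<no answer>"]
      print(pair["disfluent"], "=>", answers)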

Citation

If you use or discuss this dataset in your work, please cite it as follows:

@inproceedings{gupta-etal-2021-disflqa,
    title = "{Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering}",
    author = "Gupta, Aditya and Xu, Jiacheng and Upadhyay, Shyam and Yang, Diyi and Faruqui, Manaal",
    booktitle = "Findings of ACL",
    year = "2021"
}

License

The Disfl-QA dataset is licensed under CC BY 4.0.

Contact

If you have a technical question regarding the dataset or publication, please create an issue in this repository.
