Code for coreference-aware machine reading comprehension

Overview

Data and code for the paper "Tracing Origins: Coreference-aware Machine Reading Comprehension", published at ACL 2022.

Dataset

The repository has three folders, one for each model described in the paper: Coref_additive_spacy (Coref_additive_attention), Coref_dgl_spacy (GNN), and Coref_multiplication_spacy (Coref_multiplication_attention). Each folder contains the training set and the dev set under its quoref folder.

Each sample contains:

  • context: the paragraph text
  • context_id: the unique identifier of the context
  • qas: a group of questions
      • question: the question text
      • id: the unique identifier of the question
      • answers: a group of answers to one question
          • text: the answer text
          • answer_start: the start position of one answer
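As a rough illustration of how these fields fit together, the sketch below walks one sample and checks that each answer span matches the context. It rests on two assumptions that the field list does not state outright: that question, id, and answers are nested inside each entry of qas (with text and answer_start inside each answer), and that answer_start is a character offset into context, as in SQuAD-style data. The sample values and the helper names iter_answers and misaligned_answers are hypothetical and are not part of this repository.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample built from the field list above; the values are made up.
sample = {
    "context": "Anna met Ben in Paris. She gave him a book.",
    "context_id": "ctx-0",
    "qas": [
        {
            "question": "Who gave Ben a book?",
            "id": "q-0",
            "answers": [{"text": "Anna", "answer_start": 0}],
        }
    ],
}


def iter_answers(sample: Dict) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (question id, question, answer text, answer start) tuples."""
    for qa in sample["qas"]:
        for answer in qa["answers"]:
            yield qa["id"], qa["question"], answer["text"], answer["answer_start"]


def misaligned_answers(sample: Dict) -> List[str]:
    """Return ids of questions whose answer span does not match the context.

    Assumes answer_start is a character offset into the context string.
    """
    context = sample["context"]
    bad = []
    for qid, _question, text, start in iter_answers(sample):
        if context[start:start + len(text)] != text:
            bad.append(qid)
    return bad


rows = list(iter_answers(sample))
for qid, question, text, start in rows:
    print(f"{qid}: {question!r} -> {text!r} at offset {start}")
```

A check like misaligned_answers can be useful before training, since a span-extraction reader depends on the offsets lining up with the answer text.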

Models

If you want to use our trained model, please download it from Google Drive.

Training

python run_quoref.py --train_file "quoref/train.json" --predict_file "quoref/dev.json" --model_type "roberta_multi" --model_name_or_path "roberta-large" --output_dir "out" --do_train --do_eval --eval_all_checkpoints --learning_rate 1e-5 --num_train_epochs 6 --overwrite_output_dir --per_gpu_train_batch_size 4 --save_steps 6000 --coref_weight 0.4

Note

There is an open issue regarding the compatibility between NeuralCoref and spaCy 3.0. If you intend to use the latest spaCy models, please follow that issue for updates.
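One way to catch this early is to check the installed spaCy version before running the preprocessing. The sketch below is only an illustration: it assumes the incompatibility applies to every 3.x release, and the helper names major_version, is_pre_v3, and installed_spacy_version are hypothetical and not part of this repository. It uses only the standard library, so it runs even when spaCy is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version_string.split(".")[0])


def is_pre_v3(version_string: str) -> bool:
    """True if the version is older than 3.0 (assumed safe for NeuralCoref)."""
    return major_version(version_string) < 3


def installed_spacy_version() -> Optional[str]:
    """Return the installed spaCy version, or None if spaCy is missing."""
    try:
        return version("spacy")
    except PackageNotFoundError:
        return None


found = installed_spacy_version()
if found is None:
    print("spaCy is not installed")
elif is_pre_v3(found):
    print(f"spaCy {found}: older than 3.0")
else:
    print(f"spaCy {found}: 3.0 or newer, see the open NeuralCoref issue")
```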

Cite

If you extend or use this work, please cite the paper where it was introduced:

@article{Huang2021TracingOC,
  title={Tracing Origins: Coref-aware Machine Reading Comprehension},
  author={Baorong Huang and Zhuosheng Zhang and Hai Zhao},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.07961}
}