Code for coreference-aware machine reading comprehension

Last update: Sep 29, 2022

Overview

Data and code for the paper "Tracing Origins: Coreference-aware Machine Reading Comprehension", published at ACL 2022.

Dataset

The repository has one folder for each of the three models described in the paper:

Coref_additive_spacy: the Coref_additive_attention model
Coref_dgl_spacy: the GNN model
Coref_multiplication_spacy: the Coref_multiplication_attention model

Each folder contains the training set and the dev set under its quoref subfolder.
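The folder layout described above can be sketched as a small lookup helper. This is only an illustration: the function name `quoref_files` and the `MODEL_FOLDERS` mapping are hypothetical, and the file names `train.json` and `dev.json` are an assumption taken from the `quoref/train.json` and `quoref/dev.json` arguments in the training command.

```python
from pathlib import Path

# Folder names as listed in this README, keyed by the model name from the paper.
MODEL_FOLDERS = {
    "Coref_additive_attention": "Coref_additive_spacy",
    "GNN": "Coref_dgl_spacy",
    "Coref_multiplication_attention": "Coref_multiplication_spacy",
}


def quoref_files(repo_root, model):
    """Return the (train, dev) paths for one model folder.

    Assumption: the files are named train.json and dev.json, matching the
    paths passed to run_quoref.py in the training command.
    """
    data_dir = Path(repo_root) / MODEL_FOLDERS[model] / "quoref"
    return data_dir / "train.json", data_dir / "dev.json"


train_path, dev_path = quoref_files(".", "GNN")
print(train_path.as_posix())
print(dev_path.as_posix())
```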

Each sample contains:

context: the paragraph text
context_id: the unique identifier of the context
qas: a group of questions
    question: the question text
    id: the unique identifier of the question
    answers: a group of answers to one question
        text: the answer text
        answer_start: the start position of the answer
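To make the nesting of these fields concrete, here is a minimal sketch that walks one sample and flattens it into one row per answer. The sample below is invented for illustration and is not taken from the dataset. The check that `answer_start` is a character offset into `context` is an assumption based on the usual SQuAD-style convention, not something this README states.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_answers(sample: Dict[str, Any]) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (question id, question text, answer text, answer_start) per answer."""
    for qa in sample["qas"]:
        for answer in qa["answers"]:
            yield qa["id"], qa["question"], answer["text"], answer["answer_start"]


# Hypothetical sample using the field names listed above.
sample = {
    "context": "Anna met Tom. She gave him a book.",
    "context_id": "ctx-0",
    "qas": [
        {
            "question": "Who gave Tom a book?",
            "id": "q-0",
            "answers": [{"text": "Anna", "answer_start": 0}],
        }
    ],
}

rows = list(iter_answers(sample))
for question_id, question, text, start in rows:
    # Assumption: answer_start is a character offset into the context.
    span = sample["context"][start:start + len(text)]
    print(question_id, repr(text), "matches context:", span == text)
```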

Models

To use our trained model, download it from Google Drive.

Training

python run_quoref.py --train_file "quoref/train.json" --predict_file "quoref/dev.json" --model_type "roberta_multi" --model_name_or_path "roberta-large" --output_dir "out" --do_train --do_eval --eval_all_checkpoints --learning_rate 1e-5 --num_train_epochs 6 --overwrite_output_dir --per_gpu_train_batch_size 4 --save_steps 6000 --coref_weight 0.4
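If you want to launch the same run from a script, for example to try a different `--coref_weight`, the command above can be assembled as an argument list. This is a sketch only: `build_train_command` is a hypothetical helper, every flag and default is copied from the command above, and any value other than those shown there is your own choice rather than a setting from the paper.

```python
import shlex


def build_train_command(coref_weight=0.4, output_dir="out"):
    """Assemble the run_quoref.py training command as an argument list."""
    return [
        "python", "run_quoref.py",
        "--train_file", "quoref/train.json",
        "--predict_file", "quoref/dev.json",
        "--model_type", "roberta_multi",
        "--model_name_or_path", "roberta-large",
        "--output_dir", output_dir,
        "--do_train", "--do_eval", "--eval_all_checkpoints",
        "--learning_rate", "1e-5",
        "--num_train_epochs", "6",
        "--overwrite_output_dir",
        "--per_gpu_train_batch_size", "4",
        "--save_steps", "6000",
        "--coref_weight", str(coref_weight),
    ]


command = build_train_command(coref_weight=0.4)
# Print the command for inspection; nothing is executed here.
print(shlex.join(command))
```

To actually start training, pass the list to `subprocess.run(command, check=True)` from inside the model folder that contains `run_quoref.py`.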

Note

There is an open issue regarding the compatibility between NeuralCoref and spaCy 3.0. If you intend to use the latest spaCy models, please watch the issue.

Cite

If you extend or use this work, please cite the paper where it was introduced:

@article{Huang2021TracingOC,
  title={Tracing Origins: Coref-aware Machine Reading Comprehension},
  author={Baorong Huang and Zhuosheng Zhang and Hai Zhao},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.07961}
}
