GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

Related tags

Text Data & NLPGCRC
Overview

GCRC

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation

Introduction

Currently, machine reading comprehension models have made exciting progress, driven by a large number of publicly available data sets. However, the real language comprehension capabilities of models are far from what people expect, and most of the data sets provide black-box evaluations that fail to diagnose whether the system is based on correct reasoning processes. In order to alleviate these problems and promote machine intelligence to humanoid intelligence, Shanxi University focuses on the more diverse and challenging reading comprehension tasks of the college entrance examination, and attempts to evaluate machine intelligence effectively and practically based on standardized human tests. We collected gaokao reading comprehension test questions in the past 10 years and constructed a datasets which is GCRC(A New MRC Dataset from Gaokao Chinese for Explainable Evaluation) containing more than 5000 texts and more than 8,700 multiple-choice questions (about 15,000 options). The datasets is annotated three kinds of information: the sentence level support fact, interference item’s error cause and the reasoning skills required to answer questions. Related experiments show that this datasets is more challenging, which is very useful for diagnosing system limitations in an interpretable manner, and will help researchers develop new machine learning and reasoning methods to solve these challenging problems in the future.

Leaderboard

GCRC Leaderboard for Explainable Evaluation

Paper

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation. ACL 2021 Findings.

Data Size

Train:6,994 questions;Dev:863 questions;Test:862 questions

Data Format

Each instance is composed of id (id, a string), title (title, a string), passage (passage, a string), question(question, a string), options (options, a list, representing the contents of A, B, C, and D, respectively), evidences (evidences, a list, representing the contents of the supporting sentence in the original text of A, B, C and D, respectively), reasoning_ability(reasoning_ability, a list,representing the reasoning ability required to answer questions of A, B, C and D, respectively), error_type (error_type, a list, representing the Error reason of A, B, C and D, respectively), answer(answer,a string).

Example

{
  "id": "gcrc_4916_8172", 
  "title": "我们需要怎样的科学素养", 
  "passage": "第八次中国公民科学素养调查显示,2010年,我国具备...激励科技创新、促进创新型国家建设,我们任重道远。", 
  "question": "下列对“我们需要怎样的科学素养”的概括,不正确的一项是", 
  "options":  [
    "科学素养是一项基本公民素质,公民科学素养可以从科学知识、科学方法和科学精神三个方面来衡量。",
    "不仅需要掌握足够的科学知识、科学方法,更需要具备学习、理解、表达、参与和决策科学事务的能力。",
    "应该明白科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面。", 
    "需要具备科学的思维和科学的精神,对科学技术能持怀疑态度,对于媒体信息具有质疑精神和过滤功能。"
  ],
  "evidences": [
    ["公民科学素养可以从三个方面衡量:科学知识、科学方法和科学精神。", "在“建设创新型国家”的语境中,科学素养作为一项基本公民素质的重要性不言而喻。"],
    ["一个具备科学素养的公民,不仅应该掌握足够的科学知识、科学方法,更需要强调科学的思维、科学的精神,理性认识科技应用到社会中可能产生的影响,进而具备学习、理解、表达、参与和决策科学事务的能力。"], 
    ["西方发达国家不仅测试公众对科学技术与社会、经济、文化等各方面关系的看法,更考察公众对科学技术是否持怀疑态度,是否认为科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面等。"], 
    ["甚至还有国家专门测试公众对于媒体信息是否具有质疑精神和过滤功能。", "西方发达国家不仅测试公众对科学技术与社会、经济、文化等各方面关系的看法,更考察公众对科学技术是否持怀疑态度,是否认为科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面等。"]
   ],
  "error_type": ["E", "", "", ""],
  "answer": "A",
}

Evaluation Code

The prediction result needs to be consistent with the format of the training set.

python eval.py prediction_file test_private_file

Participants are required to complete the following tasks: Task 1: Output the answer to the question. Task 2: Output the sentence-level supporting facts(SFs) that support the answer to the question, that is, the original supporting sentences for each option. Task 3: Output the error cause of the interference option. There are 7 reasons for the error in this evaluation: 1) Wrong details; 2) Wrong temporal properties; 3) Wrong subject-predicate-object triple relationship; 4) Wrong necessary and sufficient conditions; 5) Wrong causality; 6) Irrelevant to the question; 7) Irrelevant to the article. The evaluation metrics are Task1_Acc, Task2_F1,Task3_Acc(The accuracy of error reason identification),and the output is in dictionary format.

return {"Task1_Acc":_, " Task2_F1":_, "Task3_Acc":_}

Author List

Hongye Tan, Xiaoyue Wang, Yu Ji, Ru Li, Xiaoli Li, Zhiwei Hu, Yunxiao Zhao, Xiaoqi Han.

Institutions

Shanxi University

Citation

Please kindly cite our paper if the work is helpful.

@inproceedings{tan-etal-2021-gcrc,
    title = "{GCRC}: A New Challenging {MRC} Dataset from {G}aokao {C}hinese for Explainable Evaluation",
    author = "Tan, Hongye  and
      Wang, Xiaoyue  and
      Ji, Yu  and
      Li, Ru  and
      Li, Xiaoli  and
      Hu, Zhiwei  and
      Zhao, Yunxiao  and
      Han, Xiaoqi",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.113",
    doi = "10.18653/v1/2021.findings-acl.113",
    pages = "1319--1330",
}
Owner
Yunxiao Zhao
Yunxiao Zhao
Translation to python of Chris Sims' optimization function

pycsminwel This is a locol minimization algorithm. Uses a quasi-Newton method with BFGS update of the estimated inverse hessian. It is robust against

Gustavo Amarante 1 Mar 21, 2022
This repository contains the code for "Generating Datasets with Pretrained Language Models".

Datasets from Instructions (DINO 🦕 ) This repository contains the code for Generating Datasets with Pretrained Language Models. The paper introduces

Timo Schick 154 Jan 01, 2023
XLNet: Generalized Autoregressive Pretraining for Language Understanding

Introduction XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective.

Zihang Dai 6k Jan 07, 2023
Contains descriptions and code of the mini-projects developed in various programming languages

TexttoSpeechAndLanguageTranslator-project introduction A pleasant application where the client will be given buttons like play,reset and exit. The cli

Adarsh Reddy 1 Dec 22, 2021
Tools, wrappers, etc... for data science with a concentration on text processing

Rosetta Tools for data science with a focus on text processing. Focuses on "medium data", i.e. data too big to fit into memory but too small to necess

207 Nov 22, 2022
초성 해석기 based on ko-BART

초성 해석기 개요 한국어 초성만으로 이루어진 문장을 입력하면, 완성된 문장을 예측하는 초성 해석기입니다. 초성: ㄴㄴ ㄴㄹ ㅈㅇㅎ 예측 문장: 나는 너를 좋아해 모델 모델은 SKT-AI에서 공개한 Ko-BART를 이용합니다. 데이터 문장 단위로 이루어진 아무 코퍼스나

Dawoon Jung 29 Oct 28, 2022
Python powered crossword generator with database with 20k+ polish words

crossword_generator Generate simple crossword puzzle from words and definitions fetched from krzyżowki.edu.pl endpoints -/ string:word - returns js

0 Jan 04, 2022
edge-SR: Super-Resolution For The Masses

edge-SR: Super Resolution For The Masses Citation Pablo Navarrete Michelini, Yunhua Lu and Xingqun Jiang. "edge-SR: Super-Resolution For The Masses",

Pablo 40 Nov 10, 2022
Deep learning for NLP crash course at ABBYY.

Deep NLP Course at ABBYY Deep learning for NLP crash course at ABBYY. Suggested textbook: Neural Network Methods in Natural Language Processing by Yoa

Dan Anastasyev 597 Dec 18, 2022
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

Franck Dernoncourt 1.6k Dec 27, 2022
English loanwords in the world's languages

Wiktionary as CLDF Content cldf1 and cldf2 contain cldf-conform data sets with a total of 2 377 756 entries about the vocabulary of all 1403 languages

Viktor Martinović 3 Jan 14, 2022
TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

TweebankNLP This repo contains the new Tweebank-NER dataset and off-the-shelf Twitter-Stanza pipeline for state-of-the-art Tweet NLP, as described in

Laboratory for Social Machines 84 Dec 20, 2022
A minimal Conformer ASR implementation adapted from ESPnet.

Conformer ASR A minimal Conformer ASR implementation adapted from ESPnet. Introduction I want to use the pre-trained English ASR model provided by ESP

Niu Zhe 3 Jan 24, 2022
A large-scale (194k), Multiple-Choice Question Answering (MCQA) dataset designed to address realworld medical entrance exam questions.

MedMCQA MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering A large-scale, Multiple-Choice Question Answe

MedMCQA 24 Nov 30, 2022
Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

BADER ALABDAN 2 Oct 22, 2022
Write Alphabet, Words and Sentences with your eyes.

The-Next-Gen-AI-Eye-Writer The Eye tracking Technique has become one of the most popular techniques within the human and computer interaction era, thi

Rohan Kasabe 2 Apr 05, 2022
AI-Broad-casting - AI Broad casting with python

Basic Code 1. Use The Code Configuration Environment conda create -n code_base p

Implementaion of our ACL 2022 paper Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation

Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation This is the implementaion of our paper: Bridging the

hezw.tkcw 20 Dec 12, 2022
🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)

Pretrained BigBird Model for Korean What is BigBird • How to Use • Pretraining • Evaluation Result • Docs • Citation 한국어 | English What is BigBird? Bi

Jangwon Park 183 Dec 14, 2022
SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE: Simple Contrastive Learning of Sentence Embeddings This repository contains the code and pre-trained models for our paper SimCSE: Simple Contr

Princeton Natural Language Processing 2.5k Jan 07, 2023