GCRC: A Gaokao Chinese Reading Comprehension Dataset for Interpretable Evaluation


GCRC

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation

Introduction

Machine reading comprehension (MRC) models have made exciting progress in recent years, driven by a large number of publicly available datasets. However, the real language comprehension capabilities of these models still fall far short of what people expect, and most datasets provide only black-box evaluations that cannot diagnose whether a system arrives at its answers through correct reasoning. To alleviate these problems and move machine intelligence closer to human-like intelligence, Shanxi University focuses on the more diverse and challenging reading comprehension tasks of the college entrance examination (Gaokao), and attempts to evaluate machine intelligence effectively and practically using standardized human tests. We collected Gaokao reading comprehension questions from the past 10 years and constructed a dataset, GCRC (A New MRC Dataset from Gaokao Chinese for Explainable Evaluation), containing more than 5,000 texts and more than 8,700 multiple-choice questions (about 15,000 options). The dataset is annotated with three kinds of information: the sentence-level supporting facts, the error cause of each distractor option, and the reasoning skills required to answer the questions. Related experiments show that this dataset is more challenging than existing ones. It is useful for diagnosing system limitations in an interpretable manner, and should help researchers develop new machine learning and reasoning methods for these challenging problems.

Leaderboard

GCRC Leaderboard for Explainable Evaluation

Paper

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation. ACL 2021 Findings.

Data Size

Train: 6,994 questions; Dev: 863 questions; Test: 862 questions

Data Format

Each instance consists of the following fields: id (a string), title (a string), passage (a string), question (a string), options (a list holding the contents of options A, B, C, and D, in that order), evidences (a list holding, for each of A, B, C, and D, the supporting sentences from the original passage), reasoning_ability (a list holding, for each of A, B, C, and D, the reasoning ability required to answer the question), error_type (a list holding, for each of A, B, C, and D, the error cause of that option), and answer (a string).
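The field list above can be turned into a small sanity check. The following is a minimal sketch, not part of the official GCRC tooling: the function name `validate_instance` and the constant names are hypothetical, and it assumes each instance has already been loaded into a Python dict. Because the example in this README does not include `reasoning_ability`, the sketch treats that field as optional. It also assumes `answer` is a single option letter such as "A".

```python
from typing import Any, Dict, List

# Field groups taken from the format description; the names are illustrative.
STRING_FIELDS = ("id", "title", "passage", "question", "answer")
PER_OPTION_FIELDS = ("evidences", "error_type")
OPTIONAL_PER_OPTION_FIELDS = ("reasoning_ability",)  # assumption: may be absent


def validate_instance(inst: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the instance looks fine."""
    problems: List[str] = []

    for name in STRING_FIELDS:
        if not isinstance(inst.get(name), str):
            problems.append(f"{name}: expected a string")

    options = inst.get("options")
    if not isinstance(options, list):
        problems.append("options: expected a list")
        return problems  # the per-option checks below need a valid options list

    for name in PER_OPTION_FIELDS + OPTIONAL_PER_OPTION_FIELDS:
        if name not in inst:
            if name in PER_OPTION_FIELDS:
                problems.append(f"{name}: missing")
            continue
        value = inst[name]
        if not isinstance(value, list) or len(value) != len(options):
            problems.append(f"{name}: expected a list with one entry per option")

    # Assumption: the answer is a single letter indexing into the options list.
    letters = [chr(ord("A") + i) for i in range(len(options))]
    answer = inst.get("answer")
    if isinstance(answer, str) and answer not in letters:
        problems.append(f"answer: expected one of {letters}")

    return problems
```

Running such a check over a prediction file before submission may catch misaligned per-option lists early.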

Example

{
  "id": "gcrc_4916_8172", 
  "title": "我们需要怎样的科学素养", 
  "passage": "第八次中国公民科学素养调查显示,2010年,我国具备...激励科技创新、促进创新型国家建设,我们任重道远。", 
  "question": "下列对“我们需要怎样的科学素养”的概括,不正确的一项是", 
  "options":  [
    "科学素养是一项基本公民素质,公民科学素养可以从科学知识、科学方法和科学精神三个方面来衡量。",
    "不仅需要掌握足够的科学知识、科学方法,更需要具备学习、理解、表达、参与和决策科学事务的能力。",
    "应该明白科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面。", 
    "需要具备科学的思维和科学的精神,对科学技术能持怀疑态度,对于媒体信息具有质疑精神和过滤功能。"
  ],
  "evidences": [
    ["公民科学素养可以从三个方面衡量:科学知识、科学方法和科学精神。", "在“建设创新型国家”的语境中,科学素养作为一项基本公民素质的重要性不言而喻。"],
    ["一个具备科学素养的公民,不仅应该掌握足够的科学知识、科学方法,更需要强调科学的思维、科学的精神,理性认识科技应用到社会中可能产生的影响,进而具备学习、理解、表达、参与和决策科学事务的能力。"], 
    ["西方发达国家不仅测试公众对科学技术与社会、经济、文化等各方面关系的看法,更考察公众对科学技术是否持怀疑态度,是否认为科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面等。"], 
    ["甚至还有国家专门测试公众对于媒体信息是否具有质疑精神和过滤功能。", "西方发达国家不仅测试公众对科学技术与社会、经济、文化等各方面关系的看法,更考察公众对科学技术是否持怀疑态度,是否认为科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面等。"]
   ],
  "error_type": ["E", "", "", ""],
  "answer": "A"
}
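Since `options`, `evidences`, and `error_type` are parallel lists, it can be convenient to view an instance one option at a time. The sketch below is illustrative only: `per_option_view` is a hypothetical helper, and it assumes the lists are aligned by position with the letters A, B, C, D, as the format description states.

```python
from typing import Any, Dict, List


def per_option_view(inst: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Regroup the parallel per-option lists into one record per option."""
    rows = []
    for i, option in enumerate(inst["options"]):
        label = chr(ord("A") + i)  # position 0 -> "A", 1 -> "B", ...
        rows.append({
            "label": label,
            "option": option,
            "evidences": inst["evidences"][i],
            "error_type": inst["error_type"][i],
            "is_answer": label == inst["answer"],
        })
    return rows
```

For the example above, the record labelled "A" would carry two evidence sentences, the error type "E", and `is_answer` set to `True`.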

Evaluation Code

The prediction file must use the same format as the training set.

python eval.py prediction_file test_private_file

Participants are required to complete the following three tasks:

Task 1: Output the answer to the question.

Task 2: Output the sentence-level supporting facts (SFs) that support the answer to the question, that is, the original supporting sentences for each option.

Task 3: Output the error cause of each distractor option. This evaluation distinguishes 7 error causes: 1) wrong details; 2) wrong temporal properties; 3) wrong subject-predicate-object triple relationship; 4) wrong necessary and sufficient conditions; 5) wrong causality; 6) irrelevant to the question; 7) irrelevant to the article.

The evaluation metrics are Task1_Acc, Task2_F1, and Task3_Acc (the accuracy of error cause identification). The output is in dictionary format.

return {"Task1_Acc":_, "Task2_F1":_, "Task3_Acc":_}
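To make the three metrics concrete, here is a minimal sketch of how they might be computed. It is not the official `eval.py`, whose exact definitions are not given in this README. The sketch makes several assumptions: Task2_F1 is a set-level F1 between predicted and gold evidence sentences, averaged over all options; Task3_Acc is computed only over options whose gold `error_type` is non-empty; and a missing prediction counts as wrong. The function names `set_f1` and `evaluate` are hypothetical.

```python
from typing import Any, Dict, List


def set_f1(pred: List[str], gold: List[str]) -> float:
    """F1 between two sets of sentences; two empty sets count as a perfect match."""
    pred_set, gold_set = set(pred), set(gold)
    if not pred_set and not gold_set:
        return 1.0
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def evaluate(preds: List[Dict[str, Any]], golds: List[Dict[str, Any]]) -> Dict[str, float]:
    """Assumed metric definitions; the official eval.py may differ."""
    pred_by_id = {p["id"]: p for p in preds}
    n_correct = 0
    f1_scores: List[float] = []
    err_correct = err_total = 0

    for g in golds:
        p = pred_by_id.get(g["id"], {})  # missing prediction -> scored as wrong

        # Task 1: exact match on the answer letter.
        n_correct += int(p.get("answer") == g["answer"])

        # Task 2: per-option F1 over supporting sentences.
        pred_ev = p.get("evidences", [[] for _ in g["evidences"]])
        for pe, ge in zip(pred_ev, g["evidences"]):
            f1_scores.append(set_f1(pe, ge))

        # Task 3: accuracy over options that have a gold error cause.
        pred_err = p.get("error_type", ["" for _ in g["error_type"]])
        for pt, gt in zip(pred_err, g["error_type"]):
            if gt:
                err_total += 1
                err_correct += int(pt == gt)

    return {
        "Task1_Acc": n_correct / len(golds) if golds else 0.0,
        "Task2_F1": sum(f1_scores) / len(f1_scores) if f1_scores else 0.0,
        "Task3_Acc": err_correct / err_total if err_total else 0.0,
    }
```

Error cause labels are compared as opaque strings here, since the mapping between label codes and the seven error causes is not specified in this document.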

Author List

Hongye Tan, Xiaoyue Wang, Yu Ji, Ru Li, Xiaoli Li, Zhiwei Hu, Yunxiao Zhao, Xiaoqi Han.

Institutions

Shanxi University

Citation

Please cite our paper if you find this work helpful.

@inproceedings{tan-etal-2021-gcrc,
    title = "{GCRC}: A New Challenging {MRC} Dataset from {G}aokao {C}hinese for Explainable Evaluation",
    author = "Tan, Hongye  and
      Wang, Xiaoyue  and
      Ji, Yu  and
      Li, Ru  and
      Li, Xiaoli  and
      Hu, Zhiwei  and
      Zhao, Yunxiao  and
      Han, Xiaoqi",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.113",
    doi = "10.18653/v1/2021.findings-acl.113",
    pages = "1319--1330",
}