Re-TACRED: Addressing Shortcomings of the TACRED Dataset

George Stoica, Emmanouil Antonios Platanios, and Barnabás Póczos
In Proceedings of the Thirty-fifth AAAI Conference on Artificial Intelligence 2021

Primary Contact: George Stoica. As of Jan 2021, I am no longer at CMU, and the cs.cmu.edu email may no longer work. Please contact me instead at: [email protected].

Changelog

  • 1.0 - Initial dataset release: Data consisted of 105,206 total instances spread across 40 relations.
  • 1.1 - Updated dataset release: After extensive discussion, we have elected to prune Re-TACRED by ~14K instances. The new dataset has 91,467 instances spread across 40 relations. The pruned data consisted of instances with messily segmented entities (and corresponding types) and sentences whose relations were ambiguous. While this version is smaller, it is cleaner and better defined.

This repository contains all relevant resources for using Re-TACRED, a new relation extraction dataset.

For details on this work please check out our:

Below we describe the contents of the four repository directories by name.

Re-TACRED

This directory contains version 1.1 of our revised TACRED dataset, in the form of one patch per split. Due to licensing restrictions, we cannot provide the complete dataset. However, following Alt, Gabryszak, and Hennig (2020), our patch consists of JSON files that map TACRED instances, by their id, to our revised labels.

The original TACRED dataset is available for download from the LDC here. It is free for members, or $25 for non-members.

Applying the patch is simple and only requires replacing each TACRED instance (where applicable) with our revised relation. For convenience, we provide a script for this named apply_patch.py in the Re-TACRED directory. In the script, you only need to replace

tacred_dir = None
save_dir = None

with the path to your TACRED dataset directory and the directory where you wish to save the patched data, respectively.
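The patching step described above can be sketched roughly as follows. This is a minimal illustration, not the contents of apply_patch.py: it assumes each TACRED split is a JSON list of instances carrying "id" and "relation" fields, and that each patch file is a JSON object mapping an instance id to its revised relation. The function names and the keep_unpatched flag are hypothetical, and how instances absent from the patch should be treated is not specified here, so defer to the provided script for the actual behavior.

```python
import json
from pathlib import Path


def apply_patch(instances, patch, keep_unpatched=True):
    """Return new instance dicts with relations revised according to the patch.

    instances: list of dicts with "id" and "relation" keys (assumed TACRED layout).
    patch: dict mapping instance id -> revised relation (assumed patch layout).
    keep_unpatched: hypothetical flag; if False, instances without a patch
        entry are dropped instead of being kept with their original label.
    """
    patched = []
    for instance in instances:
        new_relation = patch.get(instance["id"])
        if new_relation is not None:
            # Copy the instance so the input list is left untouched.
            patched.append({**instance, "relation": new_relation})
        elif keep_unpatched:
            patched.append(dict(instance))
    return patched


def patch_split(tacred_file, patch_file, save_file, keep_unpatched=True):
    """Read one split and its patch from disk, then write the patched split."""
    instances = json.loads(Path(tacred_file).read_text(encoding="utf-8"))
    patch = json.loads(Path(patch_file).read_text(encoding="utf-8"))
    patched = apply_patch(instances, patch, keep_unpatched)
    save_path = Path(save_file)
    save_path.parent.mkdir(parents=True, exist_ok=True)
    save_path.write_text(json.dumps(patched), encoding="utf-8")
    return len(patched)
```

Keeping apply_patch free of file I/O makes it easy to check on a handful of hand-written instances before running it over a full split.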

PA-LSTM, C-GCN & SpanBERT

We base our experiments on the open-source model repositories of PA-LSTM, C-GCN, and SpanBERT.

However, Re-TACRED cannot simply be passed to each model repository, because each is hardcoded for TACRED. We therefore modified certain files to make each model Re-TACRED compatible. To make this as easy as possible, we provide all our altered files in a directory named after each model (e.g., the provided PA-LSTM directory). All that needs to be done is to replace the corresponding file in the original model repository with the file from our provided directory. For instance, you may replace SpanBERT's "run_tacred.py" file with our "run_tacred.py" file. Experiments are then run exactly as in the original model repositories.
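If you prefer not to copy the files by hand, the replacement can be scripted. The sketch below is only an illustration under one assumption: that the provided model directory mirrors the relative layout of the original model repository, which may not hold for every file. The function name overlay_files and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def overlay_files(provided_dir, repo_dir):
    """Copy every file under provided_dir into repo_dir at the same relative path.

    Existing files in repo_dir are overwritten. Returns the relative paths copied.
    """
    provided_dir, repo_dir = Path(provided_dir), Path(repo_dir)
    copied = []
    for src in sorted(provided_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(provided_dir)
            dst = repo_dir / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # overwrites the original repository file
            copied.append(str(rel))
    return copied


# Example with placeholder paths; adjust to your local checkouts:
# overlay_files("SpanBERT", "/path/to/original/SpanBERT")
```

Reviewing the returned list before running experiments is a quick way to confirm that only the intended files were replaced.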

Note that our files also contain certain "quality of life" changes that make running each model more convenient for us. Examples include adding and tracking the test split while training (as opposed to only the dev set).
