Code to reproduce the results of the paper 'Towards Realistic Few-Shot Relation Extraction' (EMNLP 2021)

Last update: Nov 09, 2022

Overview

Realistic Few-Shot Relation Extraction

This repository contains code to reproduce the results in the paper "Towards Realistic Few-Shot Relation Extraction" to appear in The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). This code is not intended to be modified or reused. It is a fork of an existing FewRel repository with some modifications.

Fine-tuning

The following command is to fine-tune a pre-trained model on a training dataset complying with the FewRel's format (see the Dataset section below).

python -m fewrel.fewrel_eval \
  --train train_wiki \
  --test val_wiki \
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root data/fewrel \
  --pretrain_ckpt {pretrained_model_path} \
  --train_iter 10000 \
  --val_iter 1000 \
  --val_step 2000 \
  --test_iter 2000

The above command will dump the fine-tuned model under ./checkpoint. The following command can be used to get the overall accuracy for the fine-tuned model.

Overall accuracy

python -m fewrel.fewrel_eval \
  --only_test \
  --test val_wiki \
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root data/fewrel \
  --pretrain_ckpt {pretrained_model_path} \ # needed for getting model config
  --load_ckpt {trained_checkpoint_path} \
  --test_iter 2000

[email protected] for individual relations

Precision at 50 can be calculated using the following command

python -m fewrel.alt_eval \
  --test {test_file_name_without_extension} \ # e.g., tacred_org 
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root {path_to_data_folder} \
  --pretrain_ckpt {pretrained_model_path} \ # needed for getting model config
  --load_ckpt {trained_checkpoint_path}

Pre-trained models

In this work, several encoders are experimented with including CNN, BERT, SpanBERT, RoBERTa-base, RoBERTa-large, and LUKE-base. Most pre-trained models can be downloaded from Hugging Face Transformers, and LUKE-base can be downloaded from its original GitHub repository.

Note: the original LUKE code depends on an older version of HuggingFace Transformers, which is not compatible with the version used in this repository. To experiment with LUKE, please run script ./checkout_out_luke.sh. This will first clone the original LUKE repository, apply the necessary changes to make luke compatible with this repo, and move the LUKE module to the correct place to make sure the code runs correctly.

Dataset

The original FewRel dataset has already be contained in the github repo (here)[./data/fewrel]. To convert other dataset (e.g., TACRED) to the FewRel format, one could use ./scripts/prep_more_data.py.

./scripts/select_rel.py is a script to augment an existing dataset with relations from another dataset. For example, to add a list of relations from dataset source.json to destination.json and dump the merged dataset to a file output.json, one can use the following command:

python scripts/select_rel.py add_rel \
  --src source.json \
  --dst destination.json \
  --output output.json \
  --rels {relations_delimitated_by_space}

Code to reproduce the results of the paper 'Towards Realistic Few-Shot Relation Extraction' (EMNLP 2021)

Related tags

Overview

Realistic Few-Shot Relation Extraction

Fine-tuning

Overall accuracy

[email protected] for individual relations

Pre-trained models

Dataset

Owner

Bloomberg

Natural Language Processing at EDHEC, 2022

pyMorfologik MorfologikpyMorfologik - Python binding for Morfologik.

Spacy-ginza-ner-webapi - Named Entity Recognition API with spaCy and GiNZA

An ActivityWatch watcher to pose questions to the user and record her answers.

Top2Vec is an algorithm for topic modeling and semantic search.

Sentence Embeddings with BERT & XLNet

Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

NLP topic mdel LDA - Gathered from New York Times website

This repository is home to the Optimus data transformation plugins for various data processing needs.

Write Python in Urdu - اردو میں کوڈ لکھیں

Extracting Summary Knowledge Graphs from Long Documents

Higher quality textures for the Metal Gear Solid series.

Linear programming solver for paper-reviewer matching and mind-matching

Graph4nlp is the library for the easy use of Graph Neural Networks for NLP

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

FedNLP: A Benchmarking Framework for Federated Learning in Natural Language Processing

NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

An Open-Source Package for Neural Relation Extraction (NRE)