Code to reproduce the results of the paper 'Towards Realistic Few-Shot Relation Extraction' (EMNLP 2021)

Overview

Realistic Few-Shot Relation Extraction

This repository contains code to reproduce the results in the paper "Towards Realistic Few-Shot Relation Extraction" to appear in The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). This code is not intended to be modified or reused. It is a fork of an existing FewRel repository with some modifications.

Fine-tuning

The following command is to fine-tune a pre-trained model on a training dataset complying with the FewRel's format (see the Dataset section below).

python -m fewrel.fewrel_eval \
  --train train_wiki \
  --test val_wiki \
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root data/fewrel \
  --pretrain_ckpt {pretrained_model_path} \
  --train_iter 10000 \
  --val_iter 1000 \
  --val_step 2000 \
  --test_iter 2000

The above command will dump the fine-tuned model under ./checkpoint. The following command can be used to get the overall accuracy for the fine-tuned model.

Overall accuracy

python -m fewrel.fewrel_eval \
  --only_test \
  --test val_wiki \
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root data/fewrel \
  --pretrain_ckpt {pretrained_model_path} \ # needed for getting model config
  --load_ckpt {trained_checkpoint_path} \
  --test_iter 2000

[email protected] for individual relations

Precision at 50 can be calculated using the following command

python -m fewrel.alt_eval \
  --test {test_file_name_without_extension} \ # e.g., tacred_org 
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root {path_to_data_folder} \
  --pretrain_ckpt {pretrained_model_path} \ # needed for getting model config
  --load_ckpt {trained_checkpoint_path}

Pre-trained models

In this work, several encoders are experimented with including CNN, BERT, SpanBERT, RoBERTa-base, RoBERTa-large, and LUKE-base. Most pre-trained models can be downloaded from Hugging Face Transformers, and LUKE-base can be downloaded from its original GitHub repository.

Note: the original LUKE code depends on an older version of HuggingFace Transformers, which is not compatible with the version used in this repository. To experiment with LUKE, please run script ./checkout_out_luke.sh. This will first clone the original LUKE repository, apply the necessary changes to make luke compatible with this repo, and move the LUKE module to the correct place to make sure the code runs correctly.

Dataset

The original FewRel dataset has already be contained in the github repo (here)[./data/fewrel]. To convert other dataset (e.g., TACRED) to the FewRel format, one could use ./scripts/prep_more_data.py.

./scripts/select_rel.py is a script to augment an existing dataset with relations from another dataset. For example, to add a list of relations from dataset source.json to destination.json and dump the merged dataset to a file output.json, one can use the following command:

python scripts/select_rel.py add_rel \
  --src source.json \
  --dst destination.json \
  --output output.json \
  --rels {relations_delimitated_by_space}
Owner
Bloomberg
Bloomberg
Natural Language Processing at EDHEC, 2022

Natural Language Processing Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022 What

1 Feb 04, 2022
pyMorfologik MorfologikpyMorfologik - Python binding for Morfologik.

Python binding for Morfologik Morfologik is Polish morphological analyzer. For more information see http://github.com/morfologik/morfologik-stemming/

Damian Mirecki 18 Dec 29, 2021
Spacy-ginza-ner-webapi - Named Entity Recognition API with spaCy and GiNZA

Named Entity Recognition API with spaCy and GiNZA I wrote a blog post about this

Yuki Okuda 3 Feb 27, 2022
An ActivityWatch watcher to pose questions to the user and record her answers.

aw-watcher-ask An ActivityWatch watcher to pose questions to the user and record her answers. This watcher uses Zenity to present dialog boxes to the

Bernardo Chrispim Baron 33 Dec 03, 2022
Top2Vec is an algorithm for topic modeling and semantic search.

Top2Vec is an algorithm for topic modeling and semantic search. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.

Dimo Angelov 2.4k Jan 06, 2023
Sentence Embeddings with BERT & XLNet

Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch This framework provides an easy method t

Ubiquitous Knowledge Processing Lab 9.1k Jan 02, 2023
Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

smart-school-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
NLP topic mdel LDA - Gathered from New York Times website

NLP topic mdel LDA - Gathered from New York Times website

1 Oct 14, 2021
This repository is home to the Optimus data transformation plugins for various data processing needs.

Transformers Optimus's transformation plugins are implementations of Task and Hook interfaces that allows execution of arbitrary jobs in optimus. To i

Open Data Platform 37 Dec 14, 2022
Write Python in Urdu - اردو میں کوڈ لکھیں

UrduPython Write simple Python in Urdu. How to Use Write Urdu code in سامپل۔پے The mappings are as following: "۔": ".", "،":

Saad A. Bazaz 26 Nov 27, 2022
Extracting Summary Knowledge Graphs from Long Documents

GraphSum This repo contains the data and code for the G2G model in the paper: Extracting Summary Knowledge Graphs from Long Documents. The other basel

Zeqiu (Ellen) Wu 10 Oct 21, 2022
Higher quality textures for the Metal Gear Solid series.

Metal Gear Solid: HD Textures Higher quality textures for the Metal Gear Solid series. The goal is to maximize the quality of assets that the engine w

Samantha 6 Dec 06, 2022
Linear programming solver for paper-reviewer matching and mind-matching

Paper-Reviewer Matcher A python package for paper-reviewer matching algorithm based on topic modeling and linear programming. The algorithm is impleme

Titipat Achakulvisut 66 Jul 05, 2022
Graph4nlp is the library for the easy use of Graph Neural Networks for NLP

Graph4NLP Graph4NLP is an easy-to-use library for R&D at the intersection of Deep Learning on Graphs and Natural Language Processing (i.e., DLG4NLP).

Graph4AI 1.5k Dec 23, 2022
A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to ach

Keon Lee 237 Jan 02, 2023
The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

THUHCSI 138 Oct 28, 2022
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

Welcome to Spokestack Python! This library is intended for developing voice interfaces in Python. This can include anything from Raspberry Pi applicat

Spokestack 133 Sep 20, 2022
FedNLP: A Benchmarking Framework for Federated Learning in Natural Language Processing

FedNLP is a research-oriented benchmarking framework for advancing federated learning (FL) in natural language processing (NLP). It uses FedML repository as the git submodule. In other words, FedNLP

FedML-AI 216 Nov 27, 2022
NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

pretrain4ir_tutorial NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking 用作NLPIR实验室, Pre-training

ZYMa 12 Apr 07, 2022
An Open-Source Package for Neural Relation Extraction (NRE)

OpenNRE We have a DEMO website (http://opennre.thunlp.ai/). Try it out! OpenNRE is an open-source and extensible toolkit that provides a unified frame

THUNLP 3.9k Jan 03, 2023