This repository contains the code for the EMNLP 2021 paper "Word-Level Coreference Resolution".

Last update: Dec 27, 2022

Word-Level Coreference Resolution

This is a repository with the code to reproduce the experiments described in the paper of the same name, which was accepted to EMNLP 2021. The paper is available here.

Preparation
Training
Evaluation

Preparation

The following instructions have been tested with Python 3.7 on an Ubuntu 20.04 machine.

You will need:

OntoNotes 5.0 corpus (download here, registration needed)
Python 2.7 to run conll-2012 scripts
Java runtime to run Stanford Parser
Python 3.7+ to run the model
Perl to run conll-2012 evaluation scripts
CUDA-enabled machine (48 GB to train, 4 GB to evaluate)
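
Before starting, it can help to confirm that the external tools listed above are reachable. The sketch below is a hypothetical helper, not part of this repository: it looks up `java` and `perl` on `PATH` and checks that the running interpreter is Python 3.7 or newer. It does not check for CUDA, GPU memory or Python 2.7.

```python
import shutil
import sys


# Hypothetical helper, not part of this repository.
def check_prerequisites(tools=("java", "perl")):
    """Map each required tool to whether it is found on PATH."""
    report = {tool: shutil.which(tool) is not None for tool in tools}
    # The model itself needs Python 3.7 or newer.
    report["python>=3.7"] = sys.version_info >= (3, 7)
    return report


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print("{}: {}".format(name, "ok" if found else "MISSING"))
```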

Extract the OntoNotes 5.0 archive. If it is in the repository's root directory:
```
 tar -xzvf ontonotes-release-5.0_LDC2013T19.tgz
```
Switch to a Python 2.7 environment (one where python runs version 2.7). This is necessary for the conll scripts to run correctly. To do it with conda:
```
 conda create -y --name py27 python=2.7 && conda activate py27
```
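
To confirm that `python` really resolves to 2.7 before running the conll scripts, a small check like the one below may help. It is a hypothetical sketch, not part of this repository, written so that it runs under either Python 2.7 or Python 3.

```python
import subprocess


# Hypothetical helper, not part of this repository. Written to run under
# Python 2.7 as well as Python 3, so it also works inside the py27 environment.
def python_version(executable="python"):
    """Return (major, minor) of the interpreter that `executable` launches."""
    # The -c program prints e.g. "2.7" and is valid in both Python 2 and 3.
    out = subprocess.check_output(
        [executable, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
    )
    major, minor = out.decode("ascii").strip().split(".")
    return int(major), int(minor)
```

With the py27 environment active, `python_version()` should return `(2, 7)`.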

Run the conll data preparation scripts (~30min):

```
sh get_conll_data.sh ontonotes-release-5.0 data
```
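
Once the script finishes, a quick sanity check is to count the generated CoNLL files per split. This hypothetical sketch assumes the files land under `data/conll-2012/` (the directory later passed to `convert_to_jsonlines.py`), that their names end in `gold_conll`, and that the split name (`train`, `development`, `test`) appears somewhere in each path, as in the usual CoNLL-2012 layout. Adjust `root` or `pattern` if your output differs.

```python
from collections import Counter
from pathlib import Path

SPLITS = ("train", "development", "test")


# Hypothetical sanity check, not part of this repository. Assumes each
# file's path contains its split name, as in the usual CoNLL-2012 layout.
def count_conll_files(root="data/conll-2012", pattern="*gold_conll"):
    """Count files matching `pattern` under `root`, grouped by split."""
    counts = Counter()
    for path in Path(root).rglob(pattern):
        # Use the first path component that names a split; otherwise "other".
        split = next((part for part in path.parts if part in SPLITS), "other")
        counts[split] += 1
    return counts
```

Calling `dict(count_conll_files())` from the repository root prints the per-split counts; an empty result suggests the preparation step did not complete.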

Download the conll scorers and the Stanford Parser:
```
 sh get_third_party.sh
```

Prepare the Python 3.7 environment for the model. To do it with conda:

```
conda create -y --name wl-coref python=3.7 openjdk perl
conda activate wl-coref
python -m pip install -r requirements.txt
```

Build the corpus in jsonlines format (~20 min):

```
python convert_to_jsonlines.py data/conll-2012/ --out-dir data
python convert_to_heads.py
```
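
Each line of a jsonlines file is a standalone JSON object (in coreference corpora, typically one document), so the converted corpus can be inspected with the standard library alone. The reader below is a minimal, hypothetical sketch; the file name in the usage note is an assumption, so check what `convert_to_heads.py` actually wrote to `data/`.

```python
import json


# Hypothetical helper, not part of this repository.
def read_jsonlines(path):
    """Yield one parsed JSON object per non-empty line of `path`."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

For example, `docs = list(read_jsonlines("data/english_development_head.jsonlines"))` (assumed file name) followed by `sorted(docs[0])` lists the fields of the first document.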

You're all set!

Training

If you have completed all the steps in the previous section, then just run:

```
python run.py train roberta
```

Use the -h flag to list more parameters, and the CUDA_VISIBLE_DEVICES environment variable to limit the CUDA devices visible to the script. Refer to config.toml to modify the existing model configurations or to create your own.
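
The same launch can be scripted from Python, for instance to pin a run to specific GPUs. This is a hypothetical helper, not part of this repository: it only builds the command and a copy of the environment with `CUDA_VISIBLE_DEVICES` set, leaving the current process environment untouched.

```python
import os
import sys


# Hypothetical helper, not part of this repository.
def train_command(config="roberta", devices="0", extra_args=()):
    """Build argv and environment for `python run.py train <config>`.

    `devices` is a comma-separated list of GPU indices; only these are made
    visible to the training script via CUDA_VISIBLE_DEVICES.
    """
    # Copy the current environment so the caller's os.environ is not modified.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    cmd = [sys.executable, "run.py", "train", config, *extra_args]
    return cmd, env
```

From the repository root, `cmd, env = train_command("roberta", devices="1")` followed by `subprocess.run(cmd, env=env, check=True)` behaves like `CUDA_VISIBLE_DEVICES=1 python run.py train roberta`, using the current interpreter.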

Evaluation

Make sure that you have successfully completed all steps of the Preparation section.

Download and save the pretrained model to the data directory.

 https://www.dropbox.com/s/vf7zadyksgj40zu/roberta_%28e20_2021.05.02_01.16%29_release.pt?dl=0
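
The link above ends in `dl=0`, which opens Dropbox's preview page; `dl=1` asks Dropbox for the file itself. The sketch below derives that direct URL and the decoded file name. It assumes the checkpoint should keep its original name, `roberta_(e20_2021.05.02_01.16)_release.pt`, whose epoch number (20) appears to match the `20` passed to `calculate_conll.py` below. The helper name is hypothetical.

```python
import posixpath
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit

MODEL_URL = ("https://www.dropbox.com/s/vf7zadyksgj40zu/"
             "roberta_%28e20_2021.05.02_01.16%29_release.pt?dl=0")


# Hypothetical helper, not part of this repository.
def direct_download(url):
    """Return (direct_url, filename) for a Dropbox share link."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # dl=1 requests the raw file instead of the preview page
    direct = urlunsplit(parts._replace(query=urlencode(query)))
    # Decode %28/%29 so the saved file keeps its original name.
    filename = unquote(posixpath.basename(parts.path))
    return direct, filename
```

`url, name = direct_download(MODEL_URL)` followed by `urllib.request.urlretrieve(url, os.path.join("data", name))` then saves the checkpoint into the data directory.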

Generate the conll-formatted output:

```
python run.py eval roberta --data-split test
```

Run the conll-2012 scripts to obtain the metrics:
```
 python calculate_conll.py roberta test 20
```
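
The headline CoNLL-2012 score is the unweighted mean of the MUC, B³ and CEAF_e F1 values. If you need to recompute it from per-metric precision and recall, a minimal sketch follows; the function names are hypothetical and not part of this repository.

```python
# Hypothetical helpers, not part of this repository.
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def conll_f1(muc, b_cubed, ceaf_e):
    """Average the F1 of MUC, B-cubed and CEAF-e, each given as (P, R)."""
    return sum(f1(p, r) for p, r in (muc, b_cubed, ceaf_e)) / 3
```

For example, `conll_f1((p_muc, r_muc), (p_b3, r_b3), (p_ceafe, r_ceafe))`, with the values taken from the scorer output.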
