Negative sampling for solving the unlabeled entity problem in NER. ICLR-2021 paper: Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition.

Last update: Dec 29, 2022

Overview

Negative Sampling for NER

The unlabeled entity problem is prevalent in many NER scenarios (e.g., weakly supervised NER). Our ICLR 2021 paper proposes negative sampling to address this issue. This repo contains the implementation of our approach.
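As a rough illustration of the idea, the sketch below draws a small random subset of the spans that carry no annotation and treats them as negative (non-entity) training instances, rather than treating every unlabeled span as a negative. The function name sample_negative_spans, the rate parameter and its default value, and the "O" label are assumptions made for this example; they are not taken from main.py, and the actual implementation may differ.

```python
import math
import random


def sample_negative_spans(num_tokens, labeled_spans, rate=0.3, rng=None):
    """Sample unlabeled spans to use as negative instances.

    labeled_spans holds (start, end, label) tuples with inclusive indices.
    rate is a hypothetical knob: about rate * num_tokens spans are drawn.
    """
    rng = rng or random.Random()
    positives = {(start, end) for start, end, _ in labeled_spans}
    # Every span (i, j) with i <= j that is not annotated is a candidate.
    candidates = [
        (i, j)
        for i in range(num_tokens)
        for j in range(i, num_tokens)
        if (i, j) not in positives
    ]
    k = min(len(candidates), math.ceil(rate * num_tokens))
    # "O" is an assumed placeholder label for non-entity spans.
    return [(i, j, "O") for i, j in rng.sample(candidates, k)]
```

For a 13-token sentence with three labeled entities, sample_negative_spans(13, [(0, 0, 'ORG'), (5, 6, 'PER'), (10, 10, 'ORG')]) returns four spans that do not coincide with any labeled span.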

Note that this is not an officially supported Tencent product.

Preparation

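Preparation takes two steps. First, convert the NER data to the format shown below and place it in a new folder named "dataset". The folder contains train.json, dev.json, and test.json. Each JSON file is a list of dicts. Both fields are stored as strings, and each entity is a (start, end, label) tuple with inclusive token indices. See the following case: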

[
 {
  "sentence": "['Somerset', '83', 'and', '174', '(', 'P.', 'Simmons', '4-38', ')', ',', 'Leicestershire', '296', '.']",
  "labeled entities": "[(0, 0, 'ORG'), (5, 6, 'PER'), (10, 10, 'ORG')]"
 },
 {
  "sentence": "['Leicestershire', '22', 'points', ',', 'Somerset', '4', '.']",
  "labeled entities": "[(0, 0, 'ORG'), (4, 4, 'ORG')]"
 }
]
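If the source data is BIO-tagged (an assumption, since the original format is not specified here), a conversion along the following lines can produce records in this layout. This is a minimal sketch: the helper names bio_to_spans, make_record, and read_record are hypothetical and not part of this repo.

```python
import ast


def bio_to_spans(tags):
    """Convert BIO tags into inclusive (start, end, label) spans."""
    spans, start, label = [], None, None
    # A trailing "O" sentinel closes a span that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # A span begins at "B-X", or at an "I-X" that does not continue X.
        begins = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if start is not None and (tag == "O" or begins):
            spans.append((start, i - 1, label))
            start, label = None, None
        if begins:
            start, label = i, tag[2:]
    return spans


def make_record(tokens, tags):
    """Build one dict whose fields are string-encoded, as in the example."""
    return {
        "sentence": str(list(tokens)),
        "labeled entities": str(bio_to_spans(tags)),
    }


def read_record(record):
    """Decode the string fields back into Python lists."""
    tokens = ast.literal_eval(record["sentence"])
    entities = ast.literal_eval(record["labeled entities"])
    return tokens, entities
```

A list of such records can then be written with json.dump to dataset/train.json, dataset/dev.json, and dataset/test.json.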

Second, prepare the pretrained LM (i.e., BERT) and the evaluation script. Create a directory named "resource" and arrange them as follows:

resource
- bert-base-cased
  - model.pt
  - vocab.txt
- conlleval.pl

Note that the files extracted from BERT.tar.gz need to be renamed to match the layout above.
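Before training, it can help to confirm that the renamed files are where the layout above expects them. The following is a small sketch of such a check; the function name missing_resources and the EXPECTED list are introduced here for illustration and are not part of this repo.

```python
from pathlib import Path

# Paths relative to the "resource" directory, taken from the layout above.
EXPECTED = [
    "bert-base-cased/model.pt",
    "bert-base-cased/vocab.txt",
    "conlleval.pl",
]


def missing_resources(resource_dir="resource"):
    """Return the expected files that are absent under resource_dir."""
    root = Path(resource_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

An empty list from missing_resources("resource") means all three files are in place.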

Training and Test

CUDA_VISIBLE_DEVICES=0 python main.py -dd dataset -cd save -rd resource

Citation

@inproceedings{li2021empirical,
    title={Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition},
    author={Yangming Li and Lemao Liu and Shuming Shi},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=5jRVa89sZk}
}

Owner

Yangming Li
