A text augmentation tool for named entity recognition.

Last update: Oct 11, 2022

Overview

neraug

This Python library helps you augment text data for named entity recognition.

Augmentation Example

The example is taken from the paper "An Analysis of Simple Data Augmentation for Named Entity Recognition".

Installation

To install the library:

pip install neraug

Usage

The following example uses one of the supported algorithms, DictionaryReplacement:

>>> from neraug.augmentator import DictionaryReplacement
>>> from neraug.scheme import IOBES

>>> ne_dic = {'Tokyo Big Sight': 'LOC'}
>>> augmentator = DictionaryReplacement(ne_dic, str.split, IOBES)
>>> x = ['I', 'went', 'to', 'Tokyo']
>>> y = ['O', 'O', 'O', 'S-LOC']
>>> x_augs, y_augs = augmentator.augment(x, y, n=1)
>>> x_augs
[['I', 'went', 'to', 'Tokyo', 'Big', 'Sight']]
>>> y_augs
[['O', 'O', 'O', 'B-LOC', 'I-LOC', 'E-LOC']]
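
In the output above, the single-token tag S-LOC becomes B-LOC, I-LOC, E-LOC because the replacement mention has three tokens, so the span has to be retagged for its new length. The sketch below illustrates that retagging step in plain Python. It is not neraug's implementation: `span_tags` and `replace_span` are hypothetical helpers written for this illustration, and the span position is passed in by hand.

```python
def span_tags(length, label, scheme="IOBES"):
    """Return the tags for one entity mention of `length` tokens.

    Hypothetical helper for illustration; not part of neraug.
    """
    if length < 1:
        raise ValueError("a mention needs at least one token")
    if scheme == "IOB2":
        return [f"B-{label}"] + [f"I-{label}"] * (length - 1)
    # IOBES and BILOU have the same shape and differ only in two prefixes:
    # the single-token prefix (S vs. U) and the last-token prefix (E vs. L).
    single, last = {"IOBES": ("S", "E"), "BILOU": ("U", "L")}[scheme]
    if length == 1:
        return [f"{single}-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (length - 2) + [f"{last}-{label}"]


def replace_span(x, y, start, end, new_tokens, label, scheme="IOBES"):
    """Replace tokens x[start:end] with new_tokens and retag the new span."""
    new_x = x[:start] + new_tokens + x[end:]
    new_y = y[:start] + span_tags(len(new_tokens), label, scheme) + y[end:]
    return new_x, new_y


x = ["I", "went", "to", "Tokyo"]
y = ["O", "O", "O", "S-LOC"]
new_x, new_y = replace_span(x, y, 3, 4, "Tokyo Big Sight".split(), "LOC")
print(new_x)
print(new_y)
```

Passing `scheme="BILOU"` or `scheme="IOB2"` to the same helper shows how the three supported schemes differ only in the prefixes they assign within a mention.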

The library supports the following algorithms:

DictionaryReplacement
LabelWiseTokenReplacement
MentionReplacement
ShuffleWithinSegment

and supports the following tagging schemes:

IOB2
IOBES
BILOU
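
To give a feel for one of the algorithms listed above, here is a minimal sketch of the idea behind ShuffleWithinSegment as described in the referenced paper: the sentence is split into segments of consecutive tokens that share an entity type, each segment is shuffled with some probability, and the label sequence is left unchanged. This is an illustration only, not neraug's code. The function names, the parameter `p`, and the `rng` argument are assumptions made for this sketch.

```python
import random
from itertools import groupby


def entity_type(tag):
    """Strip the position prefix: 'B-LOC' -> 'LOC', 'O' -> 'O'."""
    return tag.split("-", 1)[1] if "-" in tag else tag


def shuffle_within_segments(x, y, p=0.5, rng=None):
    """Shuffle tokens inside each same-type segment with probability p.

    Hypothetical sketch: adjacent tokens with the same entity type form one
    segment, so two directly adjacent mentions of the same type are merged.
    The labels are returned unchanged.
    """
    rng = rng or random.Random()
    new_x = []
    for _, group in groupby(zip(x, y), key=lambda pair: entity_type(pair[1])):
        tokens = [token for token, _ in group]
        if rng.random() < p:
            rng.shuffle(tokens)  # reorder tokens only within this segment
        new_x.extend(tokens)
    return new_x, list(y)


x = ["She", "did", "not", "complain", "of", "headache", "or",
     "any", "other", "neurological", "symptoms", "."]
y = ["O", "O", "O", "O", "O", "B-problem", "O",
     "B-problem", "I-problem", "I-problem", "I-problem", "O"]
x_aug, y_aug = shuffle_within_segments(x, y, p=1.0, rng=random.Random(0))
print(x_aug)
print(y_aug)
```

Because tokens never cross a segment boundary, every token keeps a label of its original entity type even though the word order inside a segment changes.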

Reference

Thanks to the authors of the following research:

An Analysis of Simple Data Augmentation for Named Entity Recognition

Citation

@misc{neraug,
  title={neraug: A data augmentation tool for named entity recognition},
  author={Hiroki Nakayama},
  url={https://github.com/Hironsan/neraug},
  year={2021}
}

Releases (latest: v0.1.1)

v0.1.1 (Jul 22, 2021)
Remove tokenizer from MentionReplacement

v0.1.0 (Jul 22, 2021)

Owner

Hiroki Nakayama
