A text augmentation tool for named entity recognition.

Overview

neraug

This python library helps you with augmenting text data for named entity recognition.

Augmentation Example


Reference from An Analysis of Simple Data Augmentation for Named Entity Recognition

Installation

To install the library:

pip install neraug

Usage

One of the example algorithms: DictionaryReplacement:

>>> from neraug.augmentator import DictionaryReplacement
>>> from neraug.scheme import IOBES

>>> ne_dic = {'Tokyo Big Sight': 'LOC'}
>>> augmentator = DictionaryReplacement(ne_dic, str.split, IOBES)
>>> x = ['I', 'went', 'to', 'Tokyo']
>>> y = ['O', 'O', 'O', 'S-LOC']
>>> x_augs, y_augs = augmentator.augment(x, y, n=1)   
>>> x_augs
[['I', 'went', 'to', 'Tokyo', 'Big', 'Sight']]
>>> y_augs
[['O', 'O', 'O', 'B-LOC', 'I-LOC', 'E-LOC']]

The library supports the following algorithms:

  • DictionaryReplacement
  • LabelWiseTokenReplacement
  • MentionReplacement
  • ShuffleWithinSegment

and supports the following scheme:

  • IOB2
  • IOBES
  • BILOU

Reference

Appreciate for the following research:

Citation

@misc{neraug,
  title={neraug: A data augmentation tool for named entity recognition},
  author={Hiroki Nakayama},
  url={https://github.com/Hironsan/neraug},
  year={2021}
}
You might also like...
Pytorch-Named-Entity-Recognition-with-BERT
Pytorch-Named-Entity-Recognition-with-BERT

BERT NER Use google BERT to do CoNLL-2003 NER ! Train model using Python and Inference using C++ ALBERT-TF2.0 BERT-NER-TENSORFLOW-2.0 BERT-SQuAD Requi

Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition
Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

For better performance, you can try NLPGNN, see NLPGNN for more details. BERT-NER Version 2 Use Google's BERT for named entity recognition (CoNLL-2003

Named Entity Recognition API used by TEI Publisher

TEI Publisher Named Entity Recognition API This repository contains the API used by TEI Publisher's web-annotation editor to detect entities in the in

Nested Named Entity Recognition

Nested Named Entity Recognition Training Dataset: CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark url: https://tianchi.aliyun.

RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2

RoNER RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2. It is meant to be an easy to use, hi

Spacy-ginza-ner-webapi - Named Entity Recognition API with spaCy and GiNZA
Spacy-ginza-ner-webapi - Named Entity Recognition API with spaCy and GiNZA

Named Entity Recognition API with spaCy and GiNZA I wrote a blog post about this

Code for
Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.

README Code for Two-stage Identifier: "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022. For details of the model a

A spaCy wrapper of OpenTapioca for named entity linking on Wikidata

spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu

Releases(v0.1.1)
Owner
Hiroki Nakayama
OSS developer | Interested in Natural Language Processing
Hiroki Nakayama
A programming language with logic of Python, and syntax of all languages.

Pytov The idea was to take all well known syntaxes, and combine them into one programming language with many posabilities. Installation Install using

Yuval Rosen 14 Dec 07, 2022
TruthfulQA: Measuring How Models Imitate Human Falsehoods

TruthfulQA: Measuring How Models Imitate Human Falsehoods

69 Dec 25, 2022
Russian words synonyms and antonyms

ru_synonyms Russian words synonyms and antonyms. Install pip install git+https://github.com/ahmados/rusynonyms.git Usage from ru_synonyms import Anto

sumekenov 7 Dec 14, 2022
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

GPT Neo 🎉 1T or bust my dudes 🎉 An implementation of model & data parallel GPT3-like models using the mesh-tensorflow library. If you're just here t

EleutherAI 6.7k Dec 28, 2022
Mkdocs + material + cool stuff

Modern-Python-Doc-Example mkdocs + material + cool stuff Doc is live here Features out of the box amazing good looking website thanks to mkdocs.org an

Francesco Saverio Zuppichini 61 Oct 26, 2022
Utilities for preprocessing text for deep learning with Keras

Note: This utility is really old and is no longer maintained. You should use keras.layers.TextVectorization instead of this. Utilities for pre-process

Hamel Husain 180 Dec 09, 2022
Auto_code_complete is a auto word-completetion program which allows you to customize it on your needs

auto_code_complete is a auto word-completetion program which allows you to customize it on your needs. the model for this program is one of the deep-learning NLP(Natural Language Process) model struc

RUO 2 Feb 22, 2022
In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset.

Med-VQA In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset. Two of these are made on top of Facebook AI Reasearch's Multi-Mo

Kshitij Ambilduke 8 Apr 14, 2022
nlp-tutorial is a tutorial for who is studying NLP(Natural Language Processing) using Pytorch

nlp-tutorial is a tutorial for who is studying NLP(Natural Language Processing) using Pytorch. Most of the models in NLP were implemented with less than 100 lines of code.(except comments or blank li

Tae-Hwan Jung 11.9k Jan 08, 2023
Perform sentiment analysis and keyword extraction on Craigslist listings

craiglist-helper synopsis Perform sentiment analysis and keyword extraction on Craigslist listings Background I love Craigslist. I've found most of my

Mark Musil 1 Nov 08, 2021
Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Expediting Vision Transformers via Token Reorganizations This repository contain

Youwei Liang 101 Dec 26, 2022
🤗🖼️ HuggingPics: Fine-tune Vision Transformers for anything using images found on the web.

🤗 🖼️ HuggingPics Fine-tune Vision Transformers for anything using images found on the web. Check out the video below for a walkthrough of this proje

Nathan Raw 185 Dec 21, 2022
Telegram AI chat bot written in Python using Pyrogram

Aurora_Al Just another Telegram AI chat bot written in Python using Pyrogram. A public running instance can be found on telegram as @AuroraAl. Require

♗CσNϙUҽRσR_MҽSƙEƚҽҽR 1 Oct 31, 2021
Creating an Audiobook (mp3 file) using a Ebook (epub) using BeautifulSoup and Google Text to Speech

epub2audiobook Creating an Audiobook (mp3 file) using a Ebook (epub) using BeautifulSoup and Google Text to Speech Input examples qual a pasta do seu

7 Aug 25, 2022
History Aware Multimodal Transformer for Vision-and-Language Navigation

History Aware Multimodal Transformer for Vision-and-Language Navigation This repository is the official implementation of History Aware Multimodal Tra

Shizhe Chen 46 Nov 23, 2022
基于pytorch_rnn的古诗词生成

pytorch_peot_rnn 基于pytorch_rnn的古诗词生成 说明 config.py里面含有训练、测试、预测的参数,更改后运行: python main.py 预测结果 if config.do_predict: result = trainer.generate('丽日照残春')

西西嘛呦 3 May 26, 2022
NLPShala , the best IDE for all Natural language processing tasks.

The revolutionary IDE for all NLP (Natural language processing) stuffs on the internet.

Abhi 3 Aug 08, 2021
An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode.

WordleSolver An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode. How to use the program Copy this proje

Akil Selvan Rajendra Janarthanan 3 Mar 02, 2022
ReCoin - Restoring our environment and businesses in parallel

Shashank Ojha, Sabrina Button, Abdellah Ghassel, Joshua Gonzales "Reduce Reuse R

sabrina button 1 Mar 14, 2022
Shellcode antivirus evasion framework

Schrodinger's Cat Schrodinger'sCat is a Shellcode antivirus evasion framework Technical principle Please visit my blog https://idiotc4t.com/ How to us

idiotc4t 27 Jul 09, 2022