ANEA: Distant Supervision for Low-Resource Named Entity Recognition

ANEA is a tool that automatically annotates named entities in unlabeled text using entity lists, so that the annotations can serve as distant supervision.

Distant supervision allows obtaining labeled training corpora for low-resource settings where only limited hand-annotated data exists. However, to be used effectively, the distant supervision must be easy to gather. ANEA covers the whole pipeline, from obtaining the entity lists to analyzing the errors of the distant supervision. A tuning step lets users improve the automatic annotation with their linguistic insights, without labeling or checking every token manually.
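To make the idea of list-based distant supervision concrete, here is a minimal sketch in Python. It is not ANEA's implementation: the function name `annotate`, the label names and the tiny entity lists are made up for illustration, and matching is a plain case-insensitive token lookup without lemmatization or fuzzy matching.

```python
from typing import Dict, List, Set


def annotate(tokens: List[str],
             entity_lists: Dict[str, Set[str]],
             default: str = "O") -> List[str]:
    """Label each token with the first entity type whose list contains it."""
    labels = []
    for token in tokens:
        label = default
        for entity_type, names in entity_lists.items():
            # Case-insensitive lookup; the lists are assumed to be lowercased.
            if token.lower() in names:
                label = entity_type
                break
        labels.append(label)
    return labels


# Hypothetical entity lists; in practice these would come from a knowledge base.
entity_lists = {
    "PER": {"marie", "curie"},
    "LOC": {"paris", "warsaw"},
}

tokens = "Marie Curie moved from Warsaw to Paris".split()
labels = annotate(tokens, entity_lists)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Even in this toy version the typical error sources are visible: a name that is missing from the lists is silently labeled O, and an ambiguous word is labeled wherever it occurs. These are the kinds of errors that the tuning and error-analysis steps are meant to surface.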

An example of the workflow can be seen in this video. For more details, take a look at our paper (accepted at PML4DC @ ICLR'21). For the additional material of the paper, please check the subdirectory additional of this repository.

Installation

ANEA should run on all major operating systems. We recommend the installation via conda or miniconda:

git clone https://github.com/uds-lsv/anea

conda create -n anea python=3.7
conda activate anea
pip install spacy==2.2.4 Flask==1.1.1 fuzzywuzzy==0.18.0

For tokenization and lemmatization, a spaCy language pack needs to be installed. Run the following command with the corresponding language code, e.g. en for English. See https://spacy.io/usage for the supported languages.

python -m spacy download en

Download the Wikidata JSON dump from https://dumps.wikimedia.org/wikidatawiki/entities/ and extract it to the instance directory (this may take a while).
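If you want to sanity-check the extracted dump before starting ANEA, the following sketch may help. It relies on the layout of the Wikidata JSON dump, which is one large JSON array with one entity object per line. The function name `iter_entities` and the file path in the usage comment are placeholders, and this is not how ANEA itself reads the dump.

```python
import json
from typing import Iterator


def iter_entities(path: str) -> Iterator[dict]:
    """Yield one entity dict per line of an extracted Wikidata JSON dump."""
    with open(path, encoding="utf-8") as dump:
        for line in dump:
            # Each entity line ends with a comma; the first and last
            # lines of the file only hold the array brackets.
            line = line.strip().rstrip(",")
            if line in ("", "[", "]"):
                continue
            yield json.loads(line)


# Usage (the path is a placeholder for wherever the dump was extracted):
# for entity in iter_entities("instance/latest-all.json"):
#     print(entity["id"])
#     break
```

Reading line by line keeps memory usage low, which matters because the full dump is far too large to load at once.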

Running

After the installation, you can run ANEA with the following commands on the command line:

conda activate anea
./run.sh

Then open your browser and go to http://localhost:5000/. If you are running ANEA for the first time, configure it in the Settings tab.

The ANEA server can run on a different machine than the user's browser. The only requirement is that the user's computer can reach port 5000 on the machine that runs the ANEA server (e.g. via ssh port forwarding or by opening the corresponding port in the firewall).
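When the page does not load from a remote machine, it can help to first check whether the port is reachable at all. The sketch below uses only the Python standard library. The helper name `port_is_reachable` is made up for this example, and the host name is whatever machine runs the ANEA server.

```python
import socket


def port_is_reachable(host: str, port: int = 5000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # "localhost" is correct when the server runs on the same machine
    # or when the port is forwarded to the local machine.
    print(port_is_reachable("localhost"))
```

A result of False for the remote host usually points to a firewall or forwarding problem rather than to ANEA itself, provided the server has been started.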

Support for Other Languages

ANEA uses spaCy for language preprocessing (tokenization and lemmatization). It currently supports English, German, French, Spanish, Portuguese, Italian, Dutch, Greek, Norwegian Bokmål and Lithuanian. For Estonian, ANEA supports EstNLTK version 1.6; in that case, ANEA needs to be installed with Python 3.6.

Text can also be preprocessed using external tools and then uploaded as whitespace tokenized text or in the CoNLL format (one token per line).
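As an illustration of the two upload formats, the sketch below turns whitespace-tokenized sentences into one-token-per-line text. The function name `whitespace_to_conll` is made up for this example. Separating sentences with a blank line follows the common CoNLL convention; whether ANEA expects exactly this separator is an assumption you should check against your own data.

```python
from typing import Iterable


def whitespace_to_conll(sentences: Iterable[str]) -> str:
    """Convert whitespace-tokenized sentences to one token per line."""
    lines = []
    for sentence in sentences:
        lines.extend(sentence.split())
        # Assumption: a blank line marks the end of each sentence.
        lines.append("")
    return "\n".join(lines)


sentences = ["ANEA runs locally", "It uses spaCy"]
conll_text = whitespace_to_conll(sentences)
print(conll_text)
```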

Other external preprocessing libraries can be added directly to ANEA by implementing a new Tokenizer class in autom_labeling_library/preprocessing.py (you can take a look at EstnltkTokenizer as an example) and adding it to the Preprocessing class. If you encounter any issues, just contact us.

Citation

If you use this tool, please cite us:

@article{hedderich21ANEA,
  author    = {Michael A. Hedderich and
               Lukas Lange and
               Dietrich Klakow},
  title     = {{ANEA:} Distant Supervision for Low-Resource Named Entity Recognition},
  journal   = {CoRR},
  volume    = {abs/2102.13129},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.13129},
  archivePrefix = {arXiv},
  eprint    = {2102.13129},
}

Development, Support & License

If you encounter any issues or problems when using ANEA, feel free to raise an issue on GitHub or contact us directly (mhedderich [at] lsv.uni-saarland [dot] de). We welcome contributions from other developers.

ANEA is licensed under the Apache License 2.0.

Owner
Saarland University Spoken Language Systems Group