The source code for 'Noisy-Labeled NER with Confidence Estimation' accepted by NAACL 2021

Last update: Nov 12, 2022

Overview

Kun Liu*, Yao Fu*, Chuanqi Tan, Mosha Chen, Ningyu Zhang, Songfang Huang, Sheng Gao. Noisy-Labeled NER with Confidence Estimation. NAACL 2021. [arxiv]

Requirements

pip install -r requirements.txt

Data

Each dataset file has three columns: the first is the word, the second is the noisy label, and the third is the gold label. For datasets without gold labels, set the third column to the same values as the second. We provide the CoNLL 2003 English data with recall 0.5 and precision 0.9 in './data/eng_r0.5p0.9'.
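As a rough illustration of the three-column layout described above, the following Python sketch parses such a file into sentences. It is not part of this repository: the function name read_three_column is hypothetical, and it assumes whitespace-separated columns with a blank line between sentences (the usual CoNLL convention), which may differ from the loader actually used here. The sample labels are made up for illustration.

```python
from typing import Iterable, List, Tuple

# One token: (word, noisy_label, gold_label)
Token = Tuple[str, str, str]


def read_three_column(lines: Iterable[str]) -> List[List[Token]]:
    """Parse three-column lines into a list of sentences.

    Assumption: columns are whitespace-separated and sentences are
    separated by blank lines. This is a hypothetical helper, not the
    repository's own data reader.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"expected 3 columns, got {len(parts)}: {line!r}")
        current.append((parts[0], parts[1], parts[2]))
    # Flush the last sentence when the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Made-up sample: the noisy label for "German" misses the gold entity.
sample = """EU B-ORG B-ORG
rejects O O
German O B-MISC
call O O

Peter B-PER B-PER
Blackburn O I-PER
"""

sentences = read_three_column(sample.splitlines())
```

To read a real file, pass an open file handle instead of sample.splitlines(). For data without gold labels, the second and third elements of every token would simply be identical.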

Confidence Estimation Strategies

Local Strategy

python confidence_estimation_local.py --dataset eng_r0.5p0.9 --embedding_file ${PATH_TO_EMBEDDING} --embedding_dim ${DIM_OF_EMBEDDING} --neg_noise_rate ${NOISE_RATE_OF_NEGATIVES} --pos_noise_rate ${NOISE_RATE_OF_POSITIVES}

For '--neg_noise_rate' and '--pos_noise_rate', set them to -1.0 to use the gold noise rate (experiment 12 in Table 1 for En), or set them to other values (e.g., --neg_noise_rate 0.09 --pos_noise_rate 0.14 for experiment 10, En).

Global Strategy

python confidence_estimation_global.py --dataset eng_r0.5p0.9 --embedding_file ${PATH_TO_EMBEDDING} --embedding_dim ${DIM_OF_EMBEDDING} --neg_noise_rate ${NOISE_RATE_OF_NEGATIVES} --pos_noise_rate ${NOISE_RATE_OF_POSITIVES}

For '--neg_noise_rate' and '--pos_noise_rate', set them to -1.0 to use the gold noise rate (experiment 13 in Table 1 for En), or set them to other values (e.g., --neg_noise_rate 0.1 --pos_noise_rate 0.13 for experiment 11, En).

Key Implementation

Equation (3) is implemented in ./model/linear_partial_crf_inferencer.py, lines 79-85.

Equation (4) is implemented in ./model/neuralcrf_small_loss_constrain_local.py, line 139.

Equation (5) is implemented in ./confidence_estimation_local.py, lines 74-87, or ./confidence_estimation_global.py, lines 75-85.

Equations (6) and (7) are implemented in ./model/neuralcrf_small_loss_constrain_global.py, lines 188-194, or ./model/neuralcrf_small_loss_constrain_local.py, lines 188-197.

For the global strategy, equation (8) is implemented in ./model/neuralcrf_small_loss_constrain_global.py, lines 195-214, and ./model/linear_partial_crf_inferencer.py, lines 36-48. For the local strategy, equation (8) is implemented in ./model/neuralcrf_small_loss_constrain_local.py, lines 198-215, and ./model/linear_crf_inferencer.py, lines 36-48.
