CoNLL-English NER Task (NER in English)

Overview

CoNLL-English NER Task

en | ch

Motivation

Course Project
review the pytorch framework and sequence-labeling task
practice using the transformers of Huggingface

Dataset Introduction

A train set, a test set and a validation set in the data file

-DOCSTART- -X- O O
-sentnce- -pos- -Chuck- -Entity-

Project Structure

-data  # source data
-emb # BERT model files

-util
    -dataTool.py  # data interface
    -model.py
    -trainer.py  # train and evaluate

config.py  # parameters in the project
run.py
requirement.txt

EDA.ipynb # exploratory data analasis, 
          # which aims to confirm the hyper-params in the trials

Coding Pattern

For keeping the convenience and simplicity of experiments,
decouple the model into two units: encoder and tagger

model ==> encoder + tagger

In such a way, encoder extracts the context and linguistit features,
which will be received by tagger to output BIO tags.

Usage

chmod 755 deploy
./deploy

./gpu n  # monitor the GPU (refresh every n seconds)
./run  # start

Baseline Performance (1 ep | macro)

Model Precision Recall F1
Bert-CRF 0.71 0.68 0.69
Bert-softmax - - -
Bert-BiLSTM-CRF - - -
Bert-BiLSTM-softmax - - -

Optimization

  • cost sensitive learning or drop the few classes
  • dropout to improve the generalization performance
  • different backbone structures
  • DDP training --> large GPU caches for a large batch_size
  • more epochs --> schedule the learning rate dynamically while training
Owner
Kevin
hhhhhhhhhhh
Kevin
SurvTRACE: Transformers for Survival Analysis with Competing Events

⭐ SurvTRACE: Transformers for Survival Analysis with Competing Events This repo provides the implementation of SurvTRACE for survival analysis. It is

Zifeng 13 Oct 06, 2022
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag

Abel 211 Dec 28, 2022
Various Algorithms for Short Text Mining

Short Text Mining in Python Introduction This package shorttext is a Python package that facilitates supervised and unsupervised learning for short te

Kwan-Yuet 466 Dec 06, 2022
SimpleChinese2 集成了许多基本的中文NLP功能,使基于 Python 的中文文字处理和信息提取变得简单方便。

SimpleChinese2 SimpleChinese2 集成了许多基本的中文NLP功能,使基于 Python 的中文文字处理和信息提取变得简单方便。 声明 本项目是为方便个人工作所创建的,仅有部分代码原创。

Ming 30 Dec 02, 2022
Khandakar Muhtasim Ferdous Ruhan 1 Dec 30, 2021
Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.

TRICE: a task-agnostic transferring framework for multi-source sequence generation This is the source code of our work Transfer Learning for Sequence

THUNLP-MT 9 Jun 27, 2022
A list of NLP(Natural Language Processing) tutorials built on Tensorflow 2.0.

A list of NLP(Natural Language Processing) tutorials built on Tensorflow 2.0.

Won Joon Yoo 335 Jan 04, 2023
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃

This repository provides a library for efficient training of masked language models (MLM), built with fairseq. We fork fairseq to give researchers mor

Princeton Natural Language Processing 92 Dec 27, 2022
Applied Natural Language Processing in the Enterprise - An O'Reilly Media Publication

Applied Natural Language Processing in the Enterprise This is the companion repo for Applied Natural Language Processing in the Enterprise, an O'Reill

Applied Natural Language Processing in the Enterprise 95 Jan 05, 2023
MMDA - multimodal document analysis

MMDA - multimodal document analysis

AI2 75 Jan 04, 2023
NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings

MCSE: Multimodal Contrastive Learning of Sentence Embeddings This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multi

Saarland University Spoken Language Systems Group 39 Nov 15, 2022
Random Directed Acyclic Graph Generator

DAG_Generator Random Directed Acyclic Graph Generator verison1.0 简介 工作流通常由DAG(有向无环图)来定义,其中每个计算任务$T_i$由一个顶点(node,task,vertex)表示。同时,任务之间的每个数据或控制依赖性由一条加权

Livion 17 Dec 27, 2022
Write Alphabet, Words and Sentences with your eyes.

The-Next-Gen-AI-Eye-Writer The Eye tracking Technique has become one of the most popular techniques within the human and computer interaction era, thi

Rohan Kasabe 2 Apr 05, 2022
A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)

A2T: Towards Improving Adversarial Training of NLP Models This is the source code for the EMNLP 2021 (Findings) paper "Towards Improving Adversarial T

QData 17 Oct 15, 2022
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 3k Jan 06, 2023
Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

PyStanfordDependencies Python interface for converting Penn Treebank trees to Universal Dependencies and Stanford Dependencies. Example usage Start by

David McClosky 64 May 08, 2022
Snips Python library to extract meaning from text

Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur

Snips 3.7k Dec 30, 2022
We have built a Voice based Personal Assistant for people to access files hands free in their device using natural language processing.

Voice Based Personal Assistant We have built a Voice based Personal Assistant for people to access files hands free in their device using natural lang

Rushabh 2 Nov 13, 2021
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing pororo performs Natural Language Processing and Speech-related tasks. It is easy to

Kakao Brain 1.2k Dec 21, 2022
Speech Recognition Database Management with python

Speech Recognition Database Management The main aim of this project is to recogn

Abhishek Kumar Jha 2 Feb 02, 2022