# CoNLL-English NER Task

en | ch

## Motivation

A course project, intended to:

- review the PyTorch framework and the sequence-labeling task
- practice using Hugging Face's transformers library

## Dataset Introduction

The data directory contains a train set, a test set, and a validation set, in the standard CoNLL column format: each line holds a token followed by its POS tag, chunk tag, and entity tag, and `-DOCSTART-` lines separate documents.

```
-DOCSTART- -X- O O
-sentence- -pos- -chunk- -entity-
```
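
As a rough illustration of this format, here is a minimal reader sketch. The `data/train.txt` path and the assumption that the entity tag is the last column are illustrative; the repo's actual data interface is `util/dataTool.py`.

```python
from typing import List, Tuple

def read_conll(path: str) -> List[Tuple[List[str], List[str]]]:
    """Read a CoNLL file into (tokens, entity_tags) sentence pairs."""
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Blank lines end a sentence; -DOCSTART- markers are skipped
            if not line or line.startswith("-DOCSTART-"):
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            parts = line.split()
            tokens.append(parts[0])   # word form (first column)
            tags.append(parts[-1])    # BIO entity tag (last column)
    if tokens:
        sentences.append((tokens, tags))
    return sentences

# e.g. sents = read_conll("data/train.txt")
```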

## Project Structure

```
data/            # source data
emb/             # BERT model files
util/
    dataTool.py  # data interface
    model.py     # model definition
    trainer.py   # train and evaluate
config.py        # parameters of the project
run.py           # entry point
requirement.txt
EDA.ipynb        # exploratory data analysis,
                 # used to confirm the hyper-parameters for the trials
```

## Coding Pattern

To keep experiments convenient and simple, the model is decoupled into two units: an encoder and a tagger.

```
model ==> encoder + tagger
```

In this way, the encoder extracts contextual and linguistic features, which the tagger receives to output BIO tags.
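
A minimal sketch of this pattern with a softmax tagger head. Class names and arguments are illustrative assumptions, not the repo's actual definitions (those live in `util/model.py`); the `emb` path assumes the BERT weights sit where the project structure above suggests.

```python
import torch.nn as nn
from transformers import BertModel

class Encoder(nn.Module):
    """Encoder unit: wraps BERT and returns contextual token features."""
    def __init__(self, bert_path: str = "emb"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_path)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state           # (batch, seq_len, hidden)

class SoftmaxTagger(nn.Module):
    """Tagger unit: maps per-token features to BIO tag logits."""
    def __init__(self, hidden_size: int, num_tags: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_tags)

    def forward(self, features):
        return self.classifier(features)       # (batch, seq_len, num_tags)

class NerModel(nn.Module):
    """model ==> encoder + tagger"""
    def __init__(self, encoder: nn.Module, tagger: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.tagger = tagger

    def forward(self, input_ids, attention_mask):
        return self.tagger(self.encoder(input_ids, attention_mask))
```

Swapping `SoftmaxTagger` for a CRF or BiLSTM-CRF head then leaves the encoder and training loop untouched, which keeps the backbone comparisons in the baseline table below cheap to run.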

## Usage

```sh
chmod 755 deploy
./deploy

./gpu n  # monitor the GPU (refresh every n seconds)
./run    # start
```

## Baseline Performance (1 epoch, macro average)

| Model               | Precision | Recall | F1   |
|---------------------|-----------|--------|------|
| Bert-CRF            | 0.71      | 0.68   | 0.69 |
| Bert-softmax        | -         | -      | -    |
| Bert-BiLSTM-CRF     | -         | -      | -    |
| Bert-BiLSTM-softmax | -         | -      | -    |

## Optimization

- cost-sensitive learning, or dropping the rare classes (see the sketch after this list)
- dropout to improve generalization performance
- different backbone structures
- DDP training, which pools GPU memory across devices to allow a larger batch_size
- more epochs, with the learning rate scheduled dynamically during training (also sketched below)
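
A rough sketch of the first and last items. All numbers below are illustrative placeholders, not tuned values from this project:

```python
import torch
import torch.nn as nn
from transformers import get_linear_schedule_with_warmup

# Cost-sensitive learning: up-weight the rare entity tags in the loss.
num_tags = 9                      # BIO over PER/ORG/LOC/MISC plus O
class_weights = torch.ones(num_tags)
class_weights[1:] = 5.0           # placeholder boost, assuming index 0 is O
criterion = nn.CrossEntropyLoss(weight=class_weights, ignore_index=-100)

# Dynamic learning-rate schedule: warm up, then decay linearly.
model = nn.Linear(768, num_tags)  # stand-in for the real encoder + tagger
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=1000)
# call scheduler.step() after every optimizer.step() in the training loop
```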
