Code for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter"

Last update: Dec 06, 2022

Overview

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter

Code and checkpoints for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter"

Arxiv link of the paper: https://arxiv.org/abs/2105.07148

Requirements

Python 3.7.0
transformers 3.4.0
Numpy 1.18.5
Packaging 17.1
scikit-learn 0.23.2
torch 1.6.0+cu92
tqdm 4.50.2
multiprocess 0.70.10
tensorflow 2.3.1
tensorboardX 2.1
seqeval 1.2.1

Input Format

CoNLL format (the BIOES tag scheme is preferred), with one character and its label per line. Sentences are separated by a blank line.

美   B-LOC  
国   E-LOC  
的   O  
华   B-PER  
莱   I-PER  
士   E-PER  

我   O  
跟   O  
他   O  
谈   O  
笑   O  
风   O  
生   O
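
As a rough illustration of the format above, the following sketch groups "character label" lines into sentences at each blank line. The function name `read_conll` and the in-memory sample are made up for this example and are not part of the repository.

```python
from typing import Iterable, List, Tuple

# One sentence is a pair: (list of characters, list of labels).
Sentence = Tuple[List[str], List[str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group whitespace-separated 'char label' lines into sentences.

    A blank line closes the current sentence. `read_conll` is a
    hypothetical helper name, not something defined by the repository.
    """
    sentences: List[Sentence] = []
    chars: List[str] = []
    labels: List[str] = []
    for raw in lines:
        parts = raw.split()
        if not parts:
            # Blank line: finish the sentence if it has any content.
            if chars:
                sentences.append((chars, labels))
                chars, labels = [], []
            continue
        chars.append(parts[0])    # first column: the character
        labels.append(parts[-1])  # last column: its tag
    if chars:
        # The file may not end with a blank line.
        sentences.append((chars, labels))
    return sentences


sample = "美 B-LOC\n国 E-LOC\n的 O\n\n我 O\n跟 O\n"
parsed = read_conll(sample.splitlines())
```

For a real file, the same function can be fed an open file handle, for example `read_conll(open(path, encoding="utf-8"))`.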

Chinese BERT, Chinese Word Embedding, and Checkpoints

Chinese BERT

Chinese BERT: https://cdn.huggingface.co/bert-base-chinese-pytorch_model.bin

Chinese word embedding:

Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/Tencent_AILab_ChineseEmbedding.tar.gz

Checkpoints and Shells

Directory Structure of data

berts
- bert
  - config.json
  - vocab.txt
  - pytorch_model.bin
dataset
- NER
  - weibo
  - note4
  - msra
  - resume
- POS
  - ctb5
  - ctb6
  - ud1
  - ud2
- CWS
  - ctb6
  - msr
  - pku
vocab
- tencent_vocab.txt (the vocabulary of the pre-trained word embedding table)
embedding
- word_embedding.txt
result
- NER
  - weibo
  - note4
  - msra
  - resume
- POS
  - ctb5
  - ctb6
  - ud1
  - ud2
- CWS
  - ctb6
  - msr
  - pku
log
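
The empty directory tree listed above could be created with a short script such as the sketch below. The directory names are taken from the listing; the function name `make_layout` is made up for this example, and the demo runs in a temporary directory so it does not touch a real checkout. The files themselves (BERT weights, vocab, embeddings, datasets) still have to be placed by hand.

```python
import tempfile
from pathlib import Path
from typing import List

# Task -> dataset folders, copied from the directory listing.
TASK_DATASETS = {
    "NER": ["weibo", "note4", "msra", "resume"],
    "POS": ["ctb5", "ctb6", "ud1", "ud2"],
    "CWS": ["ctb6", "msr", "pku"],
}


def make_layout(root: Path) -> List[Path]:
    """Create the empty directory tree under `root` and return the paths.

    `make_layout` is a hypothetical helper, not part of the repository.
    """
    targets = [
        root / "berts" / "bert",
        root / "vocab",
        root / "embedding",
        root / "log",
    ]
    # `dataset` and `result` share the same task/dataset sub-structure.
    for parent in ("dataset", "result"):
        for task, names in TASK_DATASETS.items():
            targets.extend(root / parent / task / name for name in names)
    for path in targets:
        path.mkdir(parents=True, exist_ok=True)
    return targets


# Demo in a throwaway directory; pass Path(".") to build it in place.
with tempfile.TemporaryDirectory() as tmp:
    created = make_layout(Path(tmp))
    created_count = len(created)
    all_exist = all(p.is_dir() for p in created)
```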

Run

1. Convert the .char.bmes files to .json files: python3 to_json.py
2. Run the shell script: sh run_ner.sh
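
To give an idea of what the conversion step involves, here is a minimal sketch that turns character-per-line input into one JSON object per sentence. It is not the repository's `to_json.py`: the field names `text` and `label` and the function name `bmes_to_json_lines` are assumptions chosen for illustration, and the real script defines the actual schema.

```python
import json
from typing import Iterable, Iterator


def bmes_to_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON string per sentence from 'char label' lines.

    The keys "text" and "label" are hypothetical; check to_json.py in
    the repository for the schema that the training code expects.
    """
    chars, labels = [], []
    for raw in lines:
        parts = raw.split()
        if parts:
            chars.append(parts[0])
            labels.append(parts[-1])
        elif chars:
            # Blank line closes the current sentence.
            yield json.dumps({"text": chars, "label": labels}, ensure_ascii=False)
            chars, labels = [], []
    if chars:
        # Flush the last sentence when there is no trailing blank line.
        yield json.dumps({"text": chars, "label": labels}, ensure_ascii=False)


records = list(bmes_to_json_lines(["华 B-PER", "莱 I-PER", "士 E-PER", "", "的 O"]))
```

`ensure_ascii=False` keeps the Chinese characters readable in the output instead of escaping them.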

If you want to load my checkpoints, you need to make some revisions to your transformers installation.

My models were trained in distributed mode, so the checkpoints cannot be loaded directly in single-GPU mode. Follow the steps below to patch transformers before loading them.

1. Enter the source code directory of transformers: cd source/transformers-master
2. Open modeling_utils.py and go to around line 995.
3. Change the code as follows:
4. Build and install the revised source code: python3 setup.py install
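
A common reason a checkpoint saved from a distributed run fails to load on a single GPU is that `torch.nn.parallel.DistributedDataParallel` wraps the model and prefixes every parameter name with `module.`. Assuming that is the cause here, which the text above does not state, a workaround that avoids editing the library is to rename the keys before loading. The helper name `strip_module_prefix` is made up for this sketch, and the demo uses a plain dictionary in place of a real state dict.

```python
from typing import Any, Dict, Mapping


def strip_module_prefix(state_dict: Mapping[str, Any],
                        prefix: str = "module.") -> Dict[str, Any]:
    """Return a copy of `state_dict` with a leading `prefix` removed from keys.

    Hypothetical helper; it assumes the only mismatch is the "module."
    prefix added by DistributedDataParallel, which may not hold for the
    released checkpoints.
    """
    cleaned: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a real state dict: values would normally be tensors.
demo = {"module.bert.embeddings.weight": 1, "classifier.bias": 2}
cleaned = strip_module_prefix(demo)
```

With a real checkpoint, the dictionary would typically come from `torch.load(path, map_location="cpu")` and the cleaned result would be passed to `model.load_state_dict(...)`. Whether this is enough for these checkpoints is untested.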

Cite

@misc{liu2021lexicon,
      title={Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter}, 
      author={Wei Liu and Xiyan Fu and Yue Zhang and Wenming Xiao},
      year={2021},
      eprint={2105.07148},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
