Chinese clinical named entity recognition using pre-trained BERT model

Last update: Dec 14, 2022

Related tags

Deep Learning bertcner

Overview

Chinese clinical named entity recognition (CNER) using pre-trained BERT model

Introduction

Code for paper Chinese clinical named entity recognition with variant neural structures based on BERT methods

Paper url: https://www.sciencedirect.com/science/article/pii/S1532046420300502

We pre-trained BERT model to improve the performance of Chinese CNER. Different layers such as Long Short-Term Memory (LSTM) and Conditional Random Field (CRF) were used to extract the text features and decode the predicted tags respectively. And we also proposed a new strategy to incorporate dictionary features into the model. Radical features of Chinese characters were also used to improve the model performance.

Model structure

Usage

Pre-trained models

For replication, we uploaded two models in Baidu Netdisk.

Link: https://pan.baidu.com/s/1obzG6OSbu77duhusWg2xmQ Code: k53q

Examples

To replicate the result of CCKS-2018 dataset

python main.py \
--data_dir=data/ccks_2018 \
--bert_model=model/  \
--output_dir=./output  \
--terminology_dicts_path="{'medicine':'data/ccks_2018/drug_dict.txt','surgery':'data/ccks_2018/surgery_dict.txt'}" \
--radical_dict_path data/radical_dict.txt \
--constant=0 \
--add_radical_or_not=True \
--radical_one_hot=False \
--radical_emb_dim=20 \
--max_seq_length=480 \
--do_train=True \
--do_eval=True \
--train_batch_size=6 \
--eval_batch_size=4 \
--hidden_dim=64 \
--learning_rate=5e-5 \
--num_train_epochs=5 \
--gpu_id=3 \

Results

CCKS-2018 dataset

Method	P	R	F1
FT-BERT+BiLSTM+CRF	88.57	89.02	88.80
+dictionary	88.58	89.17	88.87
+radical(one-hot encoding)	88.51	89.39	88.95
+radical(random embedding)	89.24	89.11	89.17
+dictionary +radical	89.42	89.22	89.32
ensemble	89.59	89.54	89.56

Team Name	Method	F1
Yang and Huang (2018)	CRF(feature-rich + rule)	89.26
heiheihahei	LSTM-CRF(ensemble)	88.92
Luo et al.(2018)	LSTM-CRF(ensemble)	88.63
dous12	-	88.37
chengachengcheng	-	88.30
NUBT-IBDL	-	87.62
Our	FT-BERT+BiLSTM +CRF+Dictionary(ensemble)	89.56

CCKS-2017 dataset

Method	P	R	F1
FT-BERT+BiLSTM+CRF	91.64	90.98	91.31
+dictionary	91.49	90.97	91.23
+radical(one-hot encoding)	91.83	90.80	91.35
+radical(random embedding)	92.07	90.77	91.42
+dictionary+radical	91.76	90.88	91.32
ensemble	92.06	91.15	91.60

Team Name	Method	F1
Qiu et al. (2018b)	RD-CNN-CRF	91.32
Wang et al. (2019)	BiLSTM-CRF+Dictionary	91.24
Hu et al. (2017)	BiLSTM-FEA(ensemble)	91.03
Zhang et al. (2018)	BiLSTM-CRF(mt+att+ms)	90.52
Xia and Wang (2017)	BiLSTM-CRF(ensemble)	89.88
Ouyang et al. (2017)	BiRNN-CRF	88.85
Li et al. (2017)	BiLSTM-CRF(specialized +lexicons)	87.95
Our	FT-BERT+BiLSTM +CRF+Dictionary(ensemble)	91.60

Chinese clinical named entity recognition using pre-trained BERT model

Related tags

Overview

Chinese clinical named entity recognition (CNER) using pre-trained BERT model

Introduction

Model structure

Usage

Pre-trained models

Examples

Results

CCKS-2018 dataset

CCKS-2017 dataset

Owner

Xiangyang Li

Code for the paper Hybrid Spectrogram and Waveform Source Separation

TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.

Example-custom-ml-block-keras - Custom Keras ML block example for Edge Impulse

This is the first released system towards complex meters` detection and recognition, which is implemented by computer vision techniques.

Official code for "InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization" (ICLR 2020, spotlight)

Pre-Training Graph Neural Networks for Cold-Start Users and Items Representation.

Gesture-controlled Video Game. Just swing your finger and play the game without touching your PC

Unicorn can be used for performance analyses of highly configurable systems with causal reasoning

Official PyTorch Implementation of Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition, ICCV 2021

Repository for the electrical and ICT benchmark model developed in the ERIGrid 2.0 project.

Official PyTorch implementation of the paper: Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting.

Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis (CVPR2022)

Notebooks for my "Deep Learning with TensorFlow 2 and Keras" course

Pytoydl: A toy deep learning framework built upon numpy.

Protect against subdomain takeover

TensorFlow Implementation of "Show, Attend and Tell"

Rl-quickstart - Reinforcement Learning Quickstart