Natural Language Processing for Adverse Drug Reaction (ADR) Detection

This repo contains code from a project to identify ADRs in discharge summaries at Austin Health. The model is built with the HuggingFace Transformers library, starting from the pretrained DeBERTa model. Further masked language modelling (MLM) pre-training is performed on a large corpus of unannotated discharge summaries. Finally, the model is fine-tuned on a corpus of discharge summaries annotated using Prodigy. The model performs named entity recognition (NER), but final performance is measured at the document level, using the maximum token-level score as the score for each document.
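As an illustration of the document-level scoring described above, here is a minimal sketch. It assumes each token has already been given a single ADR probability by the NER model; the function names and the 0.5 threshold are hypothetical and are not taken from this repo.

```python
from typing import Sequence


def document_score(token_scores: Sequence[float]) -> float:
    """Return the document-level score: the maximum token-level score."""
    # A document with no tokens has nothing flagged, so score it as 0.0.
    return max(token_scores, default=0.0)


def document_prediction(token_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag the document when its score reaches the (hypothetical) threshold."""
    return document_score(token_scores) >= threshold


# One discharge summary, one ADR probability per token.
scores = [0.02, 0.10, 0.91, 0.35]
print(document_score(scores))
print(document_prediction(scores))
```

Taking the maximum means a single confidently tagged token is enough to flag the whole summary, which suits a screening task where the question is whether the document mentions an ADR at all.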

We used Weights and Biases for experiment tracking.

The pretrain script takes a folder containing discharge summaries stored in CSV files, tokenizes them, and continues MLM training on deberta-base.
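The following is a rough sketch of what such a pipeline can look like, not the pretrain script itself. The column name "text", the output folder, the 512-token limit, the 15% masking rate, and both function names are assumptions made for illustration; "microsoft/deberta-base" is the HuggingFace Hub identifier for deberta-base.

```python
import csv
from pathlib import Path


def read_summaries(folder, text_column="text"):
    """Collect non-empty summary texts from every CSV file in `folder`."""
    texts = []
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Skip rows where the (assumed) text column is missing or blank.
                text = (row.get(text_column) or "").strip()
                if text:
                    texts.append(text)
    return texts


def continue_mlm_pretraining(texts, output_dir="mlm-output"):
    """Tokenize `texts` and continue MLM training from deberta-base."""
    # Imported lazily so read_summaries works without these packages installed.
    from datasets import Dataset
    from transformers import (
        AutoModelForMaskedLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
    model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-base")

    def tokenize(batch):
        # Truncate long summaries to an assumed maximum of 512 tokens.
        return tokenizer(batch["text"], truncation=True, max_length=512)

    dataset = Dataset.from_dict({"text": texts}).map(
        tokenize, batched=True, remove_columns=["text"]
    )
    # The collator pads each batch and masks tokens at random for the MLM objective.
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir),
        train_dataset=dataset,
        data_collator=collator,
    )
    trainer.train()
    trainer.save_model(output_dir)
```

With these definitions, a run would look like continue_mlm_pretraining(read_summaries("path/to/csv_folder")), where the folder path is a placeholder.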

Fine-tuning can then be performed with the finetune script from the command line. This script assumes the data is either a JSONL file of annotated text exported from Prodigy (--datafile example.jsonl) or a saved HuggingFace Dataset. If you run the script once on a JSONL file of annotations, you can choose to save the Dataset into a folder (--save_data_dir "save_to_here") and use it for subsequent training runs (--datafile "save_to_here").
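To make the expected input more concrete, here is a small sketch of reading a Prodigy export. It assumes the usual Prodigy NER record layout, with one JSON object per line holding "text", a list of "spans" with character offsets and labels, and an "answer" field; the function name and the returned structure are hypothetical and may differ from what the finetune script does internally.

```python
import json


def load_prodigy_jsonl(path):
    """Read accepted Prodigy annotations as {"text", "spans"} dictionaries."""
    examples = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the export
            record = json.loads(line)
            # Keep only accepted examples; records without an answer are kept.
            if record.get("answer", "accept") != "accept":
                continue
            spans = [
                (span["start"], span["end"], span["label"])
                for span in record.get("spans", [])
            ]
            examples.append({"text": record["text"], "spans": spans})
    return examples
```

In the HuggingFace Datasets library, a Dataset can be written to a folder with save_to_disk and read back with load_from_disk, which is presumably the mechanism behind saving to a folder and reusing it on later runs.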

Example usage:

python .\finetune.py --folds 5 --epochs 15 --lr 5e-5 --wandb_on --hub_off --project 'CLI Tests' --run_name cross-validation --datafile 'data'

Note: you might find that your exported annotations (JSONL file) are not encoded as UTF-8, which will prevent this code from working. There are various ways to change the encoding, all of which can be found with a quick web search. On a Windows machine, for example, adapt the following command in PowerShell:

Get-Content .\name_of_file.jsonl -Encoding Unicode | Set-Content -Encoding UTF8 .\name_of_new_file.jsonl
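A cross-platform alternative is to re-encode the file in Python. This is a minimal sketch that assumes the export is UTF-16, which is what PowerShell calls Unicode; the function name and file names are placeholders.

```python
def reencode_to_utf8(source_path, target_path, source_encoding="utf-16"):
    """Rewrite a text file as UTF-8, decoding it with `source_encoding`."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(source_path, encoding=source_encoding, newline="") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write(text)


# Example call (placeholder file names):
# reencode_to_utf8("name_of_file.jsonl", "name_of_new_file.jsonl")
```

Python's "utf-16" codec uses the byte order mark at the start of the file to pick the byte order and drops it from the decoded text, so the output is written as UTF-8 without a byte order mark.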

Owner

Medicines Optimisation Service - Austin Health
