Toward Model Interpretability in Medical NLP

LING380: Topics in Computational Linguistics, Final Project

James Cross ([email protected]) and Daniel Kim ([email protected]), December 2021

Code Organization

data: contains the medical report data [LINK TO THAT REPO] used in model fine-tuning and analysis, the clinical stop words, and the accuracy and entropy metrics saved during evaluation

models: checkpoints of the best-performing BERT and BioBERT models after hyperparameter optimization

notebooks:

model_training.ipynb: code to train and fine-tune BERT and BioBERT

model_evaluation.ipynb: code to run various model evaluations, visualize word importances, perform post-training clinical stopword masking, and other analyses

scripts: the same functionality as the notebooks, provided as executable Python scripts and functions
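As a rough illustration of the clinical stopword masking and entropy metrics mentioned above, here is a minimal sketch in plain Python. It is not the code from the notebooks: the stop word set, the function names, the "[MASK]" token, and the choice of Shannon entropy over the predicted class probabilities are all assumptions made for the example.

```python
import math

# Hypothetical stop words for illustration; the real list lives in the data directory.
CLINICAL_STOP_WORDS = {"patient", "history", "noted"}


def mask_stop_words(text, stop_words, mask_token="[MASK]"):
    """Replace each whitespace-delimited word found in stop_words with mask_token."""
    masked = []
    for word in text.split():
        # Match on a lowercased form with surrounding punctuation stripped.
        # Punctuation attached to a masked word is dropped with it.
        core = word.strip(".,;:!?()").lower()
        masked.append(mask_token if core in stop_words else word)
    return " ".join(masked)


def prediction_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


masked_report = mask_stop_words("Patient has a history of asthma.", CLINICAL_STOP_WORDS)
uniform_entropy = prediction_entropy([0.5, 0.5])
```

Comparing a model's accuracy and prediction entropy on the original and masked reports is one way to see how much it relies on the masked words.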

Dependencies

All packages needed to run the code are available in the default Google Colab environment (see the Colab documentation for the full list), with two exceptions: Hugging Face Transformers (transformers), used for loading transformer models, and Captum (captum), which provides a variety of model interpretation tools.
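Before running the notebooks, a quick check like the following can confirm that the two extra packages are present. This is a small sketch, not part of the project code; the helper name missing_packages is hypothetical.

```python
import importlib.util

# The two packages named above that are not part of the default Colab environment.
REQUIRED = ["transformers", "captum"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package reported as missing can be installed with pip, for example `pip install transformers captum` (prefix the command with `!` inside a Colab cell).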

How to run code

There are two ways to run the code: on Google Colab or locally on your machine.

Option 1) Google Colab

Model training notebook: [https://colab.research.google.com/drive/1uPIi-OVchs_8A-SNcQtLfwelr0ccsz19?usp=sharing]

Model evaluation/analysis notebook: [https://colab.research.google.com/drive/1Hfy58JvyPbx55lKKhQAzzrhJIbN_Io0j?usp=sharing]

Option 2) Local Machine

Notebooks: You can run model_training.ipynb or model_evaluation.ipynb as is, updating the directory paths where needed.
