Toward Model Interpretability in Medical NLP

LING380: Topics in Computational Linguistics, Final Project

James Cross ([email protected]) and Daniel Kim ([email protected]), December 2021

Code Organization

data: contains the medical report data [LINK TO THAT REPO] used for model fine-tuning and analysis, the clinical stop words, and the accuracy and entropy metrics saved during evaluation

models: checkpoints of the best-performing BERT and BioBERT models after hyperparameter optimization

notebooks:

model_training.ipynb: code to train and fine-tune BERT and BioBERT

model_evaluation.ipynb: code to run various model evaluations, visualize word importances, perform post-training clinical stopword masking, and other analyses

scripts: the same functionality as the notebooks, provided as executable Python scripts and functions
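
As a rough illustration of what post-training clinical stopword masking can look like, the sketch below replaces stop words in a report with BERT's `[MASK]` token before the text is passed to a model. It is not taken from the project code: the function name `mask_stopwords` and the three-word stop list are hypothetical, and the actual clinical stop words are the ones stored in the data directory.

```python
import re

# Hypothetical stop-word list for illustration only; the project's real
# clinical stop words live in the data directory and are not reproduced here.
CLINICAL_STOPWORDS = {"patient", "history", "noted"}


def mask_stopwords(text, stopwords=CLINICAL_STOPWORDS, mask_token="[MASK]"):
    """Replace each whole-word stop word (case-insensitive) with mask_token."""

    def replace(match):
        word = match.group(0)
        return mask_token if word.lower() in stopwords else word

    # Only alphabetic runs are treated as words; punctuation is left untouched.
    return re.sub(r"[A-Za-z]+", replace, text)


masked = mask_stopwords("Patient has a history of asthma.")
print(masked)  # [MASK] has a [MASK] of asthma.
```

Masking at the text level keeps the sentence length and punctuation intact, so predictions on the masked and unmasked versions of a report can be compared directly.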

Dependencies

All packages needed to run the code are available in the default Google Colab environment (see the Colab documentation for the full list), with two exceptions: Hugging Face Transformers (transformers), used for loading transformer models, and Captum (captum), which provides a variety of model interpretation tools.
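
Both extra packages are published on PyPI under the names above, so `pip install transformers captum` should be enough in most environments. The short check below is a minimal sketch for confirming that they are importable before opening the notebooks; the helper name `missing_packages` is hypothetical and not part of the project.

```python
import importlib.util

# The two packages that are not part of the default Colab environment.
REQUIRED = ["transformers", "captum"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All extra dependencies are installed.")
```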

How to run code

There are two ways to run the code: on Google Colab or locally on your machine.

Option 1) Google Colab

Model training notebook: [https://colab.research.google.com/drive/1uPIi-OVchs_8A-SNcQtLfwelr0ccsz19?usp=sharing]

Model evaluation/analysis notebook: [https://colab.research.google.com/drive/1Hfy58JvyPbx55lKKhQAzzrhJIbN_Io0j?usp=sharing]

Option 2) Local Machine

Notebooks: You can run the model_training.ipynb or model_evaluation.ipynb notebooks as is, changing directory paths when needed.
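
When running locally, one way to keep the path changes in a single place is to derive every directory from one project root. The sketch below assumes the data and models directories described under Code Organization sit directly under that root; the helper name `project_paths` is hypothetical, and the notebooks may define their paths differently.

```python
from pathlib import Path


def project_paths(root):
    """Build the data and models directory paths from a single project root."""
    root = Path(root).expanduser().resolve()
    return {"data": root / "data", "models": root / "models"}


# Point this at your local clone of the repository.
paths = project_paths(".")
for name, path in paths.items():
    status = "found" if path.is_dir() else "missing"
    print(f"{name}: {path} ({status})")
```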
