
jcrossyale/biobert_analysis


Toward Model Interpretability in Medical NLP: A Systematic Analysis of BioBERT

LING380: Topics in Computational Linguistics Final Project

James Cross (j.cross@yale.edu) and Daniel Kim (d.j.kim@yale.edu), December 2021

How to run the code

There are two options for running the code: on Google Colab or locally on your machine.

Option 1) Google Colab

  • Model training notebook (model_training.ipynb)
  • Model evaluation/analysis notebook (model_evaluation.ipynb)

Option 2) Local Machine

Notebooks: you can run model_training.ipynb or model_evaluation.ipynb on your local machine; directory paths will have to be adjusted as needed (see the sketch below).
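
For example, a minimal sketch of the kind of path setup to adjust near the top of each notebook; the variable names and clone location here are hypothetical, not taken from the notebooks:

```python
# Hypothetical path configuration; point REPO_ROOT at your local clone.
from pathlib import Path

REPO_ROOT = Path("~/biobert_analysis").expanduser()  # adjust to your clone location
DATA_DIR = REPO_ROOT / "data"      # medical transcriptions, stop words, saved metrics
MODELS_DIR = REPO_ROOT / "models"  # fine-tuned BERT/BioBERT checkpoints
```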

Code Organization

  • data: medical transcriptions and other medical report data used in model fine-tuning and analysis, clinical stop words, and accuracy and entropy metrics saved during evaluation
  • notebooks:
      • model_training.ipynb: code to train and fine-tune BERT and BioBERT
      • model_evaluation.ipynb: code to run various model evaluations, visualize word importances, perform post-training clinical stopword masking, and other analyses
  • scripts: the same functionality as the notebooks, as executable Python scripts and functions
  • models: checkpoints of the best-performing BERT and BioBERT models after hyperparameter optimization (a loading sketch follows this list)
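
As an illustration, a checkpoint saved under models/ could be reloaded with the transformers library as follows; the directory name biobert_best is a hypothetical placeholder, not the actual checkpoint name:

```python
# Minimal sketch: reload a fine-tuned checkpoint saved under models/.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint_dir = "models/biobert_best"  # hypothetical checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint_dir)
model.eval()  # inference mode for evaluation/analysis
```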

Dependencies

All packages needed to run the code are available in the default Google Colab environment (see the Colab documentation for the full list), with two exceptions: transformers (Hugging Face), used for loading transformer models, and captum (captum.ai), which provides a variety of model interpretation tools. Both can be installed with pip.
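
As a sketch of the kind of interpretation workflow captum enables, the example below scores per-token importance for a BERT-family classifier with LayerIntegratedGradients. The model ID, example sentence, target class, and all-padding baseline are illustrative assumptions, not necessarily the choices made in the notebooks; in practice you would load a fine-tuned checkpoint rather than the base model.

```python
# Hedged sketch: per-token importance for a BERT-family classifier using
# Captum's LayerIntegratedGradients. Model ID and example text are illustrative.
import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dmis-lab/biobert-v1.1"  # base BioBERT; its classification head is untrained here
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

def forward_func(input_ids, attention_mask):
    # Captum attributes with respect to the logits returned here.
    return model(input_ids, attention_mask=attention_mask).logits

enc = tokenizer("Patient presents with acute chest pain.", return_tensors="pt")
# Simple baseline choice: the same sequence with every token replaced by [PAD].
baseline = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_func, model.bert.embeddings)
attributions = lig.attribute(
    enc["input_ids"],
    baselines=baseline,
    additional_forward_args=(enc["attention_mask"],),
    target=0,  # class index to explain
)
scores = attributions.sum(dim=-1).squeeze(0)  # collapse hidden dim -> one score per token
for token, score in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), scores):
    print(f"{token}\t{score.item():+.4f}")
```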
