
NLP ROAR Interpretability

Official code for: Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining

Figure: ROAR and Recursive ROAR faithfulness curves.

Install

git clone https://github.com/AndreasMadsen/nlp-roar-interpretability.git
cd nlp-roar-interpretability
python -m pip install -e .

Experiments

Tasks

There are scripts for each dataset. Note that some tasks share a dataset. Use this list to identify how to train a model for each task.

  • SST: python experiments/stanford_sentiment.py
  • SNLI: python experiments/stanford_nli.py
  • IMDB: python experiments/imdb.py
  • MIMIC (Diabetes): python experiments/mimic.py --subset diabetes
  • MIMIC (Anemia): python experiments/mimic.py --subset anemia
  • bAbI-1: python experiments/babi.py --task 1
  • bAbI-2: python experiments/babi.py --task 2
  • bAbI-3: python experiments/babi.py --task 3

In addition to the tasks, the synthetic experiment can be created with python experiments/synthetic.py.
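For example, to train a baseline SST model with a fixed seed (an illustrative invocation; the seed value is arbitrary, and the arguments are described under Parameters below):

python experiments/stanford_sentiment.py --seed 0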

Parameters

The scripts stanford_sentiment, stanford_nli, imdb, mimic, and babi all take the same set of CLI arguments. You can learn about each argument with --help. The most important arguments, which allow you to run the experiments presented in the paper, are:

  • --importance-measure: this specifies which importance measure is used. It can be either random, mutual-information, attention, gradient, or integrated-gradient.
  • --seed: specifies the seed used to initialize the model.
  • --roar-strategy: whether ROAR masking is done in absolute terms (count) or relative terms (quantile).
  • --k: the proportion of tokens (in %) to mask when --roar-strategy quantile is used, or the number of tokens to mask when --roar-strategy count is used.
  • --recursive: indicates that the model used to compute the importance measure was trained with --k set to --k minus --recursive-step-size, instead of 0 as in classic ROAR.
  • --model-type: which model to use. Can be either rnn for the BiLSTM-Attention model or roberta for the RoBERTa-base model.

Note that for --k > 0, the reference model must already be trained. For example, in the non-recursive case, this means that a model trained with --k 0 must already be available.
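As an illustration, a single ROAR run on SST that masks 10% of tokens ranked by attention might look like the following (an illustrative invocation; it assumes the matching --k 0 baseline with the same seed has already been trained):

python experiments/stanford_sentiment.py --seed 0 --model-type rnn --importance-measure attention --roar-strategy quantile --k 10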

Running on a HPC setup

We provide a download.sh script for downloading dataset dependencies.
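Presumably it is run once from the repository root before submitting any jobs, e.g.:

bash download.sh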

Additionally, we provide scripts in batch_jobs/ for submitting all jobs to a Slurm queue. Note again that the ROAR scripts assume checkpoints exist for the baseline --k 0 models.

The jobs automatically use $SCRATCH/nlproar as the persistent dir.
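If your cluster does not already define $SCRATCH, you can point it at any writable directory before submitting (a hypothetical example; the path is arbitrary):

export SCRATCH=/scratch/$USER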

MIMIC

See https://mimic.physionet.org/gettingstarted/access/ for how to access MIMIC. You will need to download DIAGNOSES_ICD.csv.gz and NOTEEVENTS.csv.gz and place them in mimic/ relative to your persistent dir.
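Assuming the default $SCRATCH/nlproar persistent dir, the files would end up laid out roughly as follows (a sketch; the location of your downloaded archives will differ):

mkdir -p $SCRATCH/nlproar/mimic
cp DIAGNOSES_ICD.csv.gz NOTEEVENTS.csv.gz $SCRATCH/nlproar/mimic/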
