The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models.

Overview

LM-Debugger is an open-source interactive tool for inspection and intervention in transformer-based language models. This repository includes the code and links for data files required for running LM-Debugger over GPT2 Large and GPT2 Medium. Adapting this tool to other models only requires changing the backend API (see details below). Contributions our welcome!

An online demo of LM-Debugger is available at:

For more details, please check our paper: "LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models".

⚙️ Requirements

LM-Debugger has two main views for (a) debugging and intervention in model predictions, and (b) exploration of information encoded in the model's feed-forward layers.

The tool runs in a React and python environment with Flask and Streamlit installed. In addition, the exploration view uses an Elasticsearch index. To set up the environment, please follow the steps below:

  1. Clone this repository:

    git clone https://github.com/mega002/lm-debugger
    cd lm-debugger
  2. Create a Python 3.8 environment, and install the following dependencies:

    pip install -r requirements.txt
  3. Install Yarn and NVM, and set up the React environment:

    cd ui
    nvm install
    yarn install
    cd ..
  4. Install Elasticsearch and make sure that the service is up.

🔎 Running LM-Debugger

Creating a Configuration File

LM-Debugger executes one model at a time, based on a given configuration file. The configuration includes IP addresses and port numbers for running the different services, as well as the following fields:

  • model_name: The current version of LM-Debugger supports GPT2 models from HuggingFace (e.g. gpt2-medium or gpt2-large).
  • server_files_dir: A path to store files with preprocessed model information, created by the script create_offline_files.py. The script creates 3 pickle files with (1) projections to the vocabulary of parameter vectors of the model's feed-forward layers, (2) two separate files with mappings between parameter vectors and clusters (and vice versa).
  • create_cluster_files: A boolean field (true/false) that indicates whether to run clustering or not. This is optional since clustering of the feed-forward parameter vectors can take several hours and might require extra computation resources (especially for large models).

Sample configuration files for the medium and large versions of GPT2 are provided in the config_files directory. The preprocessed data files for these models are available for download here.

Creating an Elasticsearch Index

The keyword search functionality in the exploration view is powered by an Elasticsearch index that stores the projections of feed-forward parameter vectors from the entire network. To create this index, run:

python es_index/index_value_projections_docs.py \
--config_path CONFIG_PATH

Executing LM-Debugger

To run LM-Debugger:

bash start.sh CONFIG_PATH

In case you are interested in running only one of the two views of LM-Debugger, this can be done as follows:

  1. To run the Flask server (needed for the prediction view):

    python flask_server/app.py --config_path CONFIG_PATH
  2. To run the prediction view:

    python ui/src/convert2runConfig.py --config_path CONFIG_PATH
    cd ui
    yarn start
  3. To run the exploration view:

    streamlit run streamlit/exploration.py -- --config_path CONFIG_PATH

Citation

Please cite as:

@article{geva2022lmdebugger,
  title={LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models},
  author={Geva, Mor and Caciularu, Avi and Dar, Guy and Roit, Paul and Sadde, Shoval and Shlain, Micah and Tamir, Bar and Goldberg, Yoav},
  journal={arXiv preprint arXiv:2204.12130},
  year={2022}
}
Owner
Mor Geva
Mor Geva
Inject code into running Python processes

pyrasite Tools for injecting arbitrary code into running Python processes. homepage: http://pyrasite.com documentation: http://pyrasite.rtfd.org downl

Luke Macken 2.7k Jan 08, 2023
Django package to log request values such as device, IP address, user CPU time, system CPU time, No of queries, SQL time, no of cache calls, missing, setting data cache calls for a particular URL with a basic UI.

django-web-profiler's documentation: Introduction: django-web-profiler is a django profiling tool which logs, stores debug toolbar statistics and also

MicroPyramid 77 Oct 29, 2022
Monitor Memory usage of Python code

Memory Profiler This is a python module for monitoring memory consumption of a process as well as line-by-line analysis of memory consumption for pyth

Fabian Pedregosa 80 Nov 18, 2022
NoPdb: Non-interactive Python Debugger

NoPdb: Non-interactive Python Debugger Installation: pip install nopdb Docs: https://nopdb.readthedocs.io/ NoPdb is a programmatic (non-interactive) d

Ondřej Cífka 67 Oct 15, 2022
Little helper to run Steam apps under Proton with a GDB debugger

protongdb A small little helper for running games with Proton and debugging with GDB Requirements At least Python 3.5 protontricks pip package and its

Joshie 21 Nov 27, 2022
Pyinstrument - a Python profiler. A profiler is a tool to help you optimize your code - make it faster.

Pyinstrument🚴 Call stack profiler for Python. Shows you why your code is slow!

Joe Rickerby 5k Jan 08, 2023
Trace any Python program, anywhere!

lptrace lptrace is strace for Python programs. It lets you see in real-time what functions a Python program is running. It's particularly useful to de

Karim Hamidou 687 Nov 20, 2022
Integration of IPython pdb

IPython pdb Use ipdb exports functions to access the IPython debugger, which features tab completion, syntax highlighting, better tracebacks, better i

Godefroid Chapelle 1.7k Jan 07, 2023
Tracing instruction in lldb debugger.Just a python-script for lldb.

lldb-trace Tracing instruction in lldb debugger. just a python-script for lldb. How to use it? Break at an address where you want to begin tracing. Im

156 Jan 01, 2023
GEF (GDB Enhanced Features) - a modern experience for GDB with advanced debugging features for exploit developers & reverse engineers ☢

GEF (GDB Enhanced Features) - a modern experience for GDB with advanced debugging features for exploit developers & reverse engineers ☢

hugsy 5.2k Jan 01, 2023
Hypothesis debugging with vscode

Hypothesis debugging with vscode

Oliver Mannion 0 Feb 09, 2022
Arghonaut is an interactive interpreter, visualizer, and debugger for Argh! and Aargh!

Arghonaut Arghonaut is an interactive interpreter, visualizer, and debugger for Argh! and Aargh!, which are Befunge-like esoteric programming language

Aaron Friesen 2 Dec 10, 2021
Sane color handling of osx's accent and highlight color from the commandline

osx-colors Sane command line color customisation for osx, no more fiddling about with defaults, internal apple color constants and rgb color codes Say

Clint Plummer 8 Nov 17, 2022
A drop-in replacement for Django's runserver.

About A drop in replacement for Django's built-in runserver command. Features include: An extendable interface for handling things such as real-time l

David Cramer 1.3k Dec 15, 2022
Voltron is an extensible debugger UI toolkit written in Python.

Voltron is an extensible debugger UI toolkit written in Python. It aims to improve the user experience of various debuggers (LLDB, GDB, VDB an

snare 5.9k Dec 30, 2022
A configurable set of panels that display various debug information about the current request/response.

Django Debug Toolbar The Django Debug Toolbar is a configurable set of panels that display various debug information about the current request/respons

Jazzband 7.3k Dec 29, 2022
The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models.

LM-Debugger is an open-source interactive tool for inspection and intervention in transformer-based language models. This repository includes the code

Mor Geva 110 Dec 28, 2022
Sampling profiler for Python programs

py-spy: Sampling profiler for Python programs py-spy is a sampling profiler for Python programs. It lets you visualize what your Python program is spe

Ben Frederickson 9.5k Jan 08, 2023
A configurable set of panels that display various debug information about the current request/response.

Django Debug Toolbar The Django Debug Toolbar is a configurable set of panels that display various debug information about the current request/respons

Jazzband 7.3k Dec 31, 2022
A simple rubber duck debugger

Rubber Duck Debugger I found myself many times asking a question on StackOverflow or to one of my colleagues just for finding the solution simply by d

1 Nov 10, 2021