The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models.

Overview

LM-Debugger is an open-source interactive tool for inspection and intervention in transformer-based language models. This repository includes the code and links for data files required for running LM-Debugger over GPT2 Large and GPT2 Medium. Adapting this tool to other models only requires changing the backend API (see details below). Contributions our welcome!

An online demo of LM-Debugger is available at:

For more details, please check our paper: "LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models".

⚙️ Requirements

LM-Debugger has two main views for (a) debugging and intervention in model predictions, and (b) exploration of information encoded in the model's feed-forward layers.

The tool runs in a React and python environment with Flask and Streamlit installed. In addition, the exploration view uses an Elasticsearch index. To set up the environment, please follow the steps below:

  1. Clone this repository:

    git clone https://github.com/mega002/lm-debugger
    cd lm-debugger
  2. Create a Python 3.8 environment, and install the following dependencies:

    pip install -r requirements.txt
  3. Install Yarn and NVM, and set up the React environment:

    cd ui
    nvm install
    yarn install
    cd ..
  4. Install Elasticsearch and make sure that the service is up.

🔎 Running LM-Debugger

Creating a Configuration File

LM-Debugger executes one model at a time, based on a given configuration file. The configuration includes IP addresses and port numbers for running the different services, as well as the following fields:

  • model_name: The current version of LM-Debugger supports GPT2 models from HuggingFace (e.g. gpt2-medium or gpt2-large).
  • server_files_dir: A path to store files with preprocessed model information, created by the script create_offline_files.py. The script creates 3 pickle files with (1) projections to the vocabulary of parameter vectors of the model's feed-forward layers, (2) two separate files with mappings between parameter vectors and clusters (and vice versa).
  • create_cluster_files: A boolean field (true/false) that indicates whether to run clustering or not. This is optional since clustering of the feed-forward parameter vectors can take several hours and might require extra computation resources (especially for large models).

Sample configuration files for the medium and large versions of GPT2 are provided in the config_files directory. The preprocessed data files for these models are available for download here.

Creating an Elasticsearch Index

The keyword search functionality in the exploration view is powered by an Elasticsearch index that stores the projections of feed-forward parameter vectors from the entire network. To create this index, run:

python es_index/index_value_projections_docs.py \
--config_path CONFIG_PATH

Executing LM-Debugger

To run LM-Debugger:

bash start.sh CONFIG_PATH

In case you are interested in running only one of the two views of LM-Debugger, this can be done as follows:

  1. To run the Flask server (needed for the prediction view):

    python flask_server/app.py --config_path CONFIG_PATH
  2. To run the prediction view:

    python ui/src/convert2runConfig.py --config_path CONFIG_PATH
    cd ui
    yarn start
  3. To run the exploration view:

    streamlit run streamlit/exploration.py -- --config_path CONFIG_PATH

Citation

Please cite as:

@article{geva2022lmdebugger,
  title={LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models},
  author={Geva, Mor and Caciularu, Avi and Dar, Guy and Roit, Paul and Sadde, Shoval and Shlain, Micah and Tamir, Bar and Goldberg, Yoav},
  journal={arXiv preprint arXiv:2204.12130},
  year={2022}
}
Owner
Mor Geva
Mor Geva
Visual Interaction with Code - A portable visual debugger for python

VIC Visual Interaction with Code A simple tool for debugging and interacting with running python code. This tool is designed to make it easy to inspec

Nathan Blank 1 Nov 16, 2021
Inject code into running Python processes

pyrasite Tools for injecting arbitrary code into running Python processes. homepage: http://pyrasite.com documentation: http://pyrasite.rtfd.org downl

Luke Macken 2.7k Jan 08, 2023
🔥 Pyflame: A Ptracing Profiler For Python. This project is deprecated and not maintained.

Pyflame: A Ptracing Profiler For Python (This project is deprecated and not maintained.) Pyflame is a high performance profiling tool that generates f

Uber Archive 3k Jan 07, 2023
Never use print for debugging again

PySnooper - Never use print for debugging again PySnooper is a poor man's debugger. If you've used Bash, it's like set -x for Python, except it's fanc

Ram Rachum 15.5k Jan 01, 2023
A drop-in replacement for Django's runserver.

About A drop in replacement for Django's built-in runserver command. Features include: An extendable interface for handling things such as real-time l

David Cramer 1.3k Dec 15, 2022
Code2flow generates call graphs for dynamic programming language. Code2flow supports Python, Javascript, Ruby, and PHP.

Code2flow generates call graphs for dynamic programming language. Code2flow supports Python, Javascript, Ruby, and PHP.

Scott Rogowski 3k Jan 01, 2023
Trace any Python program, anywhere!

lptrace lptrace is strace for Python programs. It lets you see in real-time what functions a Python program is running. It's particularly useful to de

Karim Hamidou 687 Nov 20, 2022
Python's missing debug print command and other development tools.

python devtools Python's missing debug print command and other development tools. For more information, see documentation. Install Just pip install de

Samuel Colvin 637 Jan 02, 2023
NoPdb: Non-interactive Python Debugger

NoPdb: Non-interactive Python Debugger Installation: pip install nopdb Docs: https://nopdb.readthedocs.io/ NoPdb is a programmatic (non-interactive) d

Ondřej Cífka 67 Oct 15, 2022
Visual profiler for Python

vprof vprof is a Python package providing rich and interactive visualizations for various Python program characteristics such as running time and memo

Nick Volynets 3.9k Jan 01, 2023
Winpdb Reborn - A GPL Python Debugger, reborn from the unmaintained Winpdb

Note from Philippe Fremy The port of winpdb-reborn to Python 3 / WxPython 4 is unfortunately not working very well. So Winpdb for Python 3 does not re

Philippe F 84 Dec 22, 2022
An improbable web debugger through WebSockets

wdb - Web Debugger Description wdb is a full featured web debugger based on a client-server architecture. The wdb server which is responsible of manag

Kozea 1.6k Dec 09, 2022
VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your python code execution.

VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your python code execution.

2.8k Jan 08, 2023
PINCE is a front-end/reverse engineering tool for the GNU Project Debugger (GDB), focused on games.

PINCE is a front-end/reverse engineering tool for the GNU Project Debugger (GDB), focused on games. However, it can be used for any reverse-engi

Korcan Karaokçu 1.5k Jan 01, 2023
Debugging manhole for python applications.

Overview docs tests package Manhole is in-process service that will accept unix domain socket connections and present the stacktraces for all threads

Ionel Cristian Mărieș 332 Dec 07, 2022
A package containing a lot of useful utilities for Python developing and debugging.

Vpack A package containing a lot of useful utilities for Python developing and debugging. Features Sigview: press Ctrl+C to print the current stack in

volltin 16 Aug 18, 2022
A configurable set of panels that display various debug information about the current request/response.

Django Debug Toolbar The Django Debug Toolbar is a configurable set of panels that display various debug information about the current request/respons

Jazzband 7.3k Dec 31, 2022
Full-screen console debugger for Python

PuDB: a console-based visual debugger for Python Its goal is to provide all the niceties of modern GUI-based debuggers in a more lightweight and keybo

Andreas Klöckner 2.6k Jan 01, 2023
printstack is a Python package that adds stack trace links to the builtin print function, so that editors such as PyCharm can link you to the source of the print call.

printstack is a Python package that adds stack trace links to the builtin print function, so that editors such as PyCharm can link to the source of the print call.

101 Aug 26, 2022
Sentry is cross-platform application monitoring, with a focus on error reporting.

Users and logs provide clues. Sentry provides answers. What's Sentry? Sentry is a service that helps you monitor and fix crashes in realtime. The serv

Sentry 32.9k Dec 31, 2022