A diff tool for language models

Overview

LMdiff

Qualitative comparison of large language models.

Demo & Paper: http://lmdiff.net

LMdiff is a MIT-IBM Watson AI Lab collaboration between:
Hendrik Strobelt (IBM, MIT) , Benjamin Hoover (IBM, GeorgiaTech), Arvind Satyanarayan (MIT), and Sebastian Gehrmann (HarvardNLP, Google).

Setting up / Quick start

From the root directory install Conda dependencies:

conda env create -f environment.yml
conda activate LMdiff
pip install -e .

Run the backend in development mode, deploying default models and configurations:

uvicorn backend.server:app --reload

Check the output for the right port (something like http://localhost:8000) and open in Browser.

Rebuild frontend

This is optional, because we have a compiled version checked into this repo.

cd client
npm install
npm run build:backend
cd ..

Using your own models

To use your own models:

  1. Create a TextDataset of phrases to analyze

    You can create the dataset file in several ways:

    From a text file So you have already collected all the phrases you want into a text file separated by newlines. Simply run:
    python scripts/make_dataset.py path/to/my_dataset.txt my_dataset -o folder/i/want/to/save/in
    
    From a python object (list of strings) Want to only work within python?
    from analysis.create_dataset import create_text_dataset_from_object
    
    my_collection = ["Phrase 1", "My second phrase"]
    create_text_dataset_from_object(my_collection, "easy-first-dataset", "human_created", "folder/i/want/to/save/in")
    From [Huggingface Datasets](https://huggingface.co/docs/datasets/) It can be created from one of Huggingface's provided datasets with:
    from analysis.create_dataset import create_text_dataset_from_hf_datasets
    import datasets
    import path_fixes as pf
    
    glue_mrpc = datasets.load_dataset("glue", "mrpc", split="train")
    name = "glue_mrpc_train"
    
    def ds2str(glue):
        """(e.g.,) Turn the first 50 sentences of the dataset into sentence information"""
        sentences = glue['sentence1'][:50]
        return "\n".join(sentences)
    
    create_text_dataset_from_hf_datasets(glue_mrpc, name, ds2str, ds_type="human_created", outfpath=pf.DATASETS)

    The dataset is a simple .txt file, with a new phrase on every line, and with a bit of required metadata header at the top. E.g.,

    ---
    checksum: 92247a369d5da32a44497be822d4a90879807a8751f5db3ff1926adbeca7ba28
    name: dataset-dummy
    type: human_created
    ---
    
    This is sentence 1, please analyze this.
    Every line is a new phrase to pass to the model.
    I can keep adding phrases, so long as they are short enough to pass to the model. They don't even need to be one sentence long.
    

    The required fields in the header:

    • checksum :: A unique identifier for the state of that file. It can be calculated however you wish, but it should change if anything at all changes in the contents below (e.g., two phrases are transposed, a new phase added, or a period is added after a sentence)
    • name :: The name of the dataset.
    • type :: Either human_created or machine_generated if you want to compare on a dataset that was spit out by another model

    Each line in the contents is a new phrase to compare in the language model. A few warnings:

    • Make sure the phrases are short enough that they can be passed to the model given your memory constraints
    • The dataset is fully loaded into memory to serve to the front end, so avoid creating a text file that is too large to fit in memory.
  2. Choose two comparable models

    Two models are comparable if they:

    1. Have the exact same tokenization scheme
    2. Have the exact same vocabulary

    This allows us to do tokenwise comparisons on the model. For example, this could be:

    • A pretrained model and a finetuned version of it (e.g., distilbert-base-cased and distilbert-base-uncased-finetuned-sst-2-english)
    • A distilled version mimicking the original model (e.g., bert-base-cased and distilbert-base-cased)
    • Different sizes of the same model architecture (e.g., gpt2 and gpt2-large)
  3. Preprocess the models on the chosen dataset

    python scripts/preprocess.py all gpt2-medium distilgpt2 data/datasets/glue_mrpc_1+2.csv --output-dir data/sample/gpt2-glue-comparisons
    
  4. Start the app

    python backend/server/main.py --config data/sample/gpt2-glue-comparisons
    

    Note that if you use a different tokenization scheme than the default gpt, you will need to tell the frontend how to visualize the tokens. For example, a bert based tokenization scheme:

    python backend/server/main.py --config data/sample/bert-glue-comparisons -t bert
    

Architecture

LMdiff Architecture

(Admin) Getting the Data

Models and datasets for the deployed app are stored on the cloud and require a private .dvc/config file.

With the correct config:

dvc pull

will populate the data directories correctly for the deployed version.

Testing
make test

or

python -m pytest tests

All tests are stored in tests.

Frontend

We like pnpm but npm works just as well. We also like Vite for its rapid hot module reloading and pleasant dev experience. This repository uses Vue as a reactive framework.

From the root directory:

cd client
pnpm install --save-dev
pnpm run dev

If you want to hit the backend routes, make sure to also run the uvicorn backend.server:app command from the project root.

For production (serve with Vite)
pnpm run serve
For production (serve with this repo's FastAPI server)
cd client
pnpm run build:backend
cd ..
uvicorn backend.server:app

Or the gunicorn command from above.

All artifacts are stored in the client/dist directory with the appropriate basepath.

For production (serve with external tooling like NGINX)
pnpm run build

All artifacts are stored in the client/dist directory.

Notes

  • Check the endpoints by visiting <localhost>:<port>/docs
Owner
Hendrik Strobelt
IBM Research // MIT-IBM AI Lab Updates on Twitter: @hen_str
Hendrik Strobelt
Computational modelling of ray propagation through optical elements using the principles of geometric optics (Ray Tracer)

Computational modelling of ray propagation through optical elements using the principles of geometric optics (Ray Tracer) Introduction By applying the

Son Gyo Jung 1 Jul 09, 2022
Implementation of MA-Trace - a general-purpose multi-agent RL algorithm for cooperative environments.

Off-Policy Correction For Multi-Agent Reinforcement Learning This repository is the official implementation of Off-Policy Correction For Multi-Agent R

4 Aug 18, 2022
[NeurIPS 2021] "Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks" by Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Lin

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks Yonggan Fu, Qixuan Yu, Yang Zhang, S

12 Dec 11, 2022
PyTorch implementation of residual gated graph ConvNets, ICLR’18

Residual Gated Graph ConvNets April 24, 2018 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbress

Xavier Bresson 112 Aug 10, 2022
Self-supervised spatio-spectro-temporal represenation learning for EEG analysis

EEG-Oriented Self-Supervised Learning and Cluster-Aware Adaptation This repository provides a tensorflow implementation of a submitted paper: EEG-Orie

Wonjun Ko 4 Jun 09, 2022
On the model-based stochastic value gradient for continuous reinforcement learning

On the model-based stochastic value gradient for continuous reinforcement learning This repository is by Brandon Amos, Samuel Stanton, Denis Yarats, a

Facebook Research 46 Dec 15, 2022
DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation

DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation By Qing Xu, Wenting Duan and Na He Requirements pytorch==1.1

Qing Xu 20 Dec 09, 2022
Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces

This repository contains source code for the paper Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces a

9 Nov 21, 2022
Must-read Papers on Physics-Informed Neural Networks.

PINNpapers Contributed by IDRL lab. Introduction Physics-Informed Neural Network (PINN) has achieved great success in scientific computing since 2017.

IDRL 330 Jan 07, 2023
This is the repository of the NeurIPS 2021 paper "Curriculum Disentangled Recommendation withNoisy Multi-feedback"

Curriculum_disentangled_recommendation This is the repository of the NeurIPS 2021 paper "Curriculum Disentangled Recommendation with Noisy Multi-feedb

14 Dec 20, 2022
MIRACLE (Missing data Imputation Refinement And Causal LEarning)

MIRACLE (Missing data Imputation Refinement And Causal LEarning) Code Author: Trent Kyono This repository contains the code used for the "MIRACLE: Cau

van_der_Schaar \LAB 15 Dec 29, 2022
Cluttered MNIST Dataset

Cluttered MNIST Dataset A setup script will download MNIST and produce mnist/*.t7 files: luajit download_mnist.lua Example usage: local mnist_clutter

DeepMind 50 Jul 12, 2022
Geometric Sensitivity Decomposition

Geometric Sensitivity Decomposition This repo is the official implementation of A Geometric Perspective towards Neural Calibration via Sensitivity Dec

16 Dec 26, 2022
The DL Streamer Pipeline Zoo is a catalog of optimized media and media analytics pipelines.

The DL Streamer Pipeline Zoo is a catalog of optimized media and media analytics pipelines. It includes tools for downloading pipelines and their dependencies and tools for measuring their performace

8 Dec 04, 2022
Optimizers-visualized - Visualization of different optimizers on local minimas and saddle points.

Optimizers Visualized Visualization of how different optimizers handle mathematical functions for optimization. Contents Installation Usage Functions

Gautam J 1 Jan 01, 2022
PyTorch implementation of MICCAI 2018 paper "Liver Lesion Detection from Weakly-labeled Multi-phase CT Volumes with a Grouped Single Shot MultiBox Detector"

Grouped SSD (GSSD) for liver lesion detection from multi-phase CT Note: the MICCAI 2018 paper only covers the multi-phase lesion detection part of thi

Sang-gil Lee 36 Oct 12, 2022
Learning Optical Flow from a Few Matches (CVPR 2021)

Learning Optical Flow from a Few Matches This repository contains the source code for our paper: Learning Optical Flow from a Few Matches CVPR 2021 Sh

Shihao Jiang (Zac) 159 Dec 16, 2022
QueryDet: Cascaded Sparse Query for Accelerating High-Resolution SmallObject Detection

QueryDet-PyTorch This repository is the official implementation of our paper: QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small O

Chenhongyi Yang 276 Dec 31, 2022
WRENCH: Weak supeRvision bENCHmark

🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

Jieyu Zhang 176 Dec 28, 2022
Official PyTorch implementation of paper: Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation (ICCV 2021 Oral Presentation)

SML (ICCV 2021, Oral) : Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Standardi

SangHun 61 Dec 27, 2022