A diff tool for language models

Overview

LMdiff

Qualitative comparison of large language models.

Demo & Paper: http://lmdiff.net

LMdiff is a MIT-IBM Watson AI Lab collaboration between:
Hendrik Strobelt (IBM, MIT) , Benjamin Hoover (IBM, GeorgiaTech), Arvind Satyanarayan (MIT), and Sebastian Gehrmann (HarvardNLP, Google).

Setting up / Quick start

From the root directory install Conda dependencies:

conda env create -f environment.yml
conda activate LMdiff
pip install -e .

Run the backend in development mode, deploying default models and configurations:

uvicorn backend.server:app --reload

Check the output for the right port (something like http://localhost:8000) and open in Browser.

Rebuild frontend

This is optional, because we have a compiled version checked into this repo.

cd client
npm install
npm run build:backend
cd ..

Using your own models

To use your own models:

  1. Create a TextDataset of phrases to analyze

    You can create the dataset file in several ways:

    From a text file So you have already collected all the phrases you want into a text file separated by newlines. Simply run:
    python scripts/make_dataset.py path/to/my_dataset.txt my_dataset -o folder/i/want/to/save/in
    
    From a python object (list of strings) Want to only work within python?
    from analysis.create_dataset import create_text_dataset_from_object
    
    my_collection = ["Phrase 1", "My second phrase"]
    create_text_dataset_from_object(my_collection, "easy-first-dataset", "human_created", "folder/i/want/to/save/in")
    From [Huggingface Datasets](https://huggingface.co/docs/datasets/) It can be created from one of Huggingface's provided datasets with:
    from analysis.create_dataset import create_text_dataset_from_hf_datasets
    import datasets
    import path_fixes as pf
    
    glue_mrpc = datasets.load_dataset("glue", "mrpc", split="train")
    name = "glue_mrpc_train"
    
    def ds2str(glue):
        """(e.g.,) Turn the first 50 sentences of the dataset into sentence information"""
        sentences = glue['sentence1'][:50]
        return "\n".join(sentences)
    
    create_text_dataset_from_hf_datasets(glue_mrpc, name, ds2str, ds_type="human_created", outfpath=pf.DATASETS)

    The dataset is a simple .txt file, with a new phrase on every line, and with a bit of required metadata header at the top. E.g.,

    ---
    checksum: 92247a369d5da32a44497be822d4a90879807a8751f5db3ff1926adbeca7ba28
    name: dataset-dummy
    type: human_created
    ---
    
    This is sentence 1, please analyze this.
    Every line is a new phrase to pass to the model.
    I can keep adding phrases, so long as they are short enough to pass to the model. They don't even need to be one sentence long.
    

    The required fields in the header:

    • checksum :: A unique identifier for the state of that file. It can be calculated however you wish, but it should change if anything at all changes in the contents below (e.g., two phrases are transposed, a new phase added, or a period is added after a sentence)
    • name :: The name of the dataset.
    • type :: Either human_created or machine_generated if you want to compare on a dataset that was spit out by another model

    Each line in the contents is a new phrase to compare in the language model. A few warnings:

    • Make sure the phrases are short enough that they can be passed to the model given your memory constraints
    • The dataset is fully loaded into memory to serve to the front end, so avoid creating a text file that is too large to fit in memory.
  2. Choose two comparable models

    Two models are comparable if they:

    1. Have the exact same tokenization scheme
    2. Have the exact same vocabulary

    This allows us to do tokenwise comparisons on the model. For example, this could be:

    • A pretrained model and a finetuned version of it (e.g., distilbert-base-cased and distilbert-base-uncased-finetuned-sst-2-english)
    • A distilled version mimicking the original model (e.g., bert-base-cased and distilbert-base-cased)
    • Different sizes of the same model architecture (e.g., gpt2 and gpt2-large)
  3. Preprocess the models on the chosen dataset

    python scripts/preprocess.py all gpt2-medium distilgpt2 data/datasets/glue_mrpc_1+2.csv --output-dir data/sample/gpt2-glue-comparisons
    
  4. Start the app

    python backend/server/main.py --config data/sample/gpt2-glue-comparisons
    

    Note that if you use a different tokenization scheme than the default gpt, you will need to tell the frontend how to visualize the tokens. For example, a bert based tokenization scheme:

    python backend/server/main.py --config data/sample/bert-glue-comparisons -t bert
    

Architecture

LMdiff Architecture

(Admin) Getting the Data

Models and datasets for the deployed app are stored on the cloud and require a private .dvc/config file.

With the correct config:

dvc pull

will populate the data directories correctly for the deployed version.

Testing
make test

or

python -m pytest tests

All tests are stored in tests.

Frontend

We like pnpm but npm works just as well. We also like Vite for its rapid hot module reloading and pleasant dev experience. This repository uses Vue as a reactive framework.

From the root directory:

cd client
pnpm install --save-dev
pnpm run dev

If you want to hit the backend routes, make sure to also run the uvicorn backend.server:app command from the project root.

For production (serve with Vite)
pnpm run serve
For production (serve with this repo's FastAPI server)
cd client
pnpm run build:backend
cd ..
uvicorn backend.server:app

Or the gunicorn command from above.

All artifacts are stored in the client/dist directory with the appropriate basepath.

For production (serve with external tooling like NGINX)
pnpm run build

All artifacts are stored in the client/dist directory.

Notes

  • Check the endpoints by visiting <localhost>:<port>/docs
Owner
Hendrik Strobelt
IBM Research // MIT-IBM AI Lab Updates on Twitter: @hen_str
Hendrik Strobelt
Code-free deep segmentation for computational pathology

NoCodeSeg: Deep segmentation made easy! This is the official repository for the manuscript "Code-free development and deployment of deep segmentation

André Pedersen 26 Nov 23, 2022
A Python package for generating concise, high-quality summaries of a probability distribution

GoodPoints A Python package for generating concise, high-quality summaries of a probability distribution GoodPoints is a collection of tools for compr

Microsoft 28 Oct 10, 2022
rliable is an open-source Python library for reliable evaluation, even with a handful of runs, on reinforcement learning and machine learnings benchmarks.

Open-source library for reliable evaluation on reinforcement learning and machine learning benchmarks. See NeurIPS 2021 oral for details.

Google Research 529 Jan 01, 2023
Detecting Potentially Harmful and Protective Suicide-related Content on Twitter

TwitterSuicideML Scripts for reproducing the Machine Learning analysis of the paper: Detecting Potentially Harmful and Protective Suicide-related Cont

3 Oct 17, 2022
Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"

RNN-for-Joint-NLU Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"

Kim SungDong 194 Dec 28, 2022
STRIVE: Scene Text Replacement In Videos

STRIVE: Scene Text Replacement In Videos Dataset Types: RoboText SynthText RealWorld videos RoboText : Videos of texts collected using navigation robo

15 Jul 11, 2022
Command-line tool for downloading and extending the RedCaps dataset.

RedCaps Downloader This repository provides the official command-line tool for downloading and extending the RedCaps dataset. Users can seamlessly dow

RedCaps dataset 33 Dec 14, 2022
Simple ONNX operation generator. Simple Operation Generator for ONNX.

sog4onnx Simple ONNX operation generator. Simple Operation Generator for ONNX. https://github.com/PINTO0309/simple-onnx-processing-tools Key concept V

Katsuya Hyodo 6 May 15, 2022
The mini-MusicNet dataset

mini-MusicNet A music-domain dataset for multi-label classification Music transcription is sequence-to-sequence prediction problem: given an audio per

John Thickstun 4 Nov 09, 2022
Human motion synthesis using Unity3D

Human motion synthesis using Unity3D Prerequisite: Software: amc2bvh.exe, Unity 2017, Blender. Unity: RockVR (Video Capture), scenes, character models

Hao Xu 9 Jun 01, 2022
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021)

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021) An efficient PyTorch library for Point Cloud Completion.

Microsoft 119 Jan 02, 2023
Using contrastive learning and OpenAI's CLIP to find good embeddings for images with lossy transformations

The official code for the paper "Inverse Problems Leveraging Pre-trained Contrastive Representations" (to appear in NeurIPS 2021).

Sriram Ravula 26 Dec 10, 2022
A memory-efficient implementation of DenseNets

efficient_densenet_pytorch A PyTorch =1.0 implementation of DenseNets, optimized to save GPU memory. Recent updates Now works on PyTorch 1.0! It uses

Geoff Pleiss 1.4k Dec 25, 2022
A Python package for faster, safer, and simpler ML processes

Bender đŸ¤– A Python package for faster, safer, and simpler ML processes. Why use bender? Bender will make your machine learning processes, faster, safe

Otovo 6 Dec 13, 2022
A collection of implementations of deep domain adaptation algorithms

Deep Transfer Learning on PyTorch This is a PyTorch library for deep transfer learning. We divide the code into two aspects: Single-source Unsupervise

Yongchun Zhu 647 Jan 03, 2023
Ranger deep learning optimizer rewrite to use newest components

Ranger21 - integrating the latest deep learning components into a single optimizer Ranger deep learning optimizer rewrite to use newest components Ran

Less Wright 266 Dec 28, 2022
CARL provides highly configurable contextual extensions to several well-known RL environments.

CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments.

AutoML-Freiburg-Hannover 51 Dec 28, 2022
N-RPG - Novel role playing game da turfu

N-RPG Ce README sera la page de garde du projet. Contenu Il contiendra la présen

4 Mar 15, 2022
A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate

24 Dec 26, 2022
Attention Probe: Vision Transformer Distillation in the Wild

Attention Probe: Vision Transformer Distillation in the Wild Jiahao Wang, Mingdeng Cao, Shuwei Shi, Baoyuan Wu, Yujiu Yang In ICASSP 2022 This code is

Wang jiahao 3 Oct 31, 2022