Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

Overview

MAUVE

MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE measure, introduced in this paper (NeurIPS 2021 Oral).

MAUVE summarizes both Type I and Type II errors measured softly using Kullback–Leibler (KL) divergences.

Documentation Link

Features:

  • MAUVE with quantization using k-means.
  • Adaptive selection of k-means hyperparameters.
  • Compute MAUVE using pre-computed GPT-2 features (i.e., terminal hidden state), or featurize raw text using HuggingFace transformers + PyTorch.

For scripts to reproduce the experiments in the paper, please see this repo.

Installation

For a direct install, run this command from your terminal:

pip install mauve-text

If you wish to edit or contribute to MAUVE, you should install from source

git clone [email protected]:krishnap25/mauve.git
cd mauve
pip install -e .

Some functionality requires more packages. Please see the requirements below.

Requirements

The installation command above installs the main requirements, which are:

  • numpy>=1.18.1
  • scikit-learn>=0.22.1
  • faiss-cpu>=1.7.0
  • tqdm>=4.40.0

In addition, if you wish to use featurization within MAUVE, you need to manually install:

Quick Start

Let p_text and q_text each be a list of strings, where each string is a complete generation (including context). For best practice, MAUVE needs at least a few thousand generations each for p_text and q_text (the paper uses 5000 each). For our demo, we use 100 generations each for fast running time.

To demonstrate the functionalities of this package on some real data, this repository provides some functionalities to download and use sample data in the ./examples folder (these are not a part of the MAUVE package, you need to clone the repository for these).

Let use download some Amazon product reviews as well as machine generations, provided by the GPT-2 output dataset repo by running this command in our shell (downloads ~17M in size):

python examples/download_gpt2_dataset.py

The data is downloaded into the ./data folder. We can load the data (100 samples out of the available 5000) in Python as

from examples import load_gpt2_dataset
p_text = load_gpt2_dataset('data/amazon.valid.jsonl', num_examples=100) # human
q_text = load_gpt2_dataset('data/amazon-xl-1542M.valid.jsonl', num_examples=100) # machine

We can now compute MAUVE as follows (note that this requires installation of PyTorch and HF Transformers).

import mauve 

# call mauve.compute_mauve using raw text on GPU 0; each generation is truncated to 256 tokens
out = mauve.compute_mauve(p_text=p_text, q_text=q_text, device_id=0, max_text_length=256, verbose=False)
print(out.mauve) # prints 0.9917

This first downloads GPT-2 large tokenizer and pre-trained model (if you do not have them downloaded already). Even if you have the model offline, it takes it up to 30 seconds to load the model the first time. out now contains the fields:

  • out.mauve: MAUVE score, a number between 0 and 1. Larger values indicate that P and Q are closer.
  • out.frontier_integral: Frontier Integral, a number between 0 and 1. Smaller values indicate that P and Q are closer.
  • out.divergence_curve: a numpy.ndarray of shape (m, 2); plot it with matplotlib to view the divergence curve
  • out.p_hist: a discrete distribution, which is a quantized version of the text distribution p_text
  • out.q_hist: same as above, but with q_text

You can plot the divergence curve using

# Make sure matplotlib is installed in your environment
import matplotlib.pyplot as plt  
plt.plot(out.divergence_curve[:, 1], out.divergence_curve[:, 0])

Other Ways of Using MAUVE

For each text (in both p_text and q_text), MAUVE internally uses the terimal hidden state from GPT-2 large as a feature representation. This featurization process can be rather slow (~10 mins for 5000 generations at a max length of 1024; but the implementation can be made more efficient, see Contributing). Alternatively, this package allows you to use cached hidden states directly (this does not require PyTorch and HF Transformers to be installed):

# call mauve.compute_mauve using features obtained directly
# p_feats and q_feats are `np.ndarray`s of shape (n, dim)
# we use a synthetic example here
import numpy as np
p_feats = np.random.randn(100, 1024)  # feature dimension = 1024
q_feats = np.random.randn(100, 1024)
out = mauve.compute_mauve(p_features=p_feats, q_features=q_feats)

You can also compute MAUVE using the tokenized (BPE) representation using the GPT-2 vocabulary (e.g., obtained from using an explicit call to transformers.GPT2Tokenizer).

# call mauve.compute_mauve using tokens on GPU 1
# p_toks, q_toks are each a list of LongTensors of shape [1, length]
# we use synthetic examples here
import torch
p_toks = [torch.LongTensor(np.random.choice(50257, size=(1, 32), replace=True)) for _ in range(100)]
q_toks = [torch.LongTensor(np.random.choice(50257, size=(1, 32), replace=True)) for _ in range(100)]
out = mauve.compute_mauve(p_tokens=p_toks, q_tokens=q_toks, device_id=1, max_text_length=1024)

To view the progress messages, pass in the argument verbose=True to mauve.compute_mauve. You can also use different forms as inputs for p and q, e.g., p via p_text and q via q_features.

Available Options

mauve.compute_mauve takes the following arguments

  • p_features: numpy.ndarray of shape (n, d), where n is the number of generations
  • q_features: numpy.ndarray of shape (n, d), where n is the number of generations
  • p_tokens: list of length n, each entry is torch.LongTensor of shape (1, length); length can vary between generations
  • q_tokens: list of length n, each entry is torch.LongTensor of shape (1, length); length can vary between generations
  • p_text: list of length n, each entry is a string
  • q_text: list of length n, each entry is a string
  • num_buckets: the size of the histogram to quantize P and Q. Options: 'auto' (default) or an integer
  • pca_max_data: the number data points to use for PCA dimensionality reduction prior to clustering. If -1, use all the data. Default -1
  • kmeans_explained_var: amount of variance of the data to keep in dimensionality reduction by PCA. Default 0.9
  • kmeans_num_redo: number of times to redo k-means clustering (the best objective is kept). Default 5
  • kmeans_max_iter: maximum number of k-means iterations. Default 500
  • featurize_model_name: name of the model from which features are obtained. Default 'gpt2-large' Use one of ['gpt2', 'gpt2-medium', 'gpt2-large', 'gpt2-xl'].
  • device_id: Device for featurization. Supply a GPU id (e.g. 0 or 3) to use GPU. If no GPU with this id is found, use CPU
  • max_text_length: maximum number of tokens to consider. Default 1024
  • divergence_curve_discretization_size: Number of points to consider on the divergence curve. Default 25
  • mauve_scaling_factor: "c" from the paper. Default 5.
  • verbose: If True (default), print running time updates
  • seed: random seed to initialize k-means cluster assignments.

Note: p and q can be of different lengths, but it is recommended that they are the same length.

Contact

The best way to contact the authors in case of any questions or clarifications (about the package or the paper) is by raising an issue on GitHub. We are not able to respond to queries over email.

Contributing

If you find any bugs, please raise an issue on GitHub. If you would like to contribute, please submit a pull request. We encourage and highly value community contributions.

Some features which would be good to have are:

  • batched implementation featurization (current implementation sequentially featurizes generations); this requires appropriate padding/masking
  • featurization in HuggingFace Transformers with a TensorFlow backend.

Best Practices for MAUVE

MAUVE is quite different from most metrics in common use, so here are a few guidelines on proper usage of MAUVE:

  1. Relative comparisons:
    • We find that MAUVE is best suited for relative comparisons while the absolute MAUVE score is less meaningful.
    • For instance if we wish to find which of model1 and model2 are better at generating the human distribution, we can compare MAUVE(text_model1, text_human) and MAUVE(text_model2, text_human).
    • The absolute number MAUVE(text_model1, text_human) can vary based on the hyperparameters selected below, but the relative trends remain the same.
    • One must ensure that the hyperparameters are exactly the same for the MAUVE scores under comparison.
    • Some hyperparameters are described below.
  2. Number of generations:
    • MAUVE computes the similarity between two distributions.
    • Therefore, each distribution must contain at least a few thousand samples (we use 5000 each). MAUVE with a smaller number of samples is biased towards optimism (that is, MAUVE typically goes down as the number of samples increase) and exhibits a larger standard deviation between runs.
  3. Number of clusters (discretization size):
    • We take num_buckets to be 0.1 * the number of samples.
    • The performance of MAUVE is quite robust to this, provided the number of generations is not too small.
  4. MAUVE is too large or too small:
    • The parameter mauve_scaling_parameter controls the absolute value of the MAUVE score, without changing the relative ordering between various methods. The main purpose of this parameter is to help with interpretability.
    • If you find that all your methods get a very high MAUVE score (e.g., 0.995, 0.994), try increasing the value of mauve_scaling_factor. (note: this also increases the per-run standard deviation of MAUVE).
    • If you find that all your methods get a very low MAUVE score (e.g. < 0.4), then try decreasing the value of mauve_scaling_factor.
  5. MAUVE takes too long to run:
    • You can also try reducing the number of clusters using the argument num_buckets. The clustering algorithm's run time scales as the square of the number of clusters. Once the number of clusters exceeds 500, the clustering really starts to slow down. In this case, it could be helpful to set the number of clusters to 500 by overriding the default (which is num_data_points / 10, so use this when the number of samples for each of p and q is over 5000).
    • In this case, try reducing the clustering hyperparameters: set kmeans_num_redo to 1, and if this does not work, kmeans_max_iter to 100. This enables the clustering to run faster at the cost of returning a worse clustering.

Citation

If you find this package useful, or you use it in your research, please cite:

@inproceedings{pillutla-etal:mauve:neurips2021,
  title={MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers},
  author={Pillutla, Krishna and Swayamdipta, Swabha and Zellers, Rowan and Thickstun, John and Welleck, Sean and Choi, Yejin and Harchaoui, Zaid},
  booktitle = {NeurIPS},
  year      = {2021}
}

Further, the Frontier Integral was introduced in this paper:

@inproceedings{liu-etal:divergence:neurips2021,
  title={{Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals}},
  author={Liu, Lang and Pillutla, Krishna and  Welleck, Sean and Oh, Sewoong and Choi, Yejin and Harchaoui, Zaid},
  booktitle = {NeurIPS},
  year      = {2021}
}

Acknowledgements

This work was supported by NSF DMS-2134012, NSF CCF-2019844, NSF DMS-2023166, the DARPA MCS program through NIWC Pacific (N66001-19-2-4031), the CIFAR "Learning in Machines & Brains" program, a Qualcomm Innovation Fellowship, and faculty research awards.

Owner
Krishna Pillutla
Ph.D student in Machine Learning
Krishna Pillutla
Nest - A flexible tool for building and sharing deep learning modules

Nest - A flexible tool for building and sharing deep learning modules Nest is a flexible deep learning module manager, which aims at encouraging code

ZhouYanzhao 41 Oct 10, 2022
This is a collection of our NAS and Vision Transformer work.

AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi

Microsoft 828 Dec 28, 2022
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 29 Jan 08, 2023
天勤量化开发包, 期货量化, 实时行情/历史数据/实盘交易

TqSdk 天勤量化交易策略程序开发包 TqSdk 是一个由信易科技发起并贡献主要代码的开源 python 库. 依托快期多年积累成熟的交易及行情服务器体系, TqSdk 支持用户使用极少的代码量构建各种类型的量化交易策略程序, 并提供包含期货、期权、股票的 历史数据-实时数据-开发调试-策略回测-

信易科技 2.8k Dec 30, 2022
Implicit Graph Neural Networks

Implicit Graph Neural Networks This repository is the official PyTorch implementation of "Implicit Graph Neural Networks". Fangda Gu*, Heng Chang*, We

Heng Chang 48 Nov 29, 2022
Pytorch implementation of our paper accepted by NeurIPS 2021 -- Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme

Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme (NeurIPS2021) (Link) Overview Prerequisites Linu

Shaojie Li 34 Mar 31, 2022
A framework for joint super-resolution and image synthesis, without requiring real training data

SynthSR This repository contains code to train a Convolutional Neural Network (CNN) for Super-resolution (SR), or joint SR and data synthesis. The met

83 Jan 01, 2023
Diverse graph algorithms implemented using JGraphT library.

# 1. Installing Maven & Pandas First, please install Java (JDK11) and Python 3 if they are not already. Next, make sure that Maven (for importing J

See Woo Lee 3 Dec 17, 2022
This package contains deep learning models and related scripts for RoseTTAFold

RoseTTAFold This package contains deep learning models and related scripts to run RoseTTAFold This repository is the official implementation of RoseTT

1.6k Jan 03, 2023
A command line simple note taking app

Why yet another note taking program? note was designed with a very specific target in mind: me, and my 2354 scraps of paper. It runs from the command

64 Nov 20, 2022
This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Dynamic-Vision-Transformer (Pytorch) This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT). Not All Ima

210 Dec 18, 2022
A multi-mode modulator for multi-domain few-shot classification (ICCV)

A multi-mode modulator for multi-domain few-shot classification (ICCV)

Yanbin Liu 8 Apr 28, 2022
Udacity's CS101: Intro to Computer Science - Building a Search Engine

Udacity's CS101: Intro to Computer Science - Building a Search Engine All soluti

Phillip 0 Feb 26, 2022
🔎 Super-scale your images and run experiments with Residual Dense and Adversarial Networks.

Image Super-Resolution (ISR) The goal of this project is to upscale and improve the quality of low resolution images. This project contains Keras impl

idealo 4k Jan 08, 2023
A list of Machine Learning Art Colabs

ML Visual Art Colabs A list of cool Colabs on Machine Learning Imagemaking or other artistic purposes 3D Ken Burns Effect Ken Burns Effect by Manuel R

Derrick Schultz (he/him) 789 Dec 12, 2022
simple_pytorch_example project is a toy example of a python script that instantiates and trains a PyTorch neural network on the FashionMNIST dataset

simple_pytorch_example project is a toy example of a python script that instantiates and trains a PyTorch neural network on the FashionMNIST dataset

Ramón Casero 1 Jan 07, 2022
Inference pipeline for our participation in the FeTA challenge 2021.

feta-inference Inference pipeline for our participation in the FeTA challenge 2021. Team name: TRABIT Installation Download the two folders in https:/

Lucas Fidon 2 Apr 13, 2022
Yolo ros - YOLO-ROS for HUAWEI ATLAS200

YOLO-ROS YOLO-ROS for NVIDIA YOLO-ROS for HUAWEI ATLAS200, please checkout for b

ChrisLiu 5 Oct 18, 2022
Breast Cancer Detection 🔬 ITI "AI_Pro" Graduation Project

BreastCancerDetection - This program is designed to predict two severity of abnormalities associated with breast cancer cells: benign and malignant. Mammograms from MIAS is preprocessed and features

6 Nov 29, 2022
Shitty gaze mouse controller

demo.mp4 shitty_gaze_mouse_cotroller install tensofflow, cv2 run the main.py and as it starts it will collect data so first raise your left eyebrow(bo

16 Aug 30, 2022