Training data extraction on GPT-2

Overview

Training data extraction from GPT-2

This repository contains code for extracting training data from GPT-2, following the approach outlined in the following paper:

Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel
USENIX Security Symposium, 2021
https://arxiv.org/abs/2012.07805

WARNING: The experiments in our paper relied on different non-public codebases, and also involved a large amount of manual labor. The code in this repository is thus not meant to exactly reproduce the paper's results, but instead to illustrate the paper's approach and to help others perform similar experiments.
The code in this repository has not been tested at the scale considered in the paper (600,000 generated samples) and might find memorized content at a lower (or higher) rate!

Installation

You will need transformers, pytorch and tqdm. The code was tested with transformers v3.0.2 and torch v1.5.1.

Extracting Data

Simply run

python3 extraction.py --N 1000 --batch-size 10

to generate 1000 samples with GPT-2 (XL). The samples are generated with top-k sampling (k=40) and an empty prompt.

The generated samples are ranked according to four membership inference metrics introduced in our paper:

  • The log perplexity of the GPT-2 (XL) model.
  • The ratio of the log perplexities of the GPT-2 (XL) model and the GPT-2 (S) model.
  • The ratio of the log perplexities for the generated sample and the same sample in lower-case letters.
  • The ratio of the log perplexity of GPT-2 (XL) and the sample's entropy estimated by Zlib.

The top 10 samples according to each metric are printed out. These samples are likely to contain verbatim text from the GPT-2 training data.

Conditioning on Internet text

In our paper, we found that prompting GPT-2 with small snippets of text taken from the Web increased the chance of the model generating memorized content.

To reproduce this attack, first download a slice of the Common Crawl dataset:

./download_cc.sh

This will download a sample of the Crawl from May 2021 (~350 MB) to a file called commoncrawl.warc.wet.

Then, we can run the extraction attack with Internet prompts:

python3 extraction.py --N 1000 --internet-sampling --wet-file commoncrawl.warc.wet

Sample outputs

Some interesting data that we extracted from GPT-2 can be found here.

Note that these were found among 600,000 generated samples. If you generate a much smaller number of samples (10,000 for example), you will be less likely to find memorized content.

Citation

If this code is useful in your research, you are encouraged to cite our academic paper:

@inproceedings{carlini21extracting,
  author = {Carlini, Nicholas and Tramer, Florian and Wallace, Eric and Jagielski, Matthew and Herbert-Voss, Ariel and Lee, Katherine and Roberts, Adam and Brown, Tom and Song, Dawn and Erlingsson, Ulfar and Oprea, Alina and Raffel, Colin},
  title = {Extracting Training Data from Large Language Models},
  booktitle = {USENIX Security Symposium},
  year = {2021},
  howpublished = {arXiv preprint arXiv:2012.07805},
  url = {https://arxiv.org/abs/2012.07805}
}
Owner
Florian Tramer
Florian Tramer
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ User support: lambeq-su

Cambridge Quantum 315 Jan 01, 2023
Explore extreme compression for pre-trained language models

Code for paper "Exploring extreme parameter compression for pre-trained language models ICLR2022"

twinkle 16 Nov 14, 2022
Pytorch implement of 'Unmixing based PAN guided fusion network for hyperspectral imagery'

Pgnet There's a improved version compared with the publication in Tgrs with the modification in the deduction of the PDIN block: https://arxiv.org/abs

5 Jul 01, 2022
Neighborhood Contrastive Learning for Novel Class Discovery

Neighborhood Contrastive Learning for Novel Class Discovery This repository contains the official implementation of our paper: Neighborhood Contrastiv

Zhun Zhong 56 Dec 09, 2022
The second project in Python course on FCC

Assignment Write a function named add_time that takes in two required parameters and one optional parameter: a start time in the 12-hour clock format

Denise T 1 Dec 13, 2021
🇰🇷 Text to Image in Korean

KoDALLE Utilizing pretrained language model’s token embedding layer and position embedding layer as DALLE’s text encoder. Background Training DALLE mo

HappyFace 74 Sep 22, 2022
Unofficial implementation of Point-Unet: A Context-Aware Point-Based Neural Network for Volumetric Segmentation

Point-Unet This is an unofficial implementation of the MICCAI 2021 paper Point-Unet: A Context-Aware Point-Based Neural Network for Volumetric Segment

Namt0d 9 Dec 07, 2022
UDP++ (ECCVW 2020 Oral), (Winner of COCO 2020 Keypoint Challenge).

UDP-Pose This is the pytorch implementation for UDP++, which won the Fisrt place in COCO Keypoint Challenge at ECCV 2020 Workshop. Top-Down Results on

20 Jul 29, 2022
A CROSS-MODAL FUSION NETWORK BASED ON SELF-ATTENTION AND RESIDUAL STRUCTURE FOR MULTIMODAL EMOTION RECOGNITION

CFN-SR A CROSS-MODAL FUSION NETWORK BASED ON SELF-ATTENTION AND RESIDUAL STRUCTURE FOR MULTIMODAL EMOTION RECOGNITION The audio-video based multimodal

skeleton 15 Sep 26, 2022
ReAct: Out-of-distribution Detection With Rectified Activations

ReAct: Out-of-distribution Detection With Rectified Activations This is the source code for paper ReAct: Out-of-distribution Detection With Rectified

38 Dec 05, 2022
Rate-limit-semaphore - Semaphore implementation with rate limit restriction for async-style (any core)

Rate Limit Semaphore Rate limit semaphore for async-style (any core) There are t

Yan Kurbatov 4 Jun 21, 2022
Negative Interactions for Improved Collaborative Filtering:

Negative Interactions for Improved Collaborative Filtering: Don’t go Deeper, go Higher This notebook provides an implementation in Python 3 of the alg

Harald Steck 21 Mar 05, 2022
Airborne magnetic data of the Osborne Mine and Lightning Creek sill complex, Australia

Osborne Mine, Australia - Airborne total-field magnetic anomaly This is a section of a survey acquired in 1990 by the Queensland Government, Australia

Fatiando a Terra Datasets 1 Jan 21, 2022
NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

100 Sep 28, 2022
Repository for the electrical and ICT benchmark model developed in the ERIGrid 2.0 project.

Benchmark Model Electrical and ICT System This repository contains the documentation, code, and models for the electrical and ICT benchmark model deve

ERIGrid 2.0 1 Nov 29, 2021
Localized representation learning from Vision and Text (LoVT)

Localized Vision-Text Pre-Training Contrastive learning has proven effective for pre- training image models on unlabeled data and achieved great resul

Philip Müller 10 Dec 07, 2022
A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

University1652-Baseline [Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [中文介绍] This

Zhedong Zheng 335 Jan 06, 2023
A reimplementation of DCGAN in PyTorch

DCGAN in PyTorch A reimplementation of DCGAN in PyTorch. Although there is an abundant source of code and examples found online (as well as an officia

Diego Porres 6 Jan 08, 2022
DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control

DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control One version of our system is implemented using the

260 Nov 28, 2022
Tensorflow 2 Object Detection API kurulumu, GPU desteği, custom model hazırlama

Tensorflow 2 Object Detection API Bu tutorial, TensorFlow 2.x'in kararlı sürümü olan TensorFlow 2.3'ye yöneliktir. Bu, görüntülerde / videoda nesne a

46 Nov 20, 2022