Few-shot Learning with GPT-3

Overview

Few-shot Learning With Language Models

This codebase performs few-shot "in-context" learning with language models, in the style of the GPT-3 paper. A handful of training examples are placed into a natural-language "prompt", and predictions are made by generating from the language model. See the GPT-3 paper and Calibrate Before Use for more information.
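
For example, a sentiment-classification prompt can be assembled by concatenating a few labeled examples followed by the test input. The snippet below is only an illustration; the real prompt formats are defined per task in utils/data_utils.py, and the prefixes and label words shown here are made up:

# Illustrative sketch of assembling a few-shot prompt; the prefixes and
# label words are hypothetical, not the ones used by the codebase.
train_examples = [
    ("the acting is superb and the story moves quickly", "Positive"),
    ("a dull, plodding mess of a film", "Negative"),
]
test_sentence = "an unexpectedly charming little movie"

prompt = ""
for sentence, label in train_examples:
    prompt += f"Review: {sentence}\nSentiment: {label}\n\n"
prompt += f"Review: {test_sentence}\nSentiment:"

# The language model then generates (or scores) the label word that should
# follow "Sentiment:", e.g. "Positive" or "Negative".
print(prompt)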

You can run this codebase with GPT-3 (if you have a key from OpenAI), GPT-2, or any other language model available in HuggingFace Transformers. If you have a GPT-3 key, place your API key in a file named openai_key.txt. The underlying model is abstracted away behind a common API.

Running this codebase will report results both with and without contextual calibration.
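
Contextual calibration (see Calibrate Before Use) estimates the model's bias toward each label by querying it with a content-free input such as "N/A", then rescales the predicted label probabilities to counteract that bias. The following is a minimal numerical sketch of the idea; the actual implementation lives in the utils folder and handles more details (e.g., averaging over several content-free inputs):

# Minimal sketch of contextual calibration: divide by the probabilities
# assigned to the content-free input (equivalent to applying
# W = diag(p_cf)^-1 with b = 0) and renormalize.
import numpy as np

def calibrate(label_probs, content_free_probs):
    rescaled = label_probs / content_free_probs
    return rescaled / rescaled.sum()

p_cf = np.array([0.7, 0.3])    # label probabilities for the content-free input "N/A"
p_test = np.array([0.6, 0.4])  # raw label probabilities for a real test input

print(np.argmax(p_test))                   # 0: the raw prediction follows the bias
print(np.argmax(calibrate(p_test, p_cf)))  # 1: the calibrated prediction does not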

Dependencies

This code is written using PyTorch and HuggingFace's Transformers library. If you are running a model locally (e.g., GPT-2), a single GPU is sufficient: the experiments are relatively lightweight since there is no training. It is technically possible to run the experiments without a GPU, but the runtime will be slow.

Installation

The easiest way to install the code is to create a fresh anaconda environment:

conda create -n fewshot python=3.6
source activate fewshot
pip install -r requirements.txt

Now you should be ready to go!

Replicating Our Results

Here is how to replicate the results from our paper for GPT-2. To replicate the results for classification tasks:

CUDA_VISIBLE_DEVICES=0 python run_classification.py \
--model="gpt2-xl" \
--dataset="sst2, trec, cb, agnews, dbpedia" \
--num_seeds=5 \
--all_shots="0, 1, 4, 8" \
--subsample_test_set=300 \
--approx

To replicate the results for extraction tasks:

CUDA_VISIBLE_DEVICES=0 python run_extraction.py \
--model="gpt2-xl" \
--dataset="mit_movie_Genre, mit_movie_Director, atis_airline_name, atis_depart_date.day_name" \
--num_seeds=5 \
--all_shots="0, 1, 4, 8" \
--subsample_test_set=300

To replicate the results for LAMA:

CUDA_VISIBLE_DEVICES=0 python run_lama.py

Note that after we refactored our code, the training sets are not the same ones used in our results table. We expect the results to differ slightly, but they should follow the same trends seen in our results.

Overview of Codebase

Data

The data folder contains the raw data for numerous tasks. If you'd like to add your own task, add the data to that folder. The code for loading a dataset, as well as defining the prompt format for a task, is in utils/data_utils.py. We have loaders for a wide range of existing datasets. If you want to add a new dataset that is similar in structure to an existing one (e.g., it's a text classification task), adding it should be very simple---you can use an existing dataset as a guide.
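
As a rough illustration of what adding a loader involves, a new binary sentiment task might be wired up as sketched below. The function name, file paths, and prompt-format fields are hypothetical; use one of the existing loaders in utils/data_utils.py as the authoritative guide:

# Hypothetical loader for a new binary classification dataset; the exact
# signature and prompt-format fields expected by utils/data_utils.py may
# differ, so mirror an existing loader when adding your own.
import json

def load_my_reviews(params):
    def read_split(path):
        sentences, labels = [], []
        with open(path) as f:
            for line in f:
                example = json.loads(line)
                sentences.append(example["text"])
                labels.append(example["label"])  # 0 = negative, 1 = positive
        return sentences, labels

    train_sentences, train_labels = read_split("data/my_reviews/train.jsonl")
    test_sentences, test_labels = read_split("data/my_reviews/test.jsonl")

    # How each example is rendered in the prompt and which words map to labels.
    params["prompt_prefix"] = ""
    params["q_prefix"] = "Review: "
    params["a_prefix"] = "Sentiment: "
    params["label_dict"] = {0: ["Negative"], 1: ["Positive"]}

    return train_sentences, train_labels, test_sentences, test_labels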

Utils

The utils folder contains all of the code for calling the underlying models, getting the probabilities of each label token, possibly applying contextual calibration, and more. If you just want to evaluate few-shot learning on your task, you should not need to modify this code. If you want to extend our code (e.g., modify how decisions are made), this is the place to look.
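
For intuition, a classification decision ultimately comes down to comparing the probabilities the model assigns to each label's first token immediately after the prompt. The sketch below shows the bare idea with a local HuggingFace model; it is not the codebase's actual implementation, which also covers GPT-3 API calls, batching, and calibration:

# Rough sketch of scoring label tokens with a local HuggingFace model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Review: an unexpectedly charming little movie\nSentiment:"
label_words = [" Negative", " Positive"]  # note the leading space

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits
probs = torch.softmax(logits, dim=-1)

label_ids = [tokenizer.encode(word)[0] for word in label_words]
label_probs = probs[label_ids]
prediction = label_words[int(label_probs.argmax())]
print(prediction, label_probs.tolist())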

Run Scripts

The run scripts, e.g., run_classification.py, contain the code for randomly sampling the examples used in the prompt, calling the models, computing the necessary evaluation metrics, and more. If you are adding a new task format (one that is not classification or QA), you will need to write your own run script. Inside the run script, you can set the parameters for the experiments using the command-line arguments.
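
Conceptually, the sampling step amounts to drawing the in-context examples from the training set under a fixed seed, so that each seed yields a different prompt. The sketch below illustrates the idea rather than the scripts' exact code:

# Conceptual sketch of sampling the in-context examples for one seed;
# the run scripts handle this (and much more) via command-line arguments.
import random

def sample_prompt_examples(train_sentences, train_labels, num_shots, seed):
    random.seed(seed)
    indices = random.sample(range(len(train_sentences)), num_shots)
    return [(train_sentences[i], train_labels[i]) for i in indices]

train_sentences = ["great film", "terrible plot", "loved it", "fell asleep"]
train_labels = [1, 0, 1, 0]
print(sample_prompt_examples(train_sentences, train_labels, num_shots=2, seed=0))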

For all experiments, we save and pickle the outputs of the model. This makes post-hoc analysis of accuracy, plotting of results, etc. very fast. You can also use the saved outputs to evaluate how the accuracy would have changed if a different decision-making function had been used (e.g., accuracy with and without contextual calibration).
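
For example, once a run has finished you can reload the pickled outputs and recompute metrics without querying the model again. The file name and dictionary keys below are hypothetical; check the run scripts to see exactly what they save:

# Sketch of post-hoc analysis on saved outputs; the pickle path and the
# dictionary layout are hypothetical -- the run scripts define what is stored.
import pickle
import numpy as np

with open("saved_results/sst2_gpt2-xl_4shot_seed0.pkl", "rb") as f:
    results = pickle.load(f)

def accuracy(prob_list, labels):
    predictions = [int(np.argmax(p)) for p in prob_list]
    return float(np.mean([pred == y for pred, y in zip(predictions, labels)]))

print("raw accuracy:       ", accuracy(results["raw_probs"], results["labels"]))
print("calibrated accuracy:", accuracy(results["calibrated_probs"], results["labels"]))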

References

Please consider citing our work if you found this code or our paper beneficial to your research.

@article{Zhao2021Calibrate,
  author  = {Tony Z. Zhao and Eric Wallace and Shi Feng and Dan Klein and Sameer Singh},
  journal = {arXiv preprint arXiv:2102.09690},
  year    = {2021},
  title   = {Calibrate Before Use: Improving Few-shot Performance of Language Models}
}

Contributions and Contact

This code was developed by Tony Z. Zhao and Eric Wallace, contact available at [email protected] and [email protected].

If you'd like to contribute code, feel free to open a pull request. If you find a bug, please open an issue.
