Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implicit Bayesian Inference"

Last update: Dec 19, 2022

Related tags

Deep Learning incontext-learning

Overview

GINC small-scale in-context learning dataset

GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context learning. The pretraining data is generated by a mixture of HMMs and the in-context learning prompt examples are also generated from HMMs (either from the mixture or not). The prompt examples are out-of-distribution with respect to the pretraining data since every example is independent, concatenated, and separated by delimiters. We provide code to generate GINC-style datasets of varying vocabulary sizes, number of HMMs, and other parameters.

Quickstart

Please create a conda environment or virtualenv using the information in conda-env.yml, then install transformers by going into the transformers/ directory and running pip install -e .. Modify consts.sh to change the default output locations and insert code to activate the environment of choice. Run scripts/runner.sh to run all the experiments on sbatch.

Explore the data

The default dataset has vocab size 50 and the pretraining data is generated as a mixture of 5 HMMs. The pretraining dataset is in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/train.json while in-context prompts are in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/id_prompts_randomsample_*.json.

This repo contains the experiments for the paper An Explanation of In-context Learning as Implicit Bayesian Inference. If you found this repo useful, please cite

@article{xie2021incontext,
  author = {Sang Michael Xie and Aditi Raghunathan and Percy Liang and Tengyu Ma},
  journal = {arXiv preprint arXiv:2111.02080},
  title = {An Explanation of In-context Learning as Implicit Bayesian Inference},
  year = {2021},
}

Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implicit Bayesian Inference"

Related tags

Overview

GINC small-scale in-context learning dataset

Quickstart

Explore the data

Owner

P-Lambda

Effect of Different Encodings and Distance Functions on Quantum Instance-based Classifiers

Code for "Learning Structural Edits via Incremental Tree Transformations" (ICLR'21)

Facial expression detector

SphereFace: Deep Hypersphere Embedding for Face Recognition

Pytorch implementation for "Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion" (NeurIPS 2021)

10th place solution for Google Smartphone Decimeter Challenge at kaggle.

Official PyTorch implementation of GDWCT (CVPR 2019, oral)

COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models

Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection

A different spin on dataclasses.

VLGrammar: Grounded Grammar Induction of Vision and Language

A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

PyTorch Code for "Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning"

PyTorch framework, for reproducing experiments from the paper Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

A smart Chat bot that can help to know about corona virus and Make prediction of corona using X-ray.

A Deep Reinforcement Learning Framework for Stock Market Trading

Linear image-to-image translation

CRNN With PyTorch

Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation

(NeurIPS '21 Spotlight) IQ-Learn: Inverse Q-Learning for Imitation