Text Generation by Learning from Demonstrations

Overview

Text Generation by Learning from Demonstrations

The README was last updated on March 7, 2021. The repo is based on fairseq (v0.9.?).

Paper

arXiv

Prerequisites

Per fairseq usage, we need to install this particular modifed version fairseq. The simplest way: pip install --editable ./.

Due to pytorch changes, and given that we're using a slightly older version of fairseq (see below), please use pytorch version <= 1.6.0. However, the GOLD algorithm can be easily implemented on top of the latest fairseq (or most text generation codebases).

Datasets

For downloading CNN/DM and XSum datasets, we follow the instructions here; note that this link does not correspond to the latest fairseq. Our version of the CNN/DM input articles include the prepended "(CNN)" tags. For downloading IWSLT14 De-En dataset, we follow the instructions here. The binary files are provided in our repo, in the directory data-bin. For downloading the particular version of our NQG dataset, we follow the instructions here. The binary files are provided upon request.

Code: experiments on transformer models using fairseq

For reproducibility, the code is based on a April 2020 version of fairseq (based on release v0.9.0). However, it is easy to reimplement the GOLD algorithm in the latest version of fairseq and in another frameworks.

How to implement in the latest version of fairseq?

  • If your GPUs "have large memory", then most of the implementation happens around the criterion code (for question generation, summarization, translation, the py file is ./fairseq/criterions/label_smoothed_cross_entropy.py in the April 2020 version of fairseq). Note that the implementation in this repo uses this approach.
    • "Have large memory": Meaning the GPUs can store pi, pi-tilde, p_MLE at the same time; see Algorithm 1 in the paper. In our experiments (using the same datasets, same batch size, etc.), this would imply that the GPUs have ~24G of memory.
  • If your GPUs cannot fit the above models, then you may need to input p_MLE probabilities as features. This can be done by first saving the probabilities into a text file or pickle file, and then loading them in the load_langpair_dataset function of ./fairseq/tasks/translation.py (or other corresponding files for other tasks).

How to implement in other codebase?

  • See Algorithm 1 in the paper. The majority of the work will happen around the loss computation. We need to have three different models ready when computing losses: (1) pi, the network we're training; (2) pi-tilde, a slightly older version of pi (created to ensure training stability, similar to the periodic synchronization in deep Q-learning; (3) p_MLE, to compute rewards (but this can be pre-loaded in the form of input features, in case the GPU cannot fit the third model).

BART summarization generation fairseq issue

Given that there has been minor bugs with the fairseq BART summarization code (details on original fairseq github), we make the corresponding changes according to the fairseq authors' recommendation. (1) In ./fairseq/sequence_generator.py, see the modification here. (2) In ./fairseq/tasks/fairseq_task.py, see the modification here. (3) In ./fairseq/models/bart/hub_interface.py, see the modification here. The above is already implemented in this repo. But if we're reimplementing the GOLD code in the latest fairseq, we need to beware of this issue (and keep the three modifications in mind).

How to run?

Training

The entry point for training is ./fairseq_cli/train.py. See ./fairseq/options.py for possible flags. For CNN/DM, the script for running GOLD-p is provided in run_cnndm_goldp.sh; the script for running GOLD-s (which often performs better than GOLD-p) is provided in run_cnndm_golds.sh. Some other scripts for other tasks are also provided. For explanations of flags, please refer to ./fairseq/options.py as well as Algorithm 1 in the paper.

Validation

Note that to validate, one possibility is to find the checkpoint that corresponds to highest BLEU/ROUGE-2 score on dev set. We cannot validate according to NLL loss, given that in the paper, we showed that our models achieve higher accuracy but higher perplexity (and NLL loss). Do not use checkpoint_best.pt. IWSLT14 De-En validation is implemented. For summarization, please use run_cnndm_validation.py (similar to run_cnndm_inference.py) as an example to loop through all checkpoints. Then, compute the ROUGE based on run_cnndm_validation_step2.sh (perhaps with small modifications).

Evaluation/inference

For BART evaluation, we use the inference scripts provided in run_cnndm_inference.sh, run_xsum_inference.sh, run_squad_inference.sh. For IWSLT14 De-En inference, the following few lines will do.

python -W ignore [path-to-fairseq_cli/generate.py] data-bin/iwslt14.tokenized.de-en \
    --path [path-to-model-checkpoint.pt] \
    --batch-size 128 --beam 5 --remove-bpe --gen-subset test  > [path-to-save-to-file]

Transformer models

Please ensure the data is processed appropriately before using the models.

MLE model checkpoints

GOLD-s model checkpoints

Not a lot of hyperparameter search was done for the transformer models, so it is likely that more search (on hyperparameters, on architecture) could reach better performance.

Moreover, for summarization models, we use pyrouge+files2rouge to evaluate, based on the fairseq instructions after pyrouge+files2rouge installation. The package files2rouge has a common WordNet-2.0.exc.db error; see this link for the fix.

Citation, authors, and contact

The bibtex entry

Richard Yuanzhe Pang

He He

Official repository for "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems"

Action-Based Conversations Dataset (ABCD) This respository contains the code and data for ABCD (Chen et al., 2021) Introduction Whereas existing goal-

ASAPP Research 49 Oct 09, 2022
Generates all variables from your .tf files into a variables.tf file.

tfvg Generates all variables from your .tf files into a variables.tf file. It searches for every var.variable_name in your .tf files and generates a v

1 Dec 01, 2022
A model that attempts to learn and benefit from data collected on card counting.

A model that attempts to learn and benefit from data collected on card counting. A decision tree like model is built to win more often than loose and increase the bet of the player appropriately to c

1 Dec 17, 2021
Genetic Programming in Python, with a scikit-learn inspired API

Welcome to gplearn! gplearn implements Genetic Programming in Python, with a scikit-learn inspired and compatible API. While Genetic Programming (GP)

Trevor Stephens 1.3k Jan 03, 2023
Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

NuPIC Numenta Platform for Intelligent Computing The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implem

Numenta 6.3k Dec 30, 2022
JORLDY an open-source Reinforcement Learning (RL) framework provided by KakaoEnterprise

Repository for Open Source Reinforcement Learning Framework JORLDY

Kakao Enterprise Corp. 330 Dec 30, 2022
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

ENet in Caffe Execution times and hardware requirements Network 1024x512 1280x720 Parameters Model size (fp32) ENet 20.4 ms 32.9 ms 0.36 M 1.5 MB SegN

Timo Sämann 561 Jan 04, 2023
This repository contains the code for our paper VDA (public in EMNLP2021 main conference)

Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models This repository contains the code for our paper VDA (publ

RUCAIBox 13 Aug 06, 2022
Code for our ALiBi method for transformer language models.

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation This repository contains the code and models for our paper Tra

Ofir Press 211 Dec 31, 2022
Implementation of the method described in the Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Implementation of the method described in the Speech Resynthesis from Di

4 Mar 11, 2022
Official repo for the work titled "SharinGAN: Combining Synthetic and Real Data for Unsupervised GeometryEstimation"

SharinGAN Official repo for the work titled "SharinGAN: Combining Synthetic and Real Data for Unsupervised GeometryEstimation" The official project we

Koutilya PNVR 23 Oct 19, 2022
Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

HamasKhan 3 Jul 08, 2022
NALSM: Neuron-Astrocyte Liquid State Machine

NALSM: Neuron-Astrocyte Liquid State Machine This package is a Tensorflow implementation of the Neuron-Astrocyte Liquid State Machine (NALSM) that int

Computational Brain Lab 4 Nov 28, 2022
Official codes: Self-Supervised Learning by Estimating Twin Class Distribution

TWIST: Self-Supervised Learning by Estimating Twin Class Distributions Codes and pretrained models for TWIST: @article{wang2021self, title={Self-Sup

Bytedance Inc. 85 Dec 15, 2022
Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.

causal-bald | Abstract | Installation | Example | Citation | Reproducing Results DUE An implementation of the methods presented in Causal-BALD: Deep B

OATML 13 Oct 07, 2022
This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning].

CG3 This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning]. R

12 Oct 28, 2022
(AAAI2020)Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing

Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing This repository contains pytorch source code for AAAI2020 oral paper: Grapy-ML

54 Aug 04, 2022
Official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution"

RealBasicVSR [Paper] This is the official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution, arXiv". This repository contain

Kelvin C.K. Chan 566 Dec 28, 2022
Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

Auto-ViML Automatically Build Variant Interpretable ML models fast! Auto_ViML is pronounced "auto vimal" (autovimal logo created by Sanket Ghanmare) N

AutoViz and Auto_ViML 397 Dec 30, 2022
This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers.

TransMix: Attend to Mix for Vision Transformers This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transf

Jie-Neng Chen 130 Jan 01, 2023