TANL: Structured Prediction as Translation between Augmented Natural Languages

Related tags

Deep Learningtanl
Overview

TANL: Structured Prediction as Translation between Augmented Natural Languages

Code for the paper "Structured Prediction as Translation between Augmented Natural Languages" (ICLR 2021).

If you use this code, please cite the paper using the bibtex reference below.

@inproceedings{tanl,
    title={Structured Prediction as Translation between Augmented Natural Languages},
    author={Giovanni Paolini and Ben Athiwaratkun and Jason Krone and Jie Ma and Alessandro Achille and Rishita Anubhai and Cicero Nogueira dos Santos and Bing Xiang and Stefano Soatto},
    booktitle={9th International Conference on Learning Representations, {ICLR} 2021},
    year={2021},
}

Requirements

  • Python 3.6+
  • PyTorch (tested with version 1.7.1)
  • Transformers (tested with version 4.0.0)
  • NetworkX (tested with version 2.5, only used in coreference resolution)

You can install all required Python packages with pip install -r requirements.txt

Datasets

By default, datasets are expected to be in data/DATASET_NAME. Dataset-specific code is in datasets.py.

For example, the CoNLL04 and ADE datasets (joint entity and relation extraction) in the correct format can be downloaded using https://github.com/markus-eberts/spert/blob/master/scripts/fetch_datasets.sh. For other datasets, pre-processing and links are documented in the code.

Running the code

Use the following command: python run.py JOB

The JOB argument refers to a section of the config file, which by default is config.ini. A sample config file is provided, with settings that allow for a faster training and less memory usage than the settings used to obtain the final results in the paper.

For example, to replicate the paper's results on CoNLL04, have the following section in the config file:

[conll04_final]
datasets = conll04
model_name_or_path = t5-base
num_train_epochs = 200
max_seq_length = 256
max_seq_length_eval = 512
train_split = train,dev
per_device_train_batch_size = 8
per_device_eval_batch_size = 16
do_train = True
do_eval = False
do_predict = True
episodes = 1-10
num_beams = 8

Then run python run.py conll04_final. Note that the final results will differ slightly from the ones reported in the paper, due to small code changes and randomness.

Config arguments can be overwritten by command line arguments. For example: python run.py conll04_final --num_train_epochs 50.

Additional details

If do_train = True, the model is trained on the given train split (e.g., 'train') of the given datasets. The final weights and intermediate checkpoints are written in a directory such as experiments/conll04_final-t5-base-ep200-len256-b8-train, with one subdirectory per episode. Results in JSON format are also going to be saved there.

In every episode, the model is trained on a different (random) permutation of the training set. The random seed is given by the episode number, so that every episode always produces the same exact model.

Once a model is trained, it is possible to evaluate it without training again. For this, set do_train = False or (more easily) provide the -e command-line argument: python run.py conll04_final -e.

If do_eval = True, the model is evaluated on the 'dev' split. If do_predict = True, the model is evaluated on the 'test' split.

Arguments

The following are the most important command-line arguments for the run.py script. Run python run.py -h for the full list.

  • -c CONFIG_FILE: specify config file to use (default is config.ini)
  • -e: only run evaluation (overwrites the setting do_train in the config file)
  • -a: evaluate also intermediate checkpoints, in addition to the final model
  • -v : print results for each evaluation run
  • -g GPU: specify which GPU to use for evaluation

The following are the most important arguments for the config file. See the sample config file to understand the format.

  • datasets (str): comma-separated list of datasets for training
  • eval_datasets (str): comma-separated list of datasets for evaluation (default is the same as for training)
  • model_name_or_path (str): path to pretrained model or model identifier from huggingface.co/models (e.g. t5-base)
  • do_train (bool): whether to run training (default is False)
  • do_eval (bool): whether to run evaluation on the dev set (default is False)
  • do_predict (bool): whether to run evaluation on the test set (default is False)
  • train_split (str): comma-separated list of data splits for training (default is train)
  • num_train_epochs (int): number of train epochs
  • learning_rate (float): initial learning rate (default is 5e-4)
  • train_subset (float > 0 and <=1): portion of training data to effectively use during training (default is 1, i.e., use all training data)
  • per_device_train_batch_size (int): batch size per GPU during training (default is 8)
  • per_device_eval_batch_size (int): batch size during evaluation (default is 8; only one GPU is used for evaluation)
  • max_seq_length (int): maximum input sequence length after tokenization; longer sequences are truncated
  • max_output_seq_length (int): maximum output sequence length (default is max_seq_length)
  • max_seq_length_eval (int): maximum input sequence length for evaluation (default is max_seq_length)
  • max_output_seq_length_eval (int): maximum output sequence length for evaluation (default is max_output_seq_length or max_seq_length_eval or max_seq_length)
  • episodes (str): episodes to run (default is 0; an interval can be specified, such as 1-4; the episode number is used as the random seed)
  • num_beams (int): number of beams for beam search during generation (default is 1)
  • multitask (bool): if True, the name of the dataset is prepended to each input sentence (default is False)

See arguments.py and transformers.TrainingArguments for additional config arguments.

Program your own vulkan.gpuinfo.org query in Python. Used to determine baseline hardware for WebGPU.

query-gpuinfo-data License This software is not presently released under a license. The data in data/ is obtained under CC BY 4.0 as specified there.

Kai Ninomiya 5 Jul 18, 2022
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

VisualGPT Our Paper VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning Main Architecture of Our VisualGPT Downloa

Vision CAIR Research Group, KAUST 140 Dec 28, 2022
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"

The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"

Sun Yi 201 Nov 21, 2022
A customisable game where you have to quickly click on black tiles in order of appearance while avoiding clicking on white squares.

W.I.P-Aim-Memory-Game A customisable game where you have to quickly click on black tiles in order of appearance while avoiding clicking on white squar

dE_soot 1 Dec 08, 2021
Double pendulum simulator using a symplectic Euler's method and Hamiltonian mechanics

Symplectic Double Pendulum Simulator Double pendulum simulator using a symplectic Euler's method. The program calculates the momentum and position of

Scott Marino 1 Jan 12, 2022
The mini-AlphaStar (mini-AS, or mAS) - mini-scale version (non-official) of the AlphaStar (AS)

A mini-scale reproduction code of the AlphaStar program. Note: the original AlphaStar is the AI proposed by DeepMind to play StarCraft II.

Ruo-Ze Liu 216 Jan 04, 2023
Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

Codes-for-Algorithms Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

Tracy (Shengmin) Tao 1 Apr 12, 2022
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Libo Qin 25 Sep 06, 2022
PyTorch implementations of Generative Adversarial Networks.

This repository has gone stale as I unfortunately do not have the time to maintain it anymore. If you would like to continue the development of it as

Erik Linder-Norén 13.4k Jan 08, 2023
Jaxtorch (a jax nn library)

Jaxtorch (a jax nn library) This is my jax based nn library. I created this because I was annoyed by the complexity and 'magic'-ness of the popular ja

nshepperd 17 Dec 08, 2022
A Simulated Optimal Intrusion Response Game

Optimal Intrusion Response An OpenAI Gym interface to a MDP/Markov Game model for optimal intrusion response of a realistic infrastructure simulated u

Kim Hammar 10 Dec 09, 2022
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

TalkNet 2 [WIP] TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Predictio

Rishikesh (ऋषिकेश) 69 Dec 17, 2022
Wind Speed Prediction using LSTMs in PyTorch

Implementation of Deep-Forecast using PyTorch Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting Adapted from original implementation Setu

Onur Kaplan 151 Dec 14, 2022
LaneDetectionAndLaneKeeping - Lane Detection And Lane Keeping

LaneDetectionAndLaneKeeping This project is part of my bachelor's thesis. The go

5 Jun 27, 2022
Automatic Number Plate Recognition using Contours and Convolution Neural Networks (CNN)

Cite our paper if you find this project useful https://www.ijariit.com/manuscripts/v7i4/V7I4-1139.pdf Abstract Image processing technology is used in

Adithya M 2 Jun 28, 2022
Neuralnetwork - Basic Multilayer Perceptron Neural Network for deep learning

Neural Network Just a basic Neural Network module Usage Example Importing Module

andreecy 0 Nov 01, 2022
Explicable Reward Design for Reinforcement Learning Agents [NeurIPS'21]

Explicable Reward Design for Reinforcement Learning Agents [NeurIPS'21]

3 May 12, 2022
FNet Implementation with TensorFlow & PyTorch

FNet Implementation with TensorFlow & PyTorch. TensorFlow & PyTorch implementation of the paper "FNet: Mixing Tokens with Fourier Transforms". Overvie

Abdelghani Belgaid 1 Feb 12, 2022
Official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer"

[AAAI2022] UCTransNet This repo is the official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspectiv

Haonan Wang 199 Jan 03, 2023
Official NumPy Implementation of Deep Networks from the Principle of Rate Reduction (2021)

Deep Networks from the Principle of Rate Reduction This repository is the official NumPy implementation of the paper Deep Networks from the Principle

Ryan Chan 49 Dec 16, 2022