Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

Last update: Dec 15, 2022

Overview

This is a fork of Fairseq(-py) with implementations of the following models:

Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

An NMT models with two-dimensional convolutions to jointly encode the source and the target sequences.

Pervasive Attention also provides an extensive decoding grid that we leverage to efficiently train wait-k models.

See README.

Efficient Wait-k Models for Simultaneous Machine Translation

Transformer Wait-k models (Ma et al., 2019) with unidirectional encoders and with joint training of multiple wait-k paths.

See README.

Fairseq Requirements and Installation

PyTorch version >= 1.4.0
Python version >= 3.6
For training new models, you'll also need an NVIDIA GPU and NCCL

Installing Fairseq

git clone https://github.com/elbayadm/attn2d
cd attn2d
pip install --editable .

License

fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.

Citation

For Pervasive Attention, please cite:

@InProceedings{elbayad18conll,
    author ="Elbayad, Maha and Besacier, Laurent and Verbeek, Jakob",
    title = "Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction",
    booktitle = "Proceedings of the 22nd Conference on Computational Natural Language Learning",
    year = "2018",
 }

For our wait-k models, please cite:

@article{elbayad20waitk,
    title={Efficient Wait-k Models for Simultaneous Machine Translation},
    author={Elbayad, Maha and Besacier, Laurent and Verbeek, Jakob},
    journal={arXiv preprint arXiv:2005.08595},
    year={2020}
}

For Fairseq, please cite:

@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

Related tags

Overview

Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

Efficient Wait-k Models for Simultaneous Machine Translation

Fairseq Requirements and Installation

License

Citation

Owner

Maha

A demo for end-to-end English and Chinese text spotting using ABCNet.

A repo for open resources & information for people to succeed in PhD in CS & career in AI / NLP

Code for paper Multitask-Finetuning of Zero-shot Vision-Language Models

Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

Entity Disambiguation as text extraction (ACL 2022)

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Finally decent dictionaries based on Wiktionary for your beloved eBook reader.

A programming language with logic of Python, and syntax of all languages.

Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

VoiceFixer VoiceFixer is a framework for general speech restoration.

NLP Core Library and Model Zoo based on PaddlePaddle 2.0

Chinese Grammatical Error Diagnosis

Edge-Augmented Graph Transformer

A single model that parses Universal Dependencies across 75 languages.

Code for the paper PermuteFormer

Transformers and related deep network architectures are summarized and implemented here.

An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI, torch2trt to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)

Türkçe küfürlü içerikleri bulan bir yapay zeka kütüphanesi / An ML library for profanity detection in Turkish sentences

TLA - Twitter Linguistic Analysis