Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the PyTorch and TensorFlow code for our paper:

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder and supports (1) single-node multi-GPU training and (2) multi-host TPU training.
We also provide pretrained TensorFlow models that reproduce the state-of-the-art (SoTA) results reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder and supports single-node multi-GPU training via nn.DataParallel.
Please refer to pytorch/README.md for details.
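
As a rough illustration of what single-node multi-GPU training via nn.DataParallel looks like, here is a minimal sketch. The TinyLM model, its sizes, and the batch shape are hypothetical stand-ins chosen for this example; they are not the model or data layout used in pytorch/. The sketch assumes a recent PyTorch install and falls back to CPU when no GPU is visible.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical toy language model, used only to demonstrate the wrapper."""

    def __init__(self, vocab_size=100, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, length) -> logits: (batch, length, vocab_size)
        return self.proj(self.embed(tokens))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyLM().to(device)

# nn.DataParallel replicates the module on each visible GPU, splits the input
# along dimension 0, and gathers the outputs on the first device.
# With no GPU available it simply calls the wrapped module.
parallel_model = nn.DataParallel(model)

tokens = torch.randint(0, 100, (8, 12), device=device)
logits = parallel_model(tokens)
print(logits.shape)
```

The wrapped model is called exactly like the original one, so the rest of a training loop does not need to change. See pytorch/README.md for the actual training scripts and options.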

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break through the 1.0 bits-per-character barrier on char-level language modeling. The table below summarizes the results (lower is better).

Method	enwiki8 (bpc)	text8 (bpc)	One Billion Word (perplexity)	WT-103 (perplexity)	PTB (perplexity, w/o finetuning)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5
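
To put the numbers in the table into perspective, the short sketch below computes the relative reduction on each benchmark. The figures are copied from the table; the helper names are illustrative only. It assumes, as is conventional for these benchmarks, that the enwiki8 and text8 columns are bits per character (bpc) and the remaining columns are word-level perplexity.

```python
# Figures copied from the results table above (lower is better).
previous_best = {
    "enwiki8": 1.06,
    "text8": 1.13,
    "One Billion Word": 23.7,
    "WT-103": 20.5,
    "PTB (w/o finetuning)": 55.5,
}
transformer_xl = {
    "enwiki8": 0.99,
    "text8": 1.08,
    "One Billion Word": 21.8,
    "WT-103": 18.3,
    "PTB (w/o finetuning)": 54.5,
}


def relative_reduction(old, new):
    """Fraction by which the metric dropped relative to the previous best."""
    return (old - new) / old


def bpc_to_char_perplexity(bpc):
    """Convert bits per character to per-character perplexity (2 ** bpc)."""
    return 2.0 ** bpc


for name, old in previous_best.items():
    new = transformer_xl[name]
    print(f"{name}: {old} -> {new} ({relative_reduction(old, new):.1%} lower)")

# A bpc below 1.0 corresponds to a per-character perplexity below 2.
print(bpc_to_char_perplexity(transformer_xl["enwiki8"]) < 2.0)
```

The conversion at the end shows why 1.0 bpc is a natural threshold: below it, the model is on average less uncertain about the next character than a fair coin flip.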

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)

Owner

Zhilin Yang
