Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Code in both PyTorch and TensorFlow

Last update: Jan 06, 2023

Overview

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder, supporting (1) single-node multi-GPU training, and (2) multi-host TPU training.
Besides the source code, we also provide pretrained TensorFlow models that achieve the state-of-the-art (SoTA) performance reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder, supporting single-node multi-GPU training via nn.DataParallel.
Please refer to pytorch/README.md for details.
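
For readers unfamiliar with nn.DataParallel, the following is a minimal sketch of how a model is typically wrapped for single-node multi-GPU execution. It is not the repository's training script: TinyLM, its sizes, and the batch shape are hypothetical placeholders chosen only for illustration, and the actual model and options are described in pytorch/README.md.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in model, NOT the Transformer-XL architecture."""

    def __init__(self, vocab_size=100, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        return self.proj(self.embed(tokens))


# Use GPUs when present; otherwise the same code runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyLM().to(device)

# nn.DataParallel replicates the module on each visible GPU, splits the
# input batch along dimension 0, and gathers the outputs on the first GPU.
# With no GPU available it simply calls the wrapped module.
parallel_model = nn.DataParallel(model)

tokens = torch.randint(0, 100, (8, 12), device=device)  # assumed toy batch
logits = parallel_model(tokens)
print(logits.shape)
```

The wrapper leaves the model's interface unchanged, so the rest of a training loop (loss, backward pass, optimizer step) can be written as for a single device.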

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks and is the first model to break the 1.0 bits-per-character barrier on character-level language modeling. The table below summarizes the results; lower is better in every column.

Method	enwiki8 (bpc)	text8 (bpc)	One Billion Word (perplexity)	WT-103 (perplexity)	PTB (perplexity, w/o finetuning)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5
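
To put the table in perspective, the short sketch below computes the relative reduction of each Transformer-XL score over the previous best. The numbers are copied from the table above; the helper name relative_reduction and the dictionary layout are illustrative choices, not part of the repository.

```python
# (previous best, Transformer-XL) pairs copied from the table above.
# Lower is better for every benchmark.
results = {
    "enwiki8": (1.06, 0.99),
    "text8": (1.13, 1.08),
    "One Billion Word": (23.7, 21.8),
    "WT-103": (20.5, 18.3),
    "PTB (w/o finetuning)": (55.5, 54.5),
}


def relative_reduction(previous, new):
    """Fraction by which `new` improves on `previous` (lower is better)."""
    return (previous - new) / previous


reductions = {
    name: relative_reduction(previous, new)
    for name, (previous, new) in results.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1%} lower than the previous best")
```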

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)
