Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Code in both PyTorch and TensorFlow

Last update: Jan 06, 2023

Overview

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder and supports (1) single-node multi-GPU training and (2) multi-host TPU training.
Besides the source code, we also provide pretrained TensorFlow models that achieve the state-of-the-art (SoTA) performance reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder and supports single-node multi-GPU training via the nn.DataParallel module.
Please refer to pytorch/README.md for details.
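
As a rough illustration of what single-node multi-GPU training through nn.DataParallel looks like, here is a minimal sketch. TinyLM is a hypothetical stand-in model invented for this example, not the Transformer-XL model from the pytorch/ folder, and the sizes are arbitrary. The wrapper replicates the module on each visible GPU and splits the input batch across them; on a machine with at most one GPU the sketch simply runs the plain model.

```python
import torch
import torch.nn as nn


# Hypothetical toy model used only for illustration; the actual
# Transformer-XL implementation lives in the pytorch/ folder.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=100, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: [batch, length] -> logits: [batch, length, vocab_size]
        return self.proj(self.embed(tokens))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyLM().to(device)

# Wrap the model only when more than one GPU is visible. By default
# nn.DataParallel scatters inputs along dimension 0 (the batch dimension).
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

tokens = torch.randint(0, 100, (8, 12), device=device)
logits = model(tokens)
print(logits.shape)
```

The training scripts in the repository may configure the wrapper differently, so pytorch/README.md remains the reference for actual usage.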

Results

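Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break through the 1.0 barrier on char-level language modeling, reaching 0.99 on enwiki8. Below is a summary; enwiki8 and text8 are reported in bits per character (bpc), the other datasets in perplexity (ppl), and lower is better for both.

Method	enwiki8 (bpc)	text8 (bpc)	One Billion Word (ppl)	WT-103 (ppl)	PTB (ppl, w/o finetuning)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5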
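
The character-level columns (enwiki8, text8) are bits per character and the word-level columns are perplexity. Both are simple transforms of the average cross-entropy loss, which the short sketch below illustrates. The function names are made up for this example and are not part of the repository, and the sketch assumes the loss is an average negative log-likelihood measured in nats (natural log), as most frameworks report it.

```python
import math


def perplexity(nll_nats):
    """Perplexity from an average per-token loss given in nats."""
    return math.exp(nll_nats)


def bits_per_char(nll_nats):
    """Bits per character from an average per-character loss in nats."""
    return nll_nats / math.log(2)


# Invert the transforms to see which loss values the reported numbers imply.
loss_for_wt103 = math.log(18.3)          # per-token loss behind 18.3 perplexity
loss_for_enwiki8 = 0.99 * math.log(2)    # per-char loss behind 0.99 bpc

print(round(perplexity(loss_for_wt103), 1))
print(round(bits_per_char(loss_for_enwiki8), 2))
```

Because both transforms are monotonic, a lower loss always means a lower perplexity or bpc, which is why lower is better in every column.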

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)

Owner

Zhilin Yang
