Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

Related tags

Text Data & NLPBPT
Overview

BP-Transformer

This repo contains the code for our paper

BP-Transformer: Modeling Long-Range Context via Binary Partition

Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang

The code is written in DGL with PyTorch as backend.

Requirements

  • torchtext 0.4
  • dgl 0.4 (the code on master branch is not compatible with dgl 0.5, please checkout develop branch for dgl 0.5 compatible version).
  • yaml
  • spacy
  • PyTorch 1.1+

Usage

For Multi-GPU training, please export NCCL_LL_THRESHOLD=0 before running scripts because of a PyTorch bug mentioned here.

The codebase has two dependencies: graph_kernel and graph_builder, the first one is for efficient graph attention on GPU with node parallel strategy written in CUDA, the second one is for efficient graph construction written in Cython. To install them:

cd graph_builder
python setup.py install
cd ..
cd graph_kernel
python setup.py install
cd ..

We support the following tasks with BPT as backbone:

  • Text Classification: text_classification.py
  • Language Modeling: lm.py
  • Machine Translation: mt.py
  • Natural Language Inference: nli.py

All experiment settings mentioned in our paper are available at configs/.

python *.py --config configs/*.yml --gpu [GPUs]

Note that this repo does not contain any data files, to get dataset required for experiments, run . get_*.sh and the corresponding dataset would be downloaded and preprocessed.

For machine translation, we have another script mt_infer.py for decoding:

python mt_infer.py --config configs/*.yml --gpu [GPU]

Before decoding, please make sure you have finished the training using mt.py with the same config file.

NOTE: Currently we do not support CPU training/inference.

Visualization

Following is the visualization of the sparse matrix of BPT underlying graph when sequence length is 8192 and k is 4. image

Results

  • Character-Level Language Modeling (enwik8, metric: bpc), 12 layers.
    • BPT(context length=8192): 1.02
    • Adaptive Transformer: 1.02
    • Transformer-XL: 1.06
    • To reproduce: python lm.py --config configs/enwik8-8192.yml --gpu 0,1,2,3,4,5,6,7
  • Document-Level Machine Translation (IWSLT 2015 Zh-En, metric: BLEU), base setting.
    • BPT(context length=64): 19.84
    • HAN-NMT: 17.68
    • To reproduce: python mt.py --config configs/iwslt-4-64.yml --gpu 0
  • Text Classification (IMDB, metric: accuracy), 5 layers.
    • BPT+GloVe: 92.12(±0.11)
    • LSTM+CoVe: 91.8
    • Transformer+Glove: 89.24(±0.20)
    • Star Transformer: 90.50
    • To reproduce: python text_classification.py --config configs/imdb-4.yml --gpu 0
      • Note that our CUDA kernel uses atomic operations which may result in non-determinism, we report the mean and std of accuracy in multiple(10) runs.
      • The IMDB dataset has not official train/dev split, we follow the setting of Bryan et al., 2017 and hold out 10% samples for validation. We report the test accuracy of model with best valid loss.

For sentence level modeling, we show that BPT models better inductive bias than vanilla transformer by attending fine-grained features of neighbors and coarse-grained features of far-away tokens.

  • Machine Translation(WMT14 En-De, metric: BLEU), base setting.
    • BPT(k=1): 26.9
    • BPT(k=2): 27.4
    • BPT(k=4): 27.6
    • BPT(k=8): 26.7
    • Transformer-base(our implementation): 27.2
    • To reproduce: python mt.py --config configs/wmt-*.yml --gpu 0,1,2,3,4,5,6,7
      • We report SacreBLEU result for reproducibility (setting: BLEU+c.mixed+l.en-de+#.1+s.exp+t.wmt14+tok.intl+v.1.4.1), the sacrebleu score is usually lower than that produced by get_ende_bleu.sh script in tensor2tensor as described here.
  • Natural Language Inference(SNLI, metric: accuracy), ESIM-like structure, 3 layers for self-attention and 3 layers for cross-sentence attention.
    • BPT(k=4): 88.25(±0.07)
    • Transformer: 87.89(±0.31)
    • To reproduce: python nli.py --config configs/snli.yml --gpu 0
      • Like Text Classification, the result on NLI is also not stable because of randomness in our CUDA kernel, we report the mean and std of accuracy in multiple(7) runs.
  • Text Classification(SST-5, metric: accuracy), 4 layers.
    • BPT+GloVe: 52.71(±0.32)
    • Transformer+GloVe: 50.40
    • Tree-LSTM+GloVe: 51.0
    • To reproduce: python text_classification.py --config configs/sst5-2.yml --gpu 0

TODOs

  • FP16 support (mixed-precision training/inference)
  • Integrate kernels with dgl 0.5
  • CPU support
Owner
Zihao Ye
Ph.D. [email protected] of Washington, focusing on Compilers and Computer Arch
Zihao Ye
本插件是pcrjjc插件的重置版,可以独立于后端api运行

pcrjjc2 本插件是pcrjjc重置版,不需要使用其他后端api,但是需要自行配置客户端 本项目基于AGPL v3协议开源,由于项目特殊性,禁止基于本项目的任何商业行为 配置方法 环境需求:.net framework 4.5及以上 jre8 别忘了装jre8 别忘了装jre8 别忘了装jre8

132 Dec 26, 2022
NLP, before and after spaCy

textacy: NLP, before and after spaCy textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the hig

Chartbeat Labs Projects 2k Jan 04, 2023
An automated program that helps customers of Pizza Palour place their pizza orders

PIzza_Order_Assistant Introduction An automated program that helps customers of Pizza Palour place their pizza orders. The program uses voice commands

Tindi Sommers 1 Dec 26, 2021
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Tensor2Tensor Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and ac

12.9k Jan 07, 2023
An evaluation toolkit for voice conversion models.

Voice-conversion-evaluation An evaluation toolkit for voice conversion models. Sample test pair Generate the metadata for evaluating models. The direc

30 Aug 29, 2022
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop

molten A minimal, extensible, fast and productive API framework for Python 3. Changelog: https://moltenframework.com/changelog.html Community: https:/

3.2k Dec 28, 2022
Neural text generators like the GPT models promise a general-purpose means of manipulating texts.

Boolean Prompting for Neural Text Generators Neural text generators like the GPT models promise a general-purpose means of manipulating texts. These m

Jeffrey M. Binder 20 Jan 09, 2023
A python script that will use hydra to get user and password to login to ssh, ftp, and telnet

Hydra-Auto-Hack A python script that will use hydra to get user and password to login to ssh, ftp, and telnet Project Description This python script w

2 Jan 16, 2022
voice2json is a collection of command-line tools for offline speech/intent recognition on Linux

Command-line tools for speech and intent recognition on Linux

Michael Hansen 988 Jan 04, 2023
Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

smart-school-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
A Telegram bot to add notes to Flomo.

flomo bot 使用 Telegram 机器人发送笔记到你的 Flomo. 你需要有一台可访问 Telegram 的服务器。 Steps @BotFather 新建机器人,获取 token Flomo 官网获取 API,链接 https://flomoapp.com/mine?source=in

Zhen 44 Dec 30, 2022
pyupbit 라이브러리를 활용하여 upbit에서 비트코인을 자동매매하는 코드입니다. 조코딩 유튜브 채널에서 자세한 강의 영상을 보실 수 있습니다.

파이썬 비트코인 투자 자동화 강의 코드 by 유튜브 조코딩 채널 pyupbit 라이브러리를 활용하여 upbit 거래소에서 비트코인 자동매매를 하는 코드입니다. 파일 구성 test.py : 잔고 조회 (1강) backtest.py : 백테스팅 코드 (2강) bestK.p

조코딩 JoCoding 186 Dec 29, 2022
使用pytorch+transformers复现了SimCSE论文中的有监督训练和无监督训练方法

SimCSE复现 项目描述 SimCSE是一种简单但是很巧妙的NLP对比学习方法,创新性地引入Dropout的方式,对样本添加噪声,从而达到对正样本增强的目的。 该框架的训练目的为:对于batch中的每个样本,拉近其与正样本之间的距离,拉远其与负样本之间的距离,使得模型能够在大规模无监督语料(也可以

58 Dec 20, 2022
HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

HuggingSound HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools. I have no intention of building a very complex tool here.

Jonatas Grosman 247 Dec 26, 2022
This is the 25 + 1 year anniversary version of the 1995 Rachford-Rice contest

Rachford-Rice Contest This is the 25 + 1 year anniversary version of the 1995 Rachford-Rice contest. Can you solve the Rachford-Rice problem for all t

13 Sep 20, 2022
IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

IndoBERTweet 🐦 🇮🇩 1. Paper Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effe

IndoLEM 40 Nov 30, 2022
Text preprocessing, representation and visualization from zero to hero.

Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co

Jonathan Besomi 2.7k Jan 08, 2023
Implementing SimCSE(paper, official repository) using TensorFlow 2 and KR-BERT.

KR-BERT-SimCSE Implementing SimCSE(paper, official repository) using TensorFlow 2 and KR-BERT. Training Unsupervised python train_unsupervised.py --mi

Jeong Ukjae 27 Dec 12, 2022
simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 that lets you quickly train your T5 models.

Quickly train T5 models in just 3 lines of code + ONNX support simpleT5 is built on top of PyTorch-lightning ⚡️ and Transformers 🤗 that lets you quic

Shivanand Roy 220 Dec 30, 2022
A sentence aligner for comparable corpora

About Yalign is a tool for extracting parallel sentences from comparable corpora. Statistical Machine Translation relies on parallel corpora (eg.. eur

Machinalis 128 Aug 24, 2022