DeLighT: Very Deep and Light-Weight Transformers

Last update: Dec 18, 2022

This repository contains the source code of our work on building efficient sequence models: DeFINE (ICLR'20) and DeLighT (preprint).

Table of contents

Overview
Requirements and installation
Training, evaluation, and results
Multiplication-addition operations
Citation
Acknowledgements
Issues

Overview

In this repository, we share the source code of our paper DeLighT, which delivers similar or better performance than transformer-based models with significantly fewer parameters. DeLighT allocates parameters more efficiently both (1) within each Transformer block, using DExTra, a deep and light-weight transformation, and (2) across blocks, using block-wise scaling, which allows for shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output. Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformer models and yet have fewer parameters and operations. For details, see our papers: DeFINE and DeLighT.
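To make the idea of block-wise scaling concrete, here is a minimal sketch that assigns a depth to each block, growing from the input side to the output side. The function name `blockwise_depths`, its parameters, and the use of plain linear interpolation with rounding are assumptions made for illustration; they are not taken from this repository's code, so please consult the paper and the source for the exact rule.

```python
def blockwise_depths(num_blocks, min_depth, max_depth):
    """Return one depth per block, growing linearly from input to output.

    Hypothetical helper: linear interpolation between min_depth (block
    nearest the input) and max_depth (block nearest the output).
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    if num_blocks == 1:
        return [min_depth]
    step = (max_depth - min_depth) / (num_blocks - 1)
    return [round(min_depth + step * b) for b in range(num_blocks)]


# Four blocks whose depth grows from 4 near the input to 8 near the output.
depths = blockwise_depths(num_blocks=4, min_depth=4, max_depth=8)
print(depths)  # [4, 5, 7, 8]
```

The same interpolation could be applied to block width, so that blocks near the output are both deeper and wider than blocks near the input.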

Requirements and Installation

PyTorch version >= 1.4.0
Python version >= 3.6
For training new models, you'll also need an NVIDIA GPU and NCCL
To use DeLighT, you need to install fairseq and develop locally:

git clone https://github.com/sacmehta/delight
cd delight
pip install --editable ./
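Before installing, it may help to confirm that the environment meets the requirements listed above. The following is a small sketch of such a check; the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical and are not part of this repository.

```python
import re
import sys


def version_tuple(version):
    """Extract the leading numeric parts, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: " + version)
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(version, minimum):
    """True if the version string is at least the given (major, minor, patch)."""
    return version_tuple(version) >= minimum


def check_environment():
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, (1, 4, 0)):
            problems.append("PyTorch >= 1.4.0 is required")
        if not torch.cuda.is_available():
            problems.append("no CUDA device visible (needed for training new models)")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```

The check does not cover NCCL, which is only relevant for multi-GPU training.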

For faster training, install NVIDIA's apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

Training, Evaluation, and Results

For training, evaluation, and results, see the links below. To ease reproduction of our results, we also provide links to training logs.

Neural machine translation

Language Modeling

WikiText-103

Multiplication-Addition Operations

We have added module profiling for both Transformer and DeLighT networks. It can be enabled with the --print-stats argument, which prints a model summary (by default for 20 tokens). To profile with longer source and target sequences, use the --src-len-ps and --tgt-len-ps flags.
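As a rough illustration of what a multiplication-addition count measures, the sketch below counts the multiply-adds of a standard position-wise feed-forward layer for a 20-token sequence. The function names and the layer sizes (512 and 2048) are illustrative assumptions, and the repository's own profiler may count things differently (for example biases, attention, or normalization layers).

```python
def linear_macs(in_features, out_features, num_tokens=20):
    """Multiply-adds of a dense layer applied to each of num_tokens tokens.

    Each output element needs in_features multiply-adds; biases are ignored.
    """
    return in_features * out_features * num_tokens


def feed_forward_macs(d_model, d_hidden, num_tokens=20):
    """Multiply-adds of a two-layer feed-forward block: expand, then project back."""
    expand = linear_macs(d_model, d_hidden, num_tokens)
    project = linear_macs(d_hidden, d_model, num_tokens)
    return expand + project


# Illustrative sizes only: d_model=512, d_hidden=2048, 20 tokens.
print(feed_forward_macs(512, 2048))  # 41943040
```

Because the count scales linearly with the number of tokens, profiling with longer sequences increases these per-layer figures proportionally.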

Citation

If you find our work useful, please consider citing the following works:

@misc{mehta2020delight,
    title={DeLighT: Very Deep and Light-weight Transformer},
    author={Sachin Mehta and Marjan Ghazvininejad and Srinivasan Iyer and Luke Zettlemoyer and Hannaneh Hajishirzi},
    year={2020},
    eprint={2008.00623},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

@inproceedings{mehta2019define,
  title={DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling},
  author={Mehta, Sachin and Koncel-Kedziorski, Rik and Rastegari, Mohammad and Hajishirzi, Hannaneh},
  booktitle={International Conference on Learning Representations},
  year={2019}
}

Acknowledgements

We would like to thank the Fairseq team for building an easy-to-use sequence library.

Issues

Thanks for your interest in our work. For any issues, please raise a request.

Owner

Sachin Mehta
