A curated list of efficient attention modules

Overview

awesome-fast-attention Awesome

A curated list of efficient attention modules (last update: Wed, 10 Mar 2021 23:52:22 +0000)

Table of Contents

Efficient Attention

Paper (citations) Implementation Computational Complexity AutoRegressive Main Idea
Generating Wikipedia by Summarizing Long Sequences (282) memory-compressed-attention formula ✔️
EXPAND

compresses key and value + blocked attention

CBAM: Convolutional Block Attention Module (999+) attention-module formula
EXPAND

combines the SE attention with a per pixel(local) weight

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks (16) set_transformer formula
EXPAND

uses K relay nodes

CCNet: Criss-Cross Attention for Semantic Segmentation (296) CCNet formula
EXPAND

each pixel attends to its row and column simultaneously

Efficient Attention: Attention with Linear Complexities (16) efficient-attention formula
EXPAND

Softmax(Q)*(Softmax(K^T)*V)

Star-Transformer (40) fastNLP formula
EXPAND

uses a relay(global) node and attends to/from that node

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (199) GCNet formula
EXPAND

squeeze and excitation with an attention pooling (instead of a GAP)

Generating Long Sequences with Sparse Transformers (257) DeepSpeed formula ✔️
EXPAND

sparse block based attention

SCRAM: Spatially Coherent Randomized Attention Maps (1) - formula ✔️
EXPAND

uses PatchMatch to find close keys

Interlaced Sparse Self-Attention for Semantic Segmentation (24) IN_PAPER formula ✔️
EXPAND

combination of a short length and then long range(dilated) attention

Permutohedral Attention Module for Efficient Non-Local Neural Networks (3) Permutohedral_attention_module formula
EXPAND

uses permutohedral lattice approximation algorithm to approximate the attention output

Large Memory Layers with Product Keys (43) XLM formula ✔️
EXPAND

search for nearest neighbor keys

Expectation-Maximization Attention Networks for Semantic Segmentation (79) EMANet formula
EXPAND

applys expectation maximization to cluster keys into k clusters

BP-Transformer: Modelling Long-Range Context via Binary Partitioning (15) BPT formula ✔️
EXPAND

attends to distant tokens coarsely and attends to close tokens in a more fine-grained manner

Compressive Transformers for Long-Range Sequence Modelling (48) compressive-transformer-pytorch formula ✔️
EXPAND

compresses distant tokens instead of just stop_grad() ing them, more efficient version of transformerXL

Axial Attention in Multidimensional Transformers (36) axial-attention formula ✔️
EXPAND

apply attention on each axis separately

Reformer: The Efficient Transformer (216) trax formula ✔️
EXPAND

uses LSH to find close keys

Sparse Sinkhorn Attention (16) sinkhorn-transformer formula ✔️
EXPAND

uses a cost matrix to limit attention between buckets

Transformer on a Diet (2) transformer-on-diet formula ✔️
EXPAND

dilated transformer like wavenet

Time-aware Large Kernel Convolutions (9) TaLKConvolutions formula ✔️
EXPAND

calculate mean over a dynamic subsequence around each token with the help of summed-area table

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection (2) - formula ✔️
EXPAND

learns the q, k connections == dynamically creates a sparse attention matrix

Efficient Content-Based Sparse Attention with Routing Transformers (38) routing-transformer formula ✔️
EXPAND

computes attention with same-cluster tokens (computed by online k-means)

Neural Architecture Search for Lightweight Non-Local Networks (11) AutoNL formula
EXPAND

computes Q(KV) and also down samples q, k, v both in spatial and channel dimensions

Longformer: The Long-Document Transformer (159) longformer formula ✔️
EXPAND

global + blocked attention

ETC: Encoding Long and Structured Inputs in Transformers (16) - formula
EXPAND

combines global attention (star transformer with multiple global tokens) with local attention

Multi-scale Transformer Language Models (2) IN_PAPER formula ✔️
EXPAND

UNet like + retina attetion is something close to BP-Transformer

Synthesizer: Rethinking Self-Attention in Transformer Models (26) Synthesizer-Rethinking-Self-Attention-Transformer-Models formula ✔️
EXPAND

does not compute pairwise interactions

Jukebox: A Generative Model for Music (45) jukebox formula ✔️
EXPAND

better attention patterns from Sparse Transformer

Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers (0) - formula ✔️
EXPAND

does not compute pairwise interactions and uses fixed mask patters

GMAT: Global Memory Augmentation for Transformers (2) gmat formula
EXPAND

adds global tokens

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (45) fast-transformers formula ✔️
EXPAND

uses phi(q)(phi(k)v) and also improves the sequential sampling step

Linformer: Self-Attention with Linear Complexity (47) linformer-pytorch formula
EXPAND

project key and value from nd to kd

Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers (8) google-research formula ✔️
EXPAND

calculate an unbiased stochastic approximation of the attention matrix

Kronecker Attention Networks (1) kronecker-attention-pytorch formula
EXPAND

uses horizontal and lateral average matrices

Real-time Semantic Segmentation with Fast Attention (5) - formula
EXPAND

l2_norm(q)*(l2_norm(k)*v)

Fast Transformers with Clustered Attention (6) fast-transformers formula
EXPAND

groups queries together with LSH

Big Bird: Transformers for Longer Sequences (60) DeepSpeed formula
EXPAND

ETC with random connections

Tensor Low-Rank Reconstruction for Semantic Segmentation (3) - formula
EXPAND

decompose the full attention tensor into rank one tensors (CP decomposition)

Looking for change? Roll the Dice and demand Attention (0) IN_PAPER formula
EXPAND

uses the fractal tanimoto similarity to compare queries with keys inside the attention module

Rethinking Attention with Performers (30) google-research formula ✔️
EXPAND

unbiased approximation of the attention matrix with softmax kernel

Memformer: The Memory-Augmented Transformer (0) memformer formula ✔️
EXPAND

attend to memory slots + Memory-Replay BackPropagation

SMYRF: Efficient Attention using Asymmetric Clustering (1) smyrf formula
EXPAND

LSH with balanced clusters

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (0) Informer2020 formula ✔️
EXPAND

sparse attention + funnel like encoder

Sub-Linear Memory: How to Make Performers SLiM (0) google-research formula ✔️
EXPAND

Performer but with sublinear Memory usage

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (0) Nystromformer formula
EXPAND

uses Nystrom method to approximate the attention matrix

Linear Transformers Are Secretly Fast Weight Memory Systems (0) fast-weight-transformers formula ✔️
EXPAND

show that linear transformers are basically fast weight networks + propose a new kernel function to linearise attention, balancing simplicity and effectiveness

LambdaNetworks: Modeling Long-Range Interactions Without Attention (6) lambda-networks formula ✔️
EXPAND

generates a linear layer based on context + decouple pos/context

Random Feature Attention (2) - formula ✔️
EXPAND

kernel approximation and also transformers are rnn

Articles/Surveys/Benchmarks

Owner
Sepehr Sameni
PhD Candidate at the University of Bern, Computer Vision Group
Sepehr Sameni
NLP - Machine learning

Flipkart-product-reviews NLP - Machine learning About Product reviews is an essential part of an online store like Flipkart’s branding and marketing.

Harshith VH 1 Oct 29, 2021
मराठी भाषा वाचविण्याचा एक प्रयास. इंग्रजी ते मराठीचा शब्दकोश. An attempt to preserve the Marathi language. A lightweight and ad free English to Marathi thesaurus.

For English, scroll down मराठी शब्द मराठी भाषा वाचवण्यासाठी मी हा ओपन सोर्स प्रोजेक्ट सुरू केला आहे. माझ्या मते, आपली भाषा हळूहळू आणि कोणाचाही लक्षात

मुक्त स्त्रोत 20 Oct 11, 2022
Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars

Sean Robertson 159 Apr 04, 2022
NLP codes implemented with Pytorch (w/o library such as huggingface)

NLP_scratch NLP codes implemented with Pytorch (w/o library such as huggingface) scripts ├── models: Neural Network models ├── data: codes for dataloa

3 Dec 28, 2021
TLA - Twitter Linguistic Analysis

TLA - Twitter Linguistic Analysis Tool for linguistic analysis of communities TLA is built using PyTorch, Transformers and several other State-of-the-

Tushar Sarkar 47 Aug 14, 2022
Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment Analysis with Affective Knowledge. Proceedings of EMNLP 2021

AAGCN-ACSA EMNLP 2021 Introduction This repository was used in our paper: Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment An

Akuchi 36 Dec 18, 2022
ElasticBERT: A pre-trained model with multi-exit transformer architecture.

This repository contains finetuning code and checkpoints for ElasticBERT. Towards Efficient NLP: A Standard Evaluation and A Strong Baseli

fastNLP 48 Dec 14, 2022
This codebase facilitates fast experimentation of differentially private training of Hugging Face transformers.

private-transformers This codebase facilitates fast experimentation of differentially private training of Hugging Face transformers. What is this? Why

Xuechen Li 73 Dec 28, 2022
Chatbot for the Chatango messaging platform

BroiestBot The baddest bot in the game right now. Uses the ch.py framework for joining Chantango rooms and responding to user messages. Commands If a

Todd Birchard 3 Jan 17, 2022
Chinese NER with albert/electra or other bert descendable model (keras)

Chinese NLP (albert/electra with Keras) Named Entity Recognization Project Structure ./ ├── NER │   ├── __init__.py │   ├── log

2 Nov 20, 2022
BookNLP, a natural language processing pipeline for books

BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin

654 Jan 02, 2023
NeoDays-based tileset for the roguelike CDDA (Cataclysm Dark Days Ahead)

NeoDaysPlus Reduced contrast, expanded, and continuously developed version of the CDDA tileset NeoDays that's being completed with new sprites for mis

0 Nov 12, 2022
Scikit-learn style model finetuning for NLP

Scikit-learn style model finetuning for NLP Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide vari

indico 665 Dec 17, 2022
Korea Spell Checker

한국어 문서 koSpellPy Korean Spell checker How to use Install pip install kospellpy Use from kospellpy import spell_init spell_checker = spell_init() # d

kangsukmin 2 Oct 20, 2021
Big Bird: Transformers for Longer Sequences

BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the c

Google Research 457 Dec 23, 2022
SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning We propose a SASE mode

Tower 1 Nov 20, 2021
A Chinese to English Neural Model Translation Project

ZH-EN NMT Chinese to English Neural Machine Translation This project is inspired by Stanford's CS224N NMT Project Dataset used in this project: News C

Zhenbang Feng 29 Nov 26, 2022
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

Gagan Bhatia 364 Jan 03, 2023
In this workshop we will be exploring NLP state of the art transformers, with SOTA models like T5 and BERT, then build a model using HugginFace transformers framework.

Transformers are all you need In this workshop we will be exploring NLP state of the art transformers, with SOTA models like T5 and BERT, then build a

Aymen Berriche 8 Apr 13, 2022
All the code I wrote for Overwatch-related projects that I still own the rights to.

overwatch_shit.zip This is (eventually) going to contain all the software I wrote during my five-year imprisonment stay playing Overwatch. I'll be add

zkxjzmswkwl 2 Dec 31, 2021