Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)

Overview

Linear Multihead Attention (Linformer)

PyTorch Implementation of reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity), which demonstrates that the self-attention mechanism can be approximated by a low-rank matrix and reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space.

Implementation

This is an efficient implementation followed with the PyTorch official torch.nn.MultiheadAttention class and F.multi_head_attention_forward function.

Three additional argments defined in LinearMultiheadAttention: sequence length, the projected dimention k and the parameter sharing.

seq_len: the sequence length. Default: 100.
proj_k: the projected dimention `k` in Linformer paper. Default: 128.
param_sharing: parameter sharing mode: layerwise, none. headwise is not implemented. Default: none.

Usage

Examples of using torch.nn.MultiheadAttention:

>>> import torch
>>> multihead_attn = torch.nn.MultiheadAttention(embed_dim, num_heads)
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)

Examples of using LinearMultiheadAttention:

>>> from linear_multihead_attention import LinearMultiheadAttention
>>> multihead_attn = LinearMultiheadAttention(embed_dim, num_heads) 
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)

Examples of using LinearMultiheadAttention with the sequence length of 512 and :

>>> from linear_multihead_attention import LinearMultiheadAttention
>>> multihead_attn = LinearMultiheadAttention(embed_dim, num_heads, seq_len=512, proj_k=256, param_sharing='layerwise') 
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)

Linear-DETR: Replace torch.nn.MultiheadAttention in DETR with LinearMultiheadAttention in three lines in models/transformer.py, it saved much more memory and space, hope to have a comparable performance:

from linear_multihead_attention import LinearMultiheadAttention

# TransformerEncoderLayer
# self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, seq_len=w*h, proj_k=64) # where w, h are from `bs, c, h, w = src.shape`


# TransformerDecoderLayer
# self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
# self.multihead_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)

self.self_attn = LinearMultiheadAttention(d_model, nhead, dropout=dropout, seq_len=num_queries, proj_k=64) # where num_queries = args.num_queries
self.multihead_attn = LinearMultiheadAttention(d_model, nhead, dropout=dropout, seq_len=w*h, proj_k=64) # where w, h are from `bs, c, h, w = src.shape`

Results on DETR

TODO

Citation

@misc{wang2020linformer,
    title={Linformer: Self-Attention with Linear Complexity},
    author={Sinong Wang and Belinda Z. Li and Madian Khabsa and Han Fang and Hao Ma},
    year={2020},
    eprint={2006.04768},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Owner
Kui Xu
Researcher, interested in Computational Biology, and 3D Computer Vision.
Kui Xu
Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

Wasi Ahmad 138 Dec 30, 2022
text to speech toolkit. 好用的中文语音合成工具箱,包含语音编码器、语音合成器、声码器和可视化模块。

ttskit Text To Speech Toolkit: 语音合成工具箱。 安装 pip install -U ttskit 注意 可能需另外安装的依赖包:torch,版本要求torch=1.6.0,=1.7.1,根据自己的实际环境安装合适cuda或cpu版本的torch。 ttskit的

KDD 483 Jan 04, 2023
Host your own GPT-3 Discord bot

GPT3 Discord Bot Host your own GPT-3 Discord bot i'd host and make the bot invitable myself, however GPT3 terms of service prohibit public use of GPT3

[something hillarious here] 8 Jan 07, 2023
This program do translate english words to portuguese

Python-Dictionary This program is used to translate english words to portuguese. Web-Scraping This program use BeautifulSoap to make web scraping, so

João Assalim 1 Oct 10, 2022
A Japanese tokenizer based on recurrent neural networks

Nagisa is a python module for Japanese word segmentation/POS-tagging. It is designed to be a simple and easy-to-use tool. This tool has the following

325 Jan 05, 2023
NLP Core Library and Model Zoo based on PaddlePaddle 2.0

PaddleNLP 2.0拥有丰富的模型库、简洁易用的API与高性能的分布式训练的能力,旨在为飞桨开发者提升文本建模效率,并提供基于PaddlePaddle 2.0的NLP领域最佳实践。

6.9k Jan 01, 2023
PyJPBoatRace: Python-based Japanese boatrace tools 🚤

pyjpboatrace :speedboat: provides you with useful tools for data analysis and auto-betting for boatrace.

5 Oct 29, 2022
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation

Salesforce 564 Jan 08, 2023
Dense Passage Retriever - is a set of tools and models for open domain Q&A task.

Dense Passage Retrieval Dense Passage Retrieval (DPR) - is a set of tools and models for state-of-the-art open-domain Q&A research. It is based on the

Meta Research 1.1k Jan 07, 2023
this repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain

1 Nov 02, 2021
1 Jun 28, 2022
A Python 3.6+ package to run .many files, where many programs written in many languages may exist in one file.

RunMany Intro | Installation | VSCode Extension | Usage | Syntax | Settings | About A tool to run many programs written in many languages from one fil

6 May 22, 2022
Code for "Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments".

Code for "Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments".

Yu Zhang 50 Nov 08, 2022
NLP: SLU tagging

NLP: SLU tagging

北海若 3 Jan 14, 2022
FireFlyer Record file format, writer and reader for DL training samples.

FFRecord The FFRecord format is a simple format for storing a sequence of binary records developed by HFAiLab, which supports random access and Linux

77 Jan 04, 2023
Utilities for preprocessing text for deep learning with Keras

Note: This utility is really old and is no longer maintained. You should use keras.layers.TextVectorization instead of this. Utilities for pre-process

Hamel Husain 180 Dec 09, 2022
Words_And_Phrases - Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours

Words_And_Phrases Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours Abbreviations Abbreviation

Subhadeep Mandal 1 Feb 01, 2022
Code for lyric-section-to-comment generation based on huggingface transformers.

CommentGeneration Code for lyric-section-to-comment generation based on huggingface transformers. Migrate Guyu model and code (both 12-layers and 24-l

Yawei Sun 8 Sep 04, 2021
DeLighT: Very Deep and Light-Weight Transformers

DeLighT: Very Deep and Light-weight Transformers This repository contains the source code of our work on building efficient sequence models: DeFINE (I

Sachin Mehta 440 Dec 18, 2022
NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

pretrain4ir_tutorial NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking 用作NLPIR实验室, Pre-training

ZYMa 12 Apr 07, 2022