A PyTorch Implementation of Luna: Linear Unified Nested Attention

Last update: Nov 07, 2022

Overview

Unofficial PyTorch implementation of Luna: Linear Unified Nested Attention

The quadratic computational and memory complexities of the Transformer’s attention mechanism have limited its scalability for modeling long sequences. In this paper, we propose Luna, a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions, yielding only linear (as opposed to quadratic) time and space complexity. Compared to a more traditional attention mechanism, Luna introduces an additional fixed-length sequence as input and an additional corresponding output, which allows Luna to perform the attention operation in linear time while still storing adequate contextual information. We perform extensive evaluations on three benchmarks of sequence modeling tasks: long-context sequence modeling, neural machine translation, and masked language modeling for large-scale pretraining. Competitive or even better experimental results demonstrate both the effectiveness and efficiency of Luna compared to a variety of strong baseline methods, including full-rank attention and other efficient sparse and dense attention methods.
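To make the nested "pack and unpack" idea concrete, here is a minimal, framework-free sketch in plain Python. It assumes single-head scaled dot-product softmax attention with no learned query, key, or value projections and no layer normalization, so it illustrates only the data flow described in the paper, not this repository's implementation. The fixed-length sequence `p` (length l) first attends over the input `x` (length n) to produce a packed sequence, and `x` then attends over that packed sequence, so neither step builds an n x n score matrix. The names `attention` and `luna_attention` are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, memory):
    """Scaled dot-product attention where each memory vector is both key and value.

    Builds a len(queries) x len(memory) score matrix and returns one
    d-dimensional output vector per query.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in memory]
        weights = softmax(scores)
        # Weighted average of the memory vectors, one coordinate at a time
        outputs.append([sum(w * v[j] for w, v in zip(weights, memory)) for j in range(d)])
    return outputs


def luna_attention(x, p):
    """Hypothetical nested pack/unpack attention without learned projections.

    pack:   p (l x d) attends over x (n x d)      -> packed (l x d), an l x n score matrix
    unpack: x (n x d) attends over packed (l x d) -> output (n x d), an n x l score matrix
    """
    packed = attention(p, x)
    unpacked = attention(x, packed)
    return unpacked, packed


# Toy example: n = 16 input vectors, a fixed-length sequence of 4, dimension 8
random.seed(0)
n, proj_len, d = 16, 4, 8
x = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
p = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(proj_len)]
out, packed = luna_attention(x, p)
print(len(out), len(out[0]))        # 16 8
print(len(packed), len(packed[0]))  # 4 8
```

In PyTorch each step would be a single batched call such as `torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1) @ v`; the nested lists here only keep the sketch dependency-free. The packed sequence is returned as well because, in the paper, it is passed on as the extra sequence for the next layer.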

Installation

This project recommends Python 3.7 or higher. We recommend creating a new virtual environment for this project (using virtualenv or conda).

Prerequisites

NumPy: pip install numpy (see the NumPy installation documentation if you run into problems installing NumPy).
PyTorch: refer to the PyTorch website to install the version that matches your environment.
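Before installing from source, a quick way to check these prerequisites is to ask Python whether the packages can be found. This is a generic standard-library sketch, not a script shipped with the repository; it only checks that `numpy` and `torch` are importable, not which versions are installed. `check_prerequisites` is a hypothetical helper.

```python
import importlib.util
import sys

# This README recommends Python 3.7 or higher.
MIN_PYTHON = (3, 7)


def check_prerequisites(packages=("numpy", "torch")):
    """Map each package name to True if it can be imported in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    if sys.version_info < MIN_PYTHON:
        sys.exit("Python 3.7 or higher is recommended; found " + sys.version.split()[0])
    for name, found in check_prerequisites().items():
        print(name + ": " + ("found" if found else "missing"))
```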

Install from source

Currently we only support installation from source code using setuptools. Check out the source code and run the following command:

pip install -e .

Usage

import torch
from luna_transformer import LunaTransformerEncoder

# A toy batch of 3 token-id sequences drawn from a vocabulary of size 4
DUMMY_INPUTS = torch.LongTensor([
    [2, 3, 3, 3, 3, 3, 2, 2, 0],
    [2, 3, 3, 3, 3, 3, 2, 3, 2],
    [2, 3, 3, 3, 3, 3, 2, 2, 0],
])
# Length of each sequence in the batch
DUMMY_INPUT_LENGTHS = torch.LongTensor([9, 8, 7])

# Build a 6-layer encoder with 8 attention heads and d_model=512
model = LunaTransformerEncoder(vocab_size=4, d_model=512, num_layers=6,
                               num_attention_heads=8, project_embedding_length=32,
                               dropout_p=0.1, max_length=1024)
outputs = model(DUMMY_INPUTS, DUMMY_INPUT_LENGTHS)
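A back-of-the-envelope count helps show why linear attention matters for the hyperparameters above. The sketch assumes that `project_embedding_length` plays the role of the paper's fixed length l and that `max_length` is the longest sequence length n; this README does not state either mapping. It counts attention-score entries for a single head in a single layer, ignoring batch size, number of heads and layers, and the feed-forward blocks. The helper `attention_score_entries` is hypothetical.

```python
def attention_score_entries(seq_len, proj_len=None):
    """Count attention-score entries for one head in one layer.

    Full softmax attention scores every query against every key: seq_len * seq_len.
    Luna's pack step scores proj_len x seq_len and its unpack step seq_len x proj_len.
    """
    if proj_len is None:
        return seq_len * seq_len
    return 2 * seq_len * proj_len


# Assumed mapping: max_length=1024 -> n, project_embedding_length=32 -> l
for n in (1024, 2048, 4096):
    full = attention_score_entries(n)
    luna = attention_score_entries(n, 32)
    print(n, full, luna, full // luna)
# 1024 1048576 65536 16
# 2048 4194304 131072 32
# 4096 16777216 262144 64
```

Each doubling of n quadruples the full-attention count but only doubles the Luna count, which is the linear-versus-quadratic behavior described in the overview.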

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub or contact [email protected].

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues such as bug fixes or documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues.

Code Style

I follow PEP 8 for code style. The docstring style is especially important, because the docstrings are used to generate the documentation.

Author

Soohwan Kim @sooftware
Contact: [email protected]
