Attention for PyTorch with Linear Memory Footprint

Overview

An unofficial PyTorch implementation of https://arxiv.org/abs/2112.05682, providing attention with linear memory cost (plus some speedup on GPU compared to the reference JAX implementation).
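
The core idea from the paper is to process keys and values in chunks and merge the partial softmax results using running max / normaliser statistics, so the full length-by-length attention matrix is never materialised at once. The sketch below illustrates that accumulation scheme in plain PyTorch; it is not this package's internal code, and the name chunked_attention is made up for illustration:

import torch

def chunked_attention(q, k, v, key_chunk_size=4096):
    # q, k, v: (batch, length, heads, dim) -- illustrative shapes, no masking
    scale = q.shape[-1] ** -0.5
    q = q * scale
    acc = torch.zeros_like(q)                                 # running weighted sum of values
    denom = q.new_zeros((*q.shape[:3], 1))                    # running softmax normaliser
    run_max = q.new_full((*q.shape[:3], 1), float('-inf'))    # running max for numerical stability

    for k_c, v_c in zip(k.split(key_chunk_size, dim=1),
                        v.split(key_chunk_size, dim=1)):
        s = torch.einsum('bqhd,bkhd->bqhk', q, k_c)            # scores for this key chunk only
        new_max = torch.maximum(run_max, s.amax(dim=-1, keepdim=True))
        correction = (run_max - new_max).exp()                 # rescale previous accumulators
        p = (s - new_max).exp()
        acc = acc * correction + torch.einsum('bqhk,bkhd->bqhd', p, v_c)
        denom = denom * correction + p.sum(dim=-1, keepdim=True)
        run_max = new_max
    return acc / denom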

Installation:

git clone https://github.com/CHARM-Tx/linear_mem_attention_pytorch
cd linear_mem_attention_pytorch
python setup.py install 

Usage:

High Level

import torch
from linear_mem_attention_torch.fast_attn import Attention

batch, length, features = 2, 2**8, 64
x, ctx = torch.randn(2, batch, length, features)
mask = torch.randn(batch, length) < 1.

attn = Attention(dim=features, heads=8, dim_head=64, bias=False)

# self-attn
v_self = attn(x, x, mask, query_chunk_size=1024, key_chunk_size=4096)

# cross-attn
v_cross = attn(x, ctx, mask, query_chunk_size=1024, key_chunk_size=4096)
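
Both calls accept query_chunk_size and key_chunk_size; smaller chunks generally lower peak memory but add iteration overhead, so these values trade memory for speed.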

Low Level

import torch
from linear_mem_attention_torch import attention

batch, length, heads, features = 2, 2**8, 8, 64
mask = torch.randn(batch, length) < 1.
q, k, v = torch.randn(3, batch, length, heads, features)

v_ = attention(q, k, v, mask, query_chunk_size=1024, key_chunk_size=4096)
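
To sanity-check the chunked computation, you can compare it against a naive dense attention on a small input. The helper below is a reference written here for illustration only; it assumes the mask flags valid key positions with True and that the chunked output matches the (batch, length, heads, features) layout of the inputs:

import torch
from linear_mem_attention_torch import attention

def naive_attention(q, k, v, mask):
    # dense reference: materialises the full (length x length) score matrix
    scale = q.shape[-1] ** -0.5
    s = torch.einsum('bqhd,bkhd->bhqk', q, k) * scale
    s = s.masked_fill(~mask[:, None, None, :], float('-inf'))
    return torch.einsum('bhqk,bkhd->bqhd', s.softmax(dim=-1), v)

batch, length, heads, features = 2, 2**8, 8, 64
mask = torch.randn(batch, length) < 1.
q, k, v = torch.randn(3, batch, length, heads, features)

v_chunked = attention(q, k, v, mask, query_chunk_size=64, key_chunk_size=64)
v_dense = naive_attention(q, k, v, mask)
print(torch.allclose(v_chunked, v_dense, atol=1e-5))  # should print True up to numerical tolerance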

Benchmarks

(Figure: runtime plots on CPU and GPU.)

See examples/example_benchamrk.ipynb for more information.
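
If you want a rough memory measurement of your own, a loop like the one below works on a CUDA device. It is only a sketch and simply reuses the call signature from the low-level example above:

import torch
from linear_mem_attention_torch import attention

def peak_mem_mb(fn, *args, **kwargs):
    # peak CUDA memory allocated during a single forward call, in MiB
    torch.cuda.reset_peak_memory_stats()
    fn(*args, **kwargs)
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

batch, heads, features = 1, 8, 64
for length in (2**10, 2**12, 2**14):
    q, k, v = torch.randn(3, batch, length, heads, features, device='cuda')
    mask = torch.ones(batch, length, dtype=torch.bool, device='cuda')
    mb = peak_mem_mb(attention, q, k, v, mask,
                     query_chunk_size=1024, key_chunk_size=4096)
    print(f'length={length}: {mb:.1f} MiB peak')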

Citations:

@misc{rabe2021selfattention,
      title={Self-attention Does Not Need $O(n^2)$ Memory}, 
      author={Markus N. Rabe and Charles Staats},
      year={2021},
      eprint={2112.05682},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}