Implementation of Hierarchical Transformer Memory (HTM) for Pytorch

Last update: Dec 29, 2022

Overview

Hierarchical Transformer Memory (HTM) - Pytorch

Implementation of Hierarchical Transformer Memory (HTM) for Pytorch. This Deepmind paper proposes a simple method to allow transformers to attend to memories of the past efficiently. Original Jax repository

Install

$ pip install htm-pytorch

Usage

import torch
from htm_pytorch import HTMAttention

attn = HTMAttention(
    dim = 512,
    heads = 8,               # number of heads for within-memory attention
    dim_head = 64,           # dimension per head for within-memory attention
    topk_mems = 8,           # how many memory chunks to select for
    mem_chunk_size = 32,     # number of tokens in each memory chunk
    add_pos_enc = True       # whether to add positional encoding to the memories
)

queries = torch.randn(1, 128, 512)     # queries
memories = torch.randn(1, 20000, 512)  # memories, of any size
mask = torch.ones(1, 20000).bool()     # memory mask

attended = attn(queries, memories, mask = mask) # (1, 128, 512)

If you want the entire HTM Block (which contains the layernorm for the input followed by a skip connection), just import HTMBlock instead

import torch
from htm_pytorch import HTMBlock

block = HTMBlock(
    dim = 512,
    topk_mems = 8,
    mem_chunk_size = 32
)

queries = torch.randn(1, 128, 512)
memories = torch.randn(1, 20000, 512)
mask = torch.ones(1, 20000).bool()

out = block(queries, memories, mask = mask) # (1, 128, 512)

Citations

@misc{lampinen2021mental,
    title   = {Towards mental time travel: a hierarchical memory for reinforcement learning agents}, 
    author  = {Andrew Kyle Lampinen and Stephanie C. Y. Chan and Andrea Banino and Felix Hill},
    year    = {2021},
    eprint  = {2105.14039},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects This repo contains the code of Segcache described in the followi

78 Jan 7, 2023

Episodic-memory - Ego4D Episodic Memory Benchmark

Ego4D Episodic Memory Benchmark EGO4D is the world's largest egocentric (first p

3 Feb 18, 2022

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning

H-Transformer-1D Implementation of H-Transformer-1D, Transformer using hierarchical Attention for sequence learning with subquadratic costs. For now,

123 Nov 17, 2022

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers

hierarchical-transformer-1d Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers In Progress!! 2021.

7 Nov 6, 2022

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

225 Nov 13, 2022

HiFT: Hierarchical Feature Transformer for Aerial Tracking (ICCV2021)

HiFT: Hierarchical Feature Transformer for Aerial Tracking Ziang Cao, Changhong Fu, Junjie Ye, Bowen Li, and Yiming Li Our paper is Accepted by ICCV 2

Intelligent Vision for Robotics in Complex Environment

55 Nov 23, 2022

The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

R2D2 This is the official code for paper titled "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Mode

49 Dec 17, 2022

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

12.6k Jan 9, 2023

Comments

auto-regressive use case
Hi Phil! I was wondering if HTM part can be used in/for auto-regressive scenario? Full proposed arch in the paper has 3 blocks:

Self Att - this can be easily done with causal masking

next we have HTM block with memories - can it be used in autoregressive scenario i wonder?

Feed Forward block

please let me know your thoughts?
opened by inspirit 0

Implementation of Hierarchical Transformer Memory (HTM) for Pytorch

Related tags

Overview

Hierarchical Transformer Memory (HTM) - Pytorch

Install

Usage

Citations

You might also like...

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects

Episodic-memory - Ego4D Episodic Memory Benchmark

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

HiFT: Hierarchical Feature Transformer for Aerial Tracking (ICCV2021)

The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Comments

auto-regressive use case

Releases(0.0.4)

0.0.4(Sep 15, 2021)

0.0.3(Sep 14, 2021)

0.0.2(Sep 14, 2021)

0.0.1(Sep 14, 2021)

Owner

Phil Wang

A higher performance pytorch implementation of DeepLab V3 Plus(DeepLab v3+)

Official Repo for ICCV2021 Paper: Learning to Regress Bodies from Images using Differentiable Semantic Rendering

git《Investigating Loss Functions for Extreme Super-Resolution》(CVPR 2020) GitHub:

Segmentation Training Pipeline

GAN-STEM-Conv2MultiSlice - Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

MiraiML: asynchronous, autonomous and continuous Machine Learning in Python

Code for You Only Cut Once: Boosting Data Augmentation with a Single Cut

Distributed Evolutionary Algorithms in Python

RMTD: Robust Moving Target Defence Against False Data Injection Attacks in Power Grids

This repository contain code on Novelty-Driven Binary Particle Swarm Optimisation for Truss Optimisation Problems.

Weakly Supervised Scene Text Detection using Deep Reinforcement Learning

[ICCV 2021] Learning A Single Network for Scale-Arbitrary Super-Resolution

[NeurIPS 2021] “Improving Contrastive Learning on Imbalanced Data via Open-World Sampling”,

NAS-FCOS: Fast Neural Architecture Search for Object Detection (CVPR 2020)

Fuzzing the Kernel Using Unicornafl and AFL++

A Light CNN for Deep Face Representation with Noisy Labels

Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”

The codes reproduce the figures and statistics in the paper, "Controlling for multiple covariates," by Mark Tygert.

An Extendible (General) Continual Learning Framework based on Pytorch - official codebase of Dark Experience for General Continual Learning

source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT