Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch

Overview

CAPE 🌴

PyTorch implementation of Continuous Augmented Positional Embeddings (CAPE), by Likhomanenko et al. Enhance your Transformer positional embeddings with easy-to-use augmentations!

Setup 🔧

Minimum requirements:

torch >= 1.10.0

Install from source:

git clone https://github.com/gcambara/cape.git
cd cape
pip install --editable ./

Usage 📖

Works out of the box with PyTorch's official Transformer implementation. With default arguments, CAPE behaves identically to sinusoidal positional embeddings and adds them to your content embeddings:

import torch
from torch import nn
from cape import CAPE1d

pos_emb = CAPE1d(d_model=512)
transformer = nn.Transformer(d_model=512)

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x = pos_emb(x) # forward sums the positional embedding by default
x = transformer(x, x) # nn.Transformer needs both src and tgt; x is reused as tgt for illustration
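For reference, the sinusoidal embeddings that the default configuration is said to reproduce can be sketched in plain PyTorch. This is a minimal sketch of the standard formulation, not this library's code: the function name sinusoidal_pos_emb is hypothetical, and the interleaved sin/cos layout is an assumption that may differ from how CAPE orders its features internally.

```python
import math
import torch

def sinusoidal_pos_emb(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal embeddings, shape (seq_len, d_model). Hypothetical helper."""
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # Geometric progression of frequencies, one per sin/cos pair.
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(positions * div_term)  # even feature indices
    pe[:, 1::2] = torch.cos(positions * div_term)  # odd feature indices (assumed layout)
    return pe

x = torch.randn(10, 32, 512)       # seq_len, batch_size, n_feats
pe = sinusoidal_pos_emb(10, 512)
x_with_pos = x + pe.unsqueeze(1)   # broadcast the embedding over the batch dimension
```

The unsqueeze(1) is what lets a single (seq_len, d_model) table be added to every sample in a (seq_len, batch_size, n_feats) batch.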

Alternatively, you can compute the positional embeddings separately and combine them with the content embeddings yourself:

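x = torch.randn(10, 32, 512)
pos = pos_emb.compute_pos_emb(x) # separate name, so the pos_emb module is not overwritten

scale = 512**0.5 # sqrt(d_model)
x = (scale * x) + pos
x = transformer(x, x) # nn.Transformer needs both src and tgt; x is reused as tgt for illustration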

Let's see a few examples of CAPE initialization for different modalities, inspired by the original paper experiments.

CAPE for text 🔤

CAPE1d is ready to be applied to text. Keep max_local_shift between 0 and 0.5 to shift local positions without disordering them.

import torch
from cape import CAPE1d
pos_emb = CAPE1d(d_model=512, max_global_shift=5.0,
                 max_local_shift=0.5, max_global_scaling=1.03,
                 normalize=False)

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x = pos_emb(x)
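To make the three arguments more concrete, here is a rough sketch of the position augmentation described in the CAPE paper: a global shift per sample, a local shift per position, and a log-uniform global scaling. It is an illustration under stated assumptions, not this library's implementation; augment_positions is a hypothetical name and unit spacing between positions is assumed.

```python
import math
import torch

def augment_positions(positions, max_global_shift=5.0, max_local_shift=0.5,
                      max_global_scaling=1.03):
    """Hypothetical helper. positions: (seq_len, batch_size) float tensor."""
    seq_len, batch_size = positions.shape
    # One global shift per sample, drawn uniformly from [-max, max].
    global_shift = (torch.rand(1, batch_size) * 2 - 1) * max_global_shift
    # One local shift per position, drawn uniformly from [-max, max].
    local_shift = (torch.rand(seq_len, batch_size) * 2 - 1) * max_local_shift
    # One global scale per sample, log-uniform in [1/max, max].
    log_max = math.log(max_global_scaling)
    global_scale = torch.exp((torch.rand(1, batch_size) * 2 - 1) * log_max)
    return global_scale * (positions + global_shift + local_shift)

# Unit-spaced positions 0..9 for a batch of 32 sequences.
positions = torch.arange(10, dtype=torch.float32).unsqueeze(1).repeat(1, 32)
augmented = augment_positions(positions)
```

With unit spacing, two neighbouring positions can each move by at most 0.5 towards one another, which is why keeping max_local_shift at or below 0.5 leaves the order intact.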

Padding is supported: pass the length of each sample to the forward method through the x_lengths argument. For example, suppose every sample has an original length of 7 but has been padded to a sequence length of 10:

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x_lengths = torch.ones(32)*7
x = pos_emb(x, x_lengths=x_lengths)
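The same lengths can also be turned into a key padding mask for the Transformer that consumes the embeddings. The following is a plain PyTorch sketch and does not show how CAPE uses x_lengths internally; the encoder configuration (nhead=8, num_layers=1) is an arbitrary choice for illustration.

```python
import torch
from torch import nn

seq_len, batch_size, d_model = 10, 32, 512
x = torch.randn(seq_len, batch_size, d_model)
x_lengths = torch.ones(batch_size) * 7

# True marks padded time steps, which attention should ignore.
steps = torch.arange(seq_len).unsqueeze(0)         # (1, seq_len)
padding_mask = steps >= x_lengths.unsqueeze(1)     # (batch_size, seq_len)

encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
out = encoder(x, src_key_padding_mask=padding_mask)  # (seq_len, batch_size, d_model)
```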

CAPE for audio 🎙️

CAPE1d for audio is applied similarly to text. Use the positions_delta argument to set the separation in seconds between time steps, and x_lengths to indicate sample durations when there is padding.

For instance, consider no padding and the same hop size (30 ms) for every sample in the batch:

# Max global shift is 60 s.
# Max local shift is set to 0.5 to maintain positional order.
# Max global scaling is 1.1, according to WSJ recipe.
# Freq scale is 30 to ensure that 30 ms queries are possible with long audios
import torch
from cape import CAPE1d
pos_emb = CAPE1d(d_model=512, max_global_shift=60.0,
                 max_local_shift=0.5, max_global_scaling=1.1,
                 normalize=True, freq_scale=30.0)

x = torch.randn(100, 32, 512) # seq_len, batch_size, n_feats
positions_delta = 0.03 # 30 ms of stride
x = pos_emb(x, positions_delta=positions_delta)

Now, let's imagine that the original duration of all samples is 2.5 s, although they have been padded to 3.0 s. Hop size is 30 ms for every sample in the batch.

x = torch.randn(100, 32, 512) # seq_len, batch_size, n_feats

duration = 2.5
positions_delta = 0.03
x_lengths = torch.ones(32)*duration
x = pos_emb(x, x_lengths=x_lengths, positions_delta=positions_delta)
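The relation between duration, hop size and padded steps in this example can be checked with a few lines of arithmetic. This sketch assumes a step counts as valid when its timestamp is strictly below the duration; how the library treats the boundary is not stated here.

```python
import torch

seq_len = 100
positions_delta = 0.03   # seconds between consecutive time steps
duration = 2.5           # unpadded duration in seconds

timestamps = torch.arange(seq_len) * positions_delta   # 0.00, 0.03, ..., 2.97 seconds
is_valid = timestamps < duration                        # assumed rule: strictly before the end
n_valid = int(is_valid.sum())
n_padded = seq_len - n_valid
print(n_valid, n_padded)
```

Under that assumption, 84 of the 100 steps carry real audio and the remaining 16 are padding.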

What if the hop size is different for every sample in the batch? For example, the first half of the samples has a stride of 30 ms and the second half a stride of 50 ms.

positions_delta = 0.03
positions_delta = torch.ones(32)*positions_delta
positions_delta[16:] = 0.05
x = pos_emb(x, positions_delta=positions_delta)
positions_delta
tensor([0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300,
        0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0500, 0.0500,
        0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500,
        0.0500, 0.0500, 0.0500, 0.0500, 0.0500])

Lastly, let's consider a very rare case, where hop size is different for every sample in the batch, and is not constant within some samples. E.g. stride of 30 ms for the first half of samples, and 50 ms for the second half. However, the hop size of the very first sample linearly increases for each time step.

from einops import repeat
positions_delta = 0.03
positions_delta = torch.ones(32)*positions_delta
positions_delta[16:] = 0.05
positions_delta = repeat(positions_delta, 'b -> b new_axis', new_axis=100)
positions_delta[0, :] *= torch.arange(1, 101)
x = pos_emb(x, positions_delta=positions_delta)
positions_delta
tensor([[0.0300, 0.0600, 0.0900,  ..., 2.9400, 2.9700, 3.0000],
        [0.0300, 0.0300, 0.0300,  ..., 0.0300, 0.0300, 0.0300],
        [0.0300, 0.0300, 0.0300,  ..., 0.0300, 0.0300, 0.0300],
        ...,
        [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500]])
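If you would rather not depend on einops, the same per-step tensor can be built with plain PyTorch. This is an alternative sketch rather than the author's code; per_sample is a name introduced here for illustration.

```python
import torch

batch_size, seq_len = 32, 100

# One hop size per sample: 30 ms for the first half, 50 ms for the second half.
per_sample = torch.full((batch_size,), 0.03)
per_sample[batch_size // 2:] = 0.05

# repeat() copies the data, so each row can be edited in place safely.
positions_delta = per_sample.unsqueeze(1).repeat(1, seq_len)   # (batch_size, seq_len)

# The hop size of the first sample grows linearly with the time step.
positions_delta[0] *= torch.arange(1, seq_len + 1)
```

The resulting tensor has shape (batch_size, seq_len), matching the output printed above, and can be passed as positions_delta in the same way.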

CAPE for ViT 🖼️

CAPE2d is used for embedding positions in image patches. Positions are scaled to the range [-1, 1] inside the module, whether patches are square or non-square. Thus, set max_local_shift between 0 and 0.5, and the scale of local shifts will be adjusted according to the height and width of patches. Values above 0.5 might alter the order of positions, so use them at your own risk!

import torch
from cape import CAPE2d
pos_emb = CAPE2d(d_model=512, max_global_shift=0.5,
                 max_local_shift=0.5, max_global_scaling=1.4)

# Case 1: square patches
x = torch.randn(16, 16, 32, 512) # height, width, batch_size, n_feats
x = pos_emb(x)

# Case 2: non-square patches
x = torch.randn(24, 16, 32, 512) # height, width, batch_size, n_feats
x = pos_emb(x)
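As an illustration of what scaling positions to [-1, 1] means for a patch grid, the sketch below builds such a grid by hand. It assumes each axis is mapped to [-1, 1] independently; how CAPE2d normalises non-square inputs internally may differ, and patch_grid is a hypothetical helper.

```python
import torch

def patch_grid(height: int, width: int) -> torch.Tensor:
    """Hypothetical helper: (height, width, 2) patch coordinates in [-1, 1]."""
    ys = torch.linspace(-1.0, 1.0, height)
    xs = torch.linspace(-1.0, 1.0, width)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([grid_x, grid_y], dim=-1)   # last dim holds (x, y)

square = patch_grid(16, 16)       # Case 1: square patches
non_square = patch_grid(24, 16)   # Case 2: non-square patches

# Neighbouring patches are closer along the longer axis, which is why a local
# shift has to be rescaled per axis to keep positions in order.
step_y = 2.0 / (24 - 1)
step_x = 2.0 / (16 - 1)
```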

Citation ✍️

I wrote this PyTorch implementation following the paper's Python code and the Flashlight recipe in C++. All the credit goes to the original authors. Please cite them if you use this in your research project:

@inproceedings{likhomanenko2021cape,
title={{CAPE}: Encoding Relative Positions with Continuous Augmented Positional Embeddings},
author={Tatiana Likhomanenko and Qiantong Xu and Gabriel Synnaeve and Ronan Collobert and Alex Rogozhnikov},
booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
year={2021},
url={https://openreview.net/forum?id=n-FqqWXnWW}
}

Acknowledgments 🙏

Many thanks to the paper's authors for code reviewing and clarifying doubts about the paper and the implementation. :)

Releases: v1.0.0

Owner: Guillermo Cámbara, PhD Candidate in Self-Supervised Learning + Speech Recognition @ Universitat Pompeu Fabra & Telefónica Research