DeepMind Perceiver (in PyTorch)

Disclaimer: This is not official and I'm not affiliated with DeepMind.

My implementation of the Perceiver model from the paper "Perceiver: General Perception with Iterative Attention". You can read more about the model on DeepMind's website.

I trained an MNIST model, which you can find in models/mnist.pkl or load with perceiver.load_mnist_model(). It reaches 96.02% accuracy on the MNIST test set.

Getting started

To run this, you need PyTorch installed:

pip3 install torch

From the perceiver module you can import Perceiver or PerceiverLogits.

Then use it as follows (or see examples.ipynb):

from perceiver import Perceiver

model = Perceiver(
    input_channels, # <- How many channels in the input? E.g. 3 for RGB.
    input_shape, # <- How big is the input along each dimension? E.g. (28, 28) for MNIST.
    fourier_bands=4, # <- How many frequency bands should the positional encoding have?
    latents=64, # <- How many latent vectors?
    d_model=32, # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8, # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6, # <- How many latent self-attention blocks follow each cross-attention with the input?
    dropout=0.1, # <- Dropout rate.
    layers=8, # <- Total layers. Becomes two unique layer blocks: layer 1, and layers 2-8 (which share weights).
)
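To make the heads, latent_blocks and layers arguments more concrete, here is a rough numpy sketch of the computation they describe: the latents cross-attend to the input with a single head, then run latent_blocks rounds of multi-head self-attention, and this is repeated layers times, with layer 1 having its own weights and layers 2 onward reusing one shared set. This is not the code in this repository, only a simplified illustration under my own assumptions: the input is assumed to be already embedded to d_model, the latents are random instead of learned, and the MLPs, learned layer-norm parameters and dropout are left out. The helper names (attention, build_perceiver, perceiver_forward) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm over the feature axis (no learned scale/shift here)
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)


def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)


def attention(q_in, kv_in, w, heads):
    # Multi-head attention: queries from q_in, keys/values from kv_in
    d_model = q_in.shape[-1]
    d_head = d_model // heads

    def split(t):
        # (n, d_model) -> (heads, n, d_head)
        return t.reshape(-1, heads, d_head).transpose(1, 0, 2)

    q, k, v = split(q_in @ w["q"]), split(kv_in @ w["k"]), split(kv_in @ w["v"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n_q, n_kv)
    out = (softmax(scores) @ v).transpose(1, 0, 2).reshape(-1, d_model)
    return out @ w["o"]


def make_weights(d_model):
    # Random query/key/value/output projections, each d_model x d_model
    return {n: rng.normal(0.0, d_model ** -0.5, (d_model, d_model)) for n in "qkvo"}


def make_layer(d_model, latent_blocks):
    # One layer = one cross-attention + `latent_blocks` latent self-attention blocks
    return {"cross": make_weights(d_model),
            "self": [make_weights(d_model) for _ in range(latent_blocks)]}


def build_perceiver(d_model, layers, latent_blocks):
    first = make_layer(d_model, latent_blocks)   # layer 1: its own weights
    shared = make_layer(d_model, latent_blocks)  # layers 2..layers: one shared set
    return [first] + [shared] * (layers - 1)


def perceiver_forward(inputs, latents, stack, heads):
    x = latents
    for layer in stack:
        # Cross-attention always uses a single head
        x = x + attention(layer_norm(x), layer_norm(inputs), layer["cross"], heads=1)
        for w in layer["self"]:
            h = layer_norm(x)
            x = x + attention(h, h, w, heads=heads)  # latent self-attention
    return x


stack = build_perceiver(d_model=32, layers=8, latent_blocks=6)
inputs = rng.normal(size=(28 * 28, 32))  # e.g. 784 MNIST pixels, already embedded
latents = rng.normal(size=(64, 32))      # latents=64, d_model=32
out = perceiver_forward(inputs, latents, stack, heads=8)
print(out.shape)  # (64, 32)
```

Because the 784 input tokens only appear as keys and values in the single-head cross-attention, its cost scales with inputs × latents (784 × 64 here) rather than inputs squared, which is what lets the Perceiver handle large inputs. The stack list holds only two distinct layer objects, which is the weight sharing that layers=8 describes.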

The Perceiver model above outputs the latents after the final layer. If you want logits instead, use the following model:

from perceiver import PerceiverLogits

model = PerceiverLogits(
    input_channels, # <- How many channels in the input? E.g. 3 for RGB.
    input_shape, # <- How big is the input along each dimension? E.g. (28, 28) for MNIST.
    output_features, # <- How many different classes? E.g. 10 for MNIST.
    fourier_bands=4, # <- How many frequency bands should the positional encoding have?
    latents=64, # <- How many latent vectors?
    d_model=32, # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8, # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6, # <- How many latent self-attention blocks follow each cross-attention with the input?
    dropout=0.1, # <- Dropout rate.
    layers=8, # <- Total layers. Becomes two unique layer blocks: layer 1, and layers 2-8 (which share weights).
)
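PerceiverLogits has to turn the final latents into output_features class scores. The Perceiver paper does this by averaging over the latents and applying a linear layer; the numpy sketch below assumes that design, so the actual PerceiverLogits may differ in detail. The helpers logits_head and cross_entropy, and the random weights, are hypothetical and only illustrate the shapes involved.

```python
import numpy as np


def logits_head(latents, w, b):
    # Assumed head: average over the latent axis, then a linear layer
    pooled = latents.mean(axis=0)  # (n_latents, d_model) -> (d_model,)
    return pooled @ w + b          # (d_model,) -> (output_features,)


def cross_entropy(logits, label):
    # Stable log-sum-exp minus the logit of the true class
    m = logits.max()
    return float(m + np.log(np.exp(logits - m).sum()) - logits[label])


rng = np.random.default_rng(0)
latents = rng.normal(size=(64, 32))         # Perceiver output: latents=64, d_model=32
w = rng.normal(size=(32, 10)) * 32 ** -0.5  # d_model -> output_features=10 (MNIST)
b = np.zeros(10)

logits = logits_head(latents, w, b)
print(logits.shape)                    # (10,)
print(int(np.argmax(logits)))          # predicted class
print(cross_entropy(logits, label=3))  # loss if the true digit were 3
```

Mean pooling makes the logits independent of the order of the latent vectors, so the head adds no extra parameters beyond the d_model × output_features linear layer.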

To load my pre-trained MNIST model (96.02% test accuracy, so not very good):

from perceiver import load_mnist_model

model = load_mnist_model()

TODO:

Positional embedding generalized to n dimensions (with Fourier features)
Train other models (e.g. CIFAR-100, or something outside the image domain)
Type hints
Unit tests for the model components
Packaging
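The first TODO item concerns Fourier-feature positional encodings. Here is a rough numpy sketch of an n-dimensional version following the scheme described in the Perceiver paper: coordinates scaled to [-1, 1], sines and cosines at fourier_bands frequencies spaced linearly from 1 to half of a maximum frequency, plus the raw coordinate, giving d(2K + 1) features for a d-dimensional input with K bands. The function name fourier_encode and the max_freq default are my own assumptions, not code from this repository.

```python
import numpy as np


def fourier_encode(input_shape, fourier_bands, max_freq=None):
    """Fourier-feature positional encoding for a grid with any number of dimensions.

    Returns shape (*input_shape, d * (2 * fourier_bands + 1)) with d = len(input_shape).
    """
    d = len(input_shape)
    # Position of every grid point, scaled to [-1, 1] along each axis
    axes = [np.linspace(-1.0, 1.0, n) for n in input_shape]
    pos = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (*input_shape, d)
    if max_freq is None:
        max_freq = max(input_shape)  # assumption: use the largest axis size
    # K frequencies spaced linearly from 1 up to max_freq / 2
    freqs = np.linspace(1.0, max_freq / 2, fourier_bands)
    scaled = pos[..., None] * freqs * np.pi  # (*input_shape, d, K)
    # Per dimension: K sines, K cosines, then the raw coordinate
    feats = np.concatenate([np.sin(scaled), np.cos(scaled), pos[..., None]], axis=-1)
    return feats.reshape(*input_shape, d * (2 * fourier_bands + 1))


image = np.zeros((28, 28, 1))  # one MNIST image, channels last (assumed layout)
pe = fourier_encode(image.shape[:-1], fourier_bands=4)
tokens = np.concatenate([image, pe], axis=-1).reshape(-1, 1 + pe.shape[-1])
print(pe.shape)      # (28, 28, 18)
print(tokens.shape)  # (784, 19)
```

Nothing in the function is specific to two dimensions: passing a 3-tuple such as (4, 5, 6) with 3 bands yields 3 × 7 = 21 features per position. The encoding is concatenated to the input channels, so each MNIST pixel becomes a 19-dimensional token before it is projected for the cross-attention.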


Owner: Louis Arge
