DeepMind Perceiver (in PyTorch)

Disclaimer: This is not official and I'm not affiliated with DeepMind.

My implementation of the Perceiver from "Perceiver: General Perception with Iterative Attention". You can read more about the model on DeepMind's website.

I trained an MNIST model, which you can find in models/mnist.pkl or load with perceiver.load_mnist_model(). It reaches 96.02% accuracy on the MNIST test set.

Getting started

To run this, you need PyTorch installed:

```bash
pip3 install torch
```

From the perceiver module you can import either Perceiver or PerceiverLogits.

Use them as follows (or see examples.ipynb):

```python
from perceiver import Perceiver

# Example values for MNIST: 1 grayscale channel, 28x28 pixels.
input_channels = 1
input_shape = (28, 28)

model = Perceiver(
    input_channels,  # <- How many channels in the input? E.g. 3 for RGB.
    input_shape,  # <- Size of the input along each dimension, e.g. (28, 28) for MNIST.
    fourier_bands=4,  # <- Number of frequency bands in the positional encoding.
    latents=64,  # <- Number of latent vectors.
    d_model=32,  # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8,  # <- Number of heads in latent self-attention. Cross-attention always has 1 head.
    latent_blocks=6,  # <- Number of latent self-attention blocks per cross-attention with the input.
    dropout=0.1,  # <- Dropout probability.
    layers=8,  # <- Becomes two unique layer blocks: layer 1, and layers 2-8 (which share weights).
)
```
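The `fourier_bands` parameter controls the positional encoding. The Perceiver paper encodes each coordinate x in [-1, 1] as sine and cosine features at K frequencies spaced linearly between 1 and μ/2, plus the raw coordinate, so a d-dimensional input gets d·(2K + 1) positional features per position, which are concatenated with the input channels. Below is a minimal pure-Python sketch of that scheme. The function names and the choice of μ (here, each axis length) are assumptions for illustration; this repo's implementation may differ in the details.

```python
import math
from itertools import product


def fourier_features(position, bands, max_freq):
    """Encode one coordinate in [-1, 1] as [sin(f*pi*x)..., cos(f*pi*x)..., x].

    Frequencies f are spaced linearly between 1 and max_freq / 2, following the
    Perceiver paper's description (not necessarily this repo's exact code).
    """
    if bands == 1:
        freqs = [1.0]
    else:
        step = (max_freq / 2 - 1) / (bands - 1)
        freqs = [1.0 + k * step for k in range(bands)]
    sines = [math.sin(f * math.pi * position) for f in freqs]
    cosines = [math.cos(f * math.pi * position) for f in freqs]
    return sines + cosines + [position]


def positional_encoding(input_shape, bands):
    """Return one positional feature vector per grid position, in row-major order."""
    # Coordinates along each axis are spaced evenly in [-1, 1].
    axes = [[-1.0 + 2.0 * i / (n - 1) if n > 1 else 0.0 for i in range(n)]
            for n in input_shape]
    encodings = []
    for coords in product(*axes):
        vector = []
        for coord, size in zip(coords, input_shape):
            # Assumption: use the axis length as the sampling rate mu.
            vector.extend(fourier_features(coord, bands, max_freq=size))
        encodings.append(vector)
    return encodings


encoding = positional_encoding((28, 28), bands=4)
print(len(encoding), len(encoding[0]))  # 784 18
```

With `input_shape=(28, 28)` and `fourier_bands=4`, each pixel gets 2 · (2 · 4 + 1) = 18 positional features on top of its single grayscale channel.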
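The latents are what keep the Perceiver cheap: the `latents` vectors act as queries in a cross-attention over all input positions, so the output always has `latents` vectors no matter how large the input is. Here is a minimal pure-Python sketch of single-head scaled dot-product cross-attention. It is illustrative only: the learned query/key/value projections are omitted (treated as identity), so latents and inputs must have the same width here, which is an assumption the real model does not need.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latents, inputs):
    """Single-head cross-attention: latents are queries, inputs are keys and values.

    Returns one output vector per latent, so the output size is independent of
    the number of inputs. Projections are omitted to keep the sketch short.
    """
    d = len(latents[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for query in latents:
        # One attention score per input position.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in inputs]
        weights = softmax(scores)
        # Weighted average of the input vectors.
        outputs.append([sum(w * value[j] for w, value in zip(weights, inputs))
                        for j in range(d)])
    return outputs


latents = [[0.0] * 4]
inputs = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
print(cross_attention(latents, inputs))  # [[2.0, 3.0, 4.0, 5.0]]
```

A zero query attends uniformly, so the output is the mean of the inputs. The cost is one score per (latent, input) pair: for MNIST with 64 latents that is 784 × 64 = 50,176 scores, versus 784² = 614,656 for full self-attention over the pixels.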

The model above outputs the latent vectors after the final layer. If you want class logits instead, use PerceiverLogits:

```python
from perceiver import PerceiverLogits

# Example values for MNIST: 1 grayscale channel, 28x28 pixels, 10 digit classes.
input_channels = 1
input_shape = (28, 28)
output_features = 10

model = PerceiverLogits(
    input_channels,  # <- How many channels in the input? E.g. 3 for RGB.
    input_shape,  # <- Size of the input along each dimension, e.g. (28, 28) for MNIST.
    output_features,  # <- Number of classes, e.g. 10 for MNIST.
    fourier_bands=4,  # <- Number of frequency bands in the positional encoding.
    latents=64,  # <- Number of latent vectors.
    d_model=32,  # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8,  # <- Number of heads in latent self-attention. Cross-attention always has 1 head.
    latent_blocks=6,  # <- Number of latent self-attention blocks per cross-attention with the input.
    dropout=0.1,  # <- Dropout probability.
    layers=8,  # <- Becomes two unique layer blocks: layer 1, and layers 2-8 (which share weights).
)
```
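The Perceiver paper turns the final latents into class logits by averaging them over the latent index and projecting the resulting summary vector to the number of classes with a single linear layer. A minimal pure-Python sketch of that head follows; the helper names are hypothetical, and whether PerceiverLogits does exactly this is an assumption.

```python
def mean_pool(latents):
    """Average a list of latent vectors into one summary vector."""
    n = len(latents)
    return [sum(vec[j] for vec in latents) / n for j in range(len(latents[0]))]


def linear(vector, weights, bias):
    """Affine map: weights has one row per output class, each of len(vector)."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def logits_from_latents(latents, weights, bias):
    """Hypothetical logits head: mean-pool the latents, then project to classes."""
    return linear(mean_pool(latents), weights, bias)


latents = [[1.0, 2.0], [3.0, 4.0]]       # 2 latents with d_model = 2
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 classes
bias = [0.0, 0.0, 0.5]
print(logits_from_latents(latents, weights, bias))  # [2.0, 3.0, 5.5]
```

For the MNIST configuration above, the weight matrix would be 10 × 32 (`output_features` × `d_model`).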
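The `layers` comment describes weight sharing: with `layers=8` there are only two distinct blocks, one used for layer 1 and one reused for layers 2 through 8. The pure-Python sketch below shows that schedule with a hypothetical `LayerBlock` stand-in (not a class from this repo). In PyTorch, the equivalent is calling the same `nn.Module` instance at several depths, so its parameters are stored only once.

```python
class LayerBlock:
    """Hypothetical stand-in for one Perceiver block (cross-attention followed by
    latent self-attention). Only its object identity matters in this sketch."""

    def __init__(self, name):
        self.name = name


def build_layer_schedule(layers):
    """Return the block used at each depth: a unique first block, then a single
    block reused for every remaining layer (weight sharing)."""
    if layers < 1:
        raise ValueError("layers must be at least 1")
    first = LayerBlock("first")
    shared = LayerBlock("shared")
    return [first] + [shared] * (layers - 1)


schedule = build_layer_schedule(8)
print(len(schedule))                           # 8
print(len({id(block) for block in schedule}))  # 2
```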

To use my pre-trained MNIST model (not very good):

```python
from perceiver import load_mnist_model

model = load_mnist_model()
```

TODO:

Fourier-feature positional encoding generalized to n dimensions
Train other models (e.g. on CIFAR-100, or on something outside the image domain)
Type hints
Unit tests for the model's components
Packaging

Owner: Louis Arge
