PyTorch implementation of Pay Attention to MLPs

Last update: Dec 13, 2022

Overview

gMLP

PyTorch implementation of Pay Attention to MLPs.

Quickstart

Clone this repository.

git clone https://github.com/jaketae/g-mlp.git

Navigate to the cloned directory. The bare-bones gMLP model can then be imported and instantiated:

>>> from g_mlp import gMLP
>>> model = gMLP()

By default, the model comes with the following parameters:

gMLP(
    d_model=256,
    d_ffn=512,
    seq_len=256,
    num_layers=6,
)

Usage

The repository also contains gMLP models specifically for language modeling and image classification.

NLP

gMLPForLanguageModeling shares the same default parameters as gMLP, with num_tokens=10000 as an added parameter that represents the size of the token embedding table.

>>> import torch
>>> from g_mlp import gMLPForLanguageModeling
>>> model = gMLPForLanguageModeling()
>>> tokens = torch.randint(0, 10000, (8, 256))
>>> model(tokens).shape
torch.Size([8, 256, 256])
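The role of num_tokens can be illustrated with a plain embedding lookup. This is a minimal sketch, assuming the model begins with a standard torch.nn.Embedding of shape (num_tokens, d_model); that detail is an assumption and is not confirmed by this README. It only shows how a batch of token ids of shape (8, 256) becomes a tensor of shape (8, 256, 256) before any gMLP layers run.

```python
import torch
from torch import nn

# Assumed values, mirroring the defaults quoted above.
num_tokens, d_model = 10000, 256

# Token embedding table: one d_model-sized vector per token id.
embedding = nn.Embedding(num_tokens, d_model)

# A batch of 8 sequences, each 256 token ids long.
tokens = torch.randint(0, num_tokens, (8, 256))

# Lookup adds a trailing d_model axis: (batch, seq_len, d_model).
hidden = embedding(tokens)
print(hidden.shape)  # torch.Size([8, 256, 256])
```

Token ids must fall in the range [0, num_tokens), which is why the example above draws them with an upper bound of 10000.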

Computer Vision

gMLPForImageClassification is a ViT-style version of gMLP that adds a patch embedding layer in front of the model and a classification head at the end.

>>> import torch
>>> from g_mlp import gMLPForImageClassification
>>> model = gMLPForImageClassification()
>>> images = torch.randn(8, 3, 256, 256)
>>> model(images).shape
torch.Size([8, 1000])
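Because the model operates on a fixed sequence length, the number of image patches has to match seq_len. The arithmetic can be sketched as follows. The patch size of 16 is an assumption for illustration and is not stated in this README, and num_patches is a hypothetical helper, not part of the g_mlp package.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


# Assumed patch size of 16: a 256x256 image gives 16 patches per side.
print(num_patches(256, 16))  # 256
```

Under that assumption, a 256x256 input yields 256 patches, which lines up with the default seq_len=256.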

Summary

The authors of the paper present gMLP, an attention-free, all-MLP architecture based on spatial gating units. gMLP achieves parity with Transformer models such as ViT and BERT on language and vision downstream tasks. The authors also show that gMLP scales with increased data and number of parameters, suggesting that self-attention is not a necessary component for designing performant models.
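The spatial gating unit mentioned above can be sketched in a few lines of PyTorch. This is a minimal illustration based on the description in the paper, not necessarily the code in this repository: the class names SpatialGatingUnit and GMLPBlockSketch are hypothetical, and the initialization constant is an assumption chosen to keep the spatial weights near zero.

```python
import torch
from torch import nn


class SpatialGatingUnit(nn.Module):
    """Splits channels in half and gates one half with a spatial projection of the other."""

    def __init__(self, d_ffn: int, seq_len: int) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Linear layer that mixes information across sequence positions.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Near-zero weights and unit bias make the gate start close to identity.
        nn.init.normal_(self.spatial_proj.weight, std=1e-6)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u, v = x.chunk(2, dim=-1)  # each: (batch, seq_len, d_ffn / 2)
        v = self.norm(v)
        # Move the sequence axis last so the linear layer acts across positions.
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v


class GMLPBlockSketch(nn.Module):
    """Norm, channel expansion, GELU, spatial gating, channel projection, residual."""

    def __init__(self, d_model: int = 256, d_ffn: int = 512, seq_len: int = 256) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)
        self.act = nn.GELU()
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.act(self.proj_in(self.norm(x)))
        x = self.proj_out(self.sgu(x))
        return x + residual
```

The only cross-token interaction happens in spatial_proj, which is a fixed-size matrix over positions. That is why the model requires a fixed seq_len, unlike self-attention.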

Owner

Jake Tae
