An All-MLP solution for Vision, from Google AI

Last update: Jan 06, 2023

Overview

MLP Mixer - Pytorch

An All-MLP solution for Vision, from Google AI, in Pytorch.

No convolutions or attention needed!

Yannic Kilcher video

Install

$ pip install mlp-mixer-pytorch

Usage

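import torch
from mlp_mixer_pytorch import MLPMixer

model = MLPMixer(
    image_size = 256,    # input images are 256 x 256
    patch_size = 16,     # each image is split into 16 x 16 patches
    dim = 512,           # embedding size of each patch
    depth = 12,          # number of Mixer blocks
    num_classes = 1000   # size of the classification output
)

img = torch.randn(1, 3, 256, 256)  # (batch, channels, height, width)
pred = model(img) # (1, 1000)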
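
As a rough sketch of the shapes involved, and assuming the image is split into non-overlapping square patches as in the MLP-Mixer paper, the configuration above gives the patch grid computed below. The helper `patch_grid` is hypothetical and not part of `mlp_mixer_pytorch`; the divisibility check is an assumption of this sketch.

```python
def patch_grid(image_size, patch_size, channels=3):
    """Return (number of patches, values per flattened patch)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size  # patches along one edge
    return per_side * per_side, channels * patch_size * patch_size


# Values taken from the Usage example: 256 x 256 RGB image, 16 x 16 patches.
num_patches, patch_dim = patch_grid(image_size=256, patch_size=16)
print(num_patches, patch_dim)  # 256 768
```

Each of the 256 patches is then projected from 768 raw values to `dim = 512` features before the Mixer blocks run.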

Citations

@misc{tolstikhin2021mlpmixer,
    title   = {MLP-Mixer: An all-MLP Architecture for Vision},
    author  = {Ilya Tolstikhin and Neil Houlsby and Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Thomas Unterthiner and Jessica Yung and Daniel Keysers and Jakob Uszkoreit and Mario Lucic and Alexey Dosovitskiy},
    year    = {2021},
    eprint  = {2105.01601},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Comments

expansion_factor on tokens is actually a bottleneck in original codebase

Thanks for your implementation. In comparing your codebase to the authors' implementation, I discovered that while you have a single expansion factor in your configuration, the authors have separate values: one for tokens and one for channels.

Specifically, their channels expansion factor is 4, but their tokens expansion factor is 0.5. (The hidden_dim is the base projection size). Note that they actually use a feature count, but I'm translating to the mechanism you use in this codebase.

Thus, when executing the MixerBlock, the tokens "expansion" is actually a bottleneck.

The parameters can be verified as well in Table 1 ("Specifications of Mixer Architectures") at the top of page 4 in version 4 (the current version as of Feb 14, 2022) of their paper.

I'm not suggesting that anything necessarily needs to change in your implementation. However, if you wanted to align your codebase so that it can fully replicate the authors' work, you may consider allowing for two separate parameters: token_expansion_factor and channels_expansion_factor.

Thank you again for this work, and for all your contributions generally. You are an incredible asset to the community.

opened by chazzmoney 1
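
The two factors discussed in the issue above can be made concrete with a little bookkeeping. This is a minimal sketch, assuming (as the comment states) that both factors are multiples of the base projection size `dim`. The names `mixer_hidden_sizes`, `MixerHiddenSizes` and `mlp_param_count` are hypothetical and not part of `mlp_mixer_pytorch`; `dim = 512` and 256 patches come from the Usage example, and the factors 0.5 and 4 come from the comment.

```python
from dataclasses import dataclass


@dataclass
class MixerHiddenSizes:
    token_hidden: int    # inner width of the token-mixing MLP
    channel_hidden: int  # inner width of the channel-mixing MLP


def mixer_hidden_sizes(dim, token_expansion_factor=0.5, channels_expansion_factor=4):
    """Hypothetical helper: derive both inner widths from the base size `dim`."""
    return MixerHiddenSizes(
        token_hidden=int(dim * token_expansion_factor),
        channel_hidden=int(dim * channels_expansion_factor),
    )


def mlp_param_count(in_features, hidden):
    """Weights and biases of Linear(in, hidden) followed by Linear(hidden, in)."""
    return (in_features * hidden + hidden) + (hidden * in_features + in_features)


dim, num_patches = 512, 256  # values from the Usage example
sizes = mixer_hidden_sizes(dim)

# Token mixing acts across patches; channel mixing acts across features.
token_params = mlp_param_count(num_patches, sizes.token_hidden)
channel_params = mlp_param_count(dim, sizes.channel_hidden)

print(sizes)
print(token_params, channel_params)
```

With a factor of 0.5 the token-mixing inner width is half of `dim`, which is the bottleneck the issue describes, while the channel-mixing MLP expands to four times `dim`.
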
Dall-E implementation

Amazing work! How difficult would it be to use an MLP in Dall-E? As the whole idea behind Dall-E revolves around attention layers and transformers, I wonder if this simpler model would enable smaller, equally capable models...

opened by robvanvolt 1

Releases (0.1.1)

0.1.1 (Feb 17, 2022)

0.1.0 (Feb 17, 2022)

0.0.10 (Jun 24, 2021)

0.0.9 (Jun 24, 2021)

0.0.8 (Jun 24, 2021)

0.0.7 (May 30, 2021)

0.0.6 (May 7, 2021)

0.0.5 (May 7, 2021)

0.0.4 (May 7, 2021)

0.0.3 (May 5, 2021)

0.0.2b (May 5, 2021)

0.0.1 (May 5, 2021)

Owner

Phil Wang
