Implements MLP-Mixer: An all-MLP Architecture for Vision.

Overview

MLP-Mixer-CIFAR10

This repository implements MLP-Mixer as proposed in MLP-Mixer: An all-MLP Architecture for Vision. The paper introduces an all MLP (Multi-layer Perceptron) architecture for computer vision tasks. Yannic Kilcher walks through the architecture in this video.

Experiments reported in this repository are on CIFAR-10.

What's included?

  • Distributed training with mixed-precision.
  • Visualization of the token-mixing MLP weights.
  • A TensorBoard callback to keep track of the learned linear projections of the image patches.
Screen.Recording.2021-05-25.at.5.49.20.PM.mov

Notebooks

Note: These notebooks are runnable on Colab. If you don't have access to a tensor-core GPU, please disable the mixed-precision block while running the code.

Results

MLP-Mixer achieves competitive results. The figure below summarizes top-1 accuracies on CIFAR-10 test set with respect to varying MLP blocks.


Notable hyperparameters are:

  • Image size: 72x72
  • Patch size: 9x9
  • Hidden dimension for patches: 64
  • Hidden dimension for patches: 128

The table below reports the parameter counts for the different MLP-Mixer variants:


ResNet20 (0.571969 Million) achieves 78.14% under the exact same training configuration. Refer to this notebook for more details.

Models

You can reproduce the results reported above. The model files are available here.

Acknowledgements

ML-GDE Program for providing GCP credits.

You might also like...
An All-MLP solution for Vision, from Google AI
An All-MLP solution for Vision, from Google AI

MLP Mixer - Pytorch An All-MLP solution for Vision, from Google AI, in Pytorch. No convolutions nor attention needed! Yannic Kilcher video Install $ p

Implementation of
Implementation of "A MLP-like Architecture for Dense Prediction"

A MLP-like Architecture for Dense Prediction (arXiv) Updates (22/07/2021) Initial release. Model Zoo We provide CycleMLP models pretrained on ImageNet

Model search is a framework that implements AutoML algorithms for model architecture search at scale
Model search is a framework that implements AutoML algorithms for model architecture search at scale

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model architecture for their classification problems (i.e., DNNs with different types of layers).

A task-agnostic vision-language architecture as a step towards General Purpose Vision
A task-agnostic vision-language architecture as a step towards General Purpose Vision

Towards General Purpose Vision Systems By Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, and Derek Hoiem Overview Welcome to the official code base f

MLP-Like Vision Permutator for Visual Recognition (PyTorch)
MLP-Like Vision Permutator for Visual Recognition (PyTorch)

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition (arxiv) This is a Pytorch implementation of our paper. We present Vision

code for paper
code for paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search? Code for paper: Does Unsupervised Architecture Representation

Implementation of ResMLP, an all MLP solution to image classification, in Pytorch
Implementation of ResMLP, an all MLP solution to image classification, in Pytorch

ResMLP - Pytorch Implementation of ResMLP, an all MLP solution to image classification out of Facebook AI, in Pytorch Install $ pip install res-mlp-py

Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch

Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.
Comments
  • Could patches number != MLP token mixing dimension?

    Could patches number != MLP token mixing dimension?

    I try to change the model into B/16 MLP-Mixer. is this setting, the patch number ( sequence length) != MLP token mixing dimension. But the code will report an error when it implements "x = layers.Add()([x, token_mixing])" because the two operation numbers have different shapes. Take an example, B/16 Settings: image 3232, 2D hidden layer 768, PP= 16*16, token mixing mlp dimentsion= 384, channel mlp dimension = 3072. Thus patch number ( sequence length) = 4, table value shape= (4, 768) When the code runs x = layers.Add()([x, token_mixing]) in the token mixing layer. rx shape=[4, 768], token_mixing shape = [384, 768]

    It is strange why the MLP-Mixer paper could set different parameters "patch number ( sequence length) != MLP token mixing dimensio"

    opened by LouiValley 2
  • Why the accuracy drops after epoch 100/100 (accuracy drops from 91% to 71%)

    Why the accuracy drops after epoch 100/100 (accuracy drops from 91% to 71%)

    I trained the Network ( NUM_MIXER_LAYERS =4 )

    At epoch 100:

    Epoch 100/100

    1/44 [..............................] - ETA: 1s - loss: 0.2472 - accuracy: 0.9160 3/44 [=>............................] - ETA: 1s - loss: 0.2424 - accuracy: 0.9162 5/44 [==>...........................] - ETA: 1s - loss: 0.2431 - accuracy: 0.9155 7/44 [===>..........................] - ETA: 1s - loss: 0.2424 - accuracy: 0.9154 9/44 [=====>........................] - ETA: 1s - loss: 0.2419 - accuracy: 0.9155 11/44 [======>.......................] - ETA: 1s - loss: 0.2423 - accuracy: 0.9150 13/44 [=======>......................] - ETA: 1s - loss: 0.2426 - accuracy: 0.9145 15/44 [=========>....................] - ETA: 1s - loss: 0.2430 - accuracy: 0.9142 17/44 [==========>...................] - ETA: 1s - loss: 0.2433 - accuracy: 0.9140 19/44 [===========>..................] - ETA: 1s - loss: 0.2435 - accuracy: 0.9138 21/44 [=============>................] - ETA: 0s - loss: 0.2438 - accuracy: 0.9136 23/44 [==============>...............] - ETA: 0s - loss: 0.2439 - accuracy: 0.9135 25/44 [================>.............] - ETA: 0s - loss: 0.2440 - accuracy: 0.9134 27/44 [=================>............] - ETA: 0s - loss: 0.2440 - accuracy: 0.9133 29/44 [==================>...........] - ETA: 0s - loss: 0.2442 - accuracy: 0.9132 31/44 [====================>.........] - ETA: 0s - loss: 0.2445 - accuracy: 0.9130 33/44 [=====================>........] - ETA: 0s - loss: 0.2447 - accuracy: 0.9129 35/44 [======================>.......] - ETA: 0s - loss: 0.2450 - accuracy: 0.9127 37/44 [========================>.....] - ETA: 0s - loss: 0.2454 - accuracy: 0.9125 39/44 [=========================>....] - ETA: 0s - loss: 0.2459 - accuracy: 0.9123 41/44 [==========================>...] - ETA: 0s - loss: 0.2463 - accuracy: 0.9121 43/44 [============================>.] - ETA: 0s - loss: 0.2469 - accuracy: 0.9119 44/44 [==============================] - 2s 46ms/step - loss: 0.2474 - accuracy: 0.9117 - val_loss: 1.1145 - val_accuracy: 0.7226

    Then it still have an extra training, 1/313 [..............................] - ETA: 24:32 - loss: 0.5860 - accuracy: 0.8125 8/313 [..............................] - ETA: 2s - loss: 1.2071 - accuracy: 0.6953  ..... 313/313 [==============================] - ETA: 0s - loss: 1.0934 - accuracy: 0.7161 313/313 [==============================] - 12s 22ms/step - loss: 1.0934 - accuracy: 0.7161 Test accuracy: 71.61

    opened by LouiValley 1
  • Consider either turning off auto-sharding or switching the auto_shard_policy to DATA

    Consider either turning off auto-sharding or switching the auto_shard_policy to DATA

    Excuse me, when I try to run it on the serve, it tips:

    Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new tf.data.Options() object then setting options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA before applying the options object to the dataset via dataset.with_options(options). 2021-11-21 11:59:20.861052: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.

    BTW, my TensorFlow version is 2.4.0, how to fix this problem?

    opened by LouiValley 1
Releases(Models)
Owner
Sayak Paul
Trying to learn how machines learn.
Sayak Paul
Statistical and Algorithmic Investing Strategies for Everyone

Eiten - Algorithmic Investing Strategies for Everyone Eiten is an open source toolkit by Tradytics that implements various statistical and algorithmic

Tradytics 2.5k Jan 02, 2023
Official Repsoitory for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]

Mish: Self Regularized Non-Monotonic Activation Function BMVC 2020 (Official Paper) Notes: (Click to expand) A considerably faster version based on CU

Xa9aX ツ 1.2k Dec 29, 2022
[ICCV2021] IICNet: A Generic Framework for Reversible Image Conversion

IICNet - Invertible Image Conversion Net Official PyTorch Implementation for IICNet: A Generic Framework for Reversible Image Conversion (ICCV2021). D

felixcheng97 55 Dec 06, 2022
Unofficial implement with paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Introduction This repository is about paper SpeakerGAN , and is unofficially implemented by Mingming Huang ( 7 Jan 03, 2023

Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection".

A2S-USOD Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection". Code will be released upon

15 Dec 16, 2022
Predicting future trajectories of people in cameras of novel scenarios and views.

Pedestrian Trajectory Prediction Predicting future trajectories of pedestrians in cameras of novel scenarios and views. This repository contains the c

8 Sep 03, 2022
CoReD: Generalizing Fake Media Detection with Continual Representation using Distillation (ACMMM'21 Oral Paper)

CoReD: Generalizing Fake Media Detection with Continual Representation using Distillation (ACMMM'21 Oral Paper) (Accepted for oral presentation at ACM

Minha Kim 1 Nov 12, 2021
Code for the paper "Reinforcement Learning as One Big Sequence Modeling Problem"

Trajectory Transformer Code release for Reinforcement Learning as One Big Sequence Modeling Problem. Installation All python dependencies are in envir

Michael Janner 269 Jan 05, 2023
Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design"

JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design This repository contains code for the paper: JA

Aspuru-Guzik group repo 55 Nov 29, 2022
Multi agent DDPG algorithm written in Python + Pytorch

Multi agent DDPG algorithm written in Python + Pytorch. It also includes a Jupyter notebook, Tennis.ipynb, as a showcase.

Rogier Wachters 2 Feb 26, 2022
links and status of cool gradio demos

awesome-demos This is a list of some wonderful demos & applications built with Gradio. Here's how to contribute yours! 🖊️ Natural language processing

Gradio 96 Dec 30, 2022
A PyTorch Implementation of Neural IMage Assessment

NIMA: Neural IMage Assessment This is a PyTorch implementation of the paper NIMA: Neural IMage Assessment (accepted at IEEE Transactions on Image Proc

yunxiaos 418 Dec 29, 2022
Implement of homography net by pytorch

HomographyNet Implement of homography net by pytorch Brief Introduction This project is based on the work Homography-Net: @article{detone2016deep, t

ronghao_CN 4 May 19, 2022
for taichi voxel-challange event

Taichi Voxel Challenge Figure: result of python3 example6.py. Please replace the image above (demo.jpg) with yours, so that other people can immediate

Liming Xu 20 Nov 26, 2022
Lazy, a tool for running things in idle time

Lazy, a tool for running things in idle time Mostly used to stop CUDA ML model training from making my desktop unusable. Simply monitors keyboard/mous

N Shepperd 46 Nov 06, 2022
Code repository for "Reducing Underflow in Mixed Precision Training by Gradient Scaling" presented at IJCAI '20

Reducing Underflow in Mixed Precision Training by Gradient Scaling This project implements the gradient scaling method to improve the performance of m

Ruizhe Zhao 5 Apr 14, 2022
This repository is a series of notebooks that show solutions for the projects at Dataquest.io.

Dataquest Project Solutions This repository is a series of notebooks that show solutions for the projects at Dataquest.io. Of course, there are always

Dataquest 1.1k Dec 30, 2022
scikit-learn inspired API for CRFsuite

sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface simlar to scikit-learn. sklearn_crfsuite.CRF i

417 Dec 20, 2022
face property detection pytorch

This is the face property train code of project face-detection-project

i am x 2 Oct 18, 2021
Code for "LoRA: Low-Rank Adaptation of Large Language Models"

LoRA: Low-Rank Adaptation of Large Language Models This repo contains the implementation of LoRA in GPT-2 and steps to replicate the results in our re

Microsoft 394 Jan 08, 2023