MLP-Mixer-Pytorch

PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision, with support for loading the official ImageNet pre-trained parameters.

Usage

import torch
import numpy as np
from mlp_mixer import MlpMixer

pretrain_model='./pretrain_models/imagenet21k_Mixer-B_16.npz'

model = MlpMixer(num_classes=10, 
                 num_blocks=12, 
                 patch_size=16, 
                 hidden_dim=768, 
                 tokens_mlp_dim=384, 
                 channels_mlp_dim=3072, 
                 image_size=224
                 )

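# Load the official ImageNet pre-trained weights.
model.load_from(np.load(pretrain_model))
print('Finished loading the pre-trained model!')

# Count the parameters, in millions.
num_param = sum(p.numel() for p in model.parameters()) / 1e6
print('Total params.: %f M' % num_param)

# Example input: a random batch of shape (batch, channels, height, width).
img = torch.randn(1, 3, 224, 224)
pred = model(img)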
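
Before calling load_from, it can help to check what a downloaded checkpoint actually contains. The following is a minimal sketch, not part of this repo: the helper name summarize_checkpoint is hypothetical, and it relies only on the fact that an .npz file is an archive of named NumPy arrays.

```python
import numpy as np


def summarize_checkpoint(path):
    """Return {array name: shape} for every array stored in an .npz file.

    Hypothetical helper for illustration; it is not provided by mlp_mixer.
    """
    # np.load on an .npz file returns a lazy archive; close it when done.
    with np.load(path) as ckpt:
        return {name: ckpt[name].shape for name in ckpt.files}
```

For example, iterating over summarize_checkpoint(pretrain_model).items() and printing each name and shape lists the stored arrays, which makes a mismatch with the chosen model configuration easier to spot.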

Fine-tuning

Download the official pre-trained models at https://console.cloud.google.com/storage/mixer_models/.

Hyper-parameter settings for better fine-tuning:

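# param_list: the parameters to optimize (e.g., model.parameters()).
optim = torch.optim.SGD(param_list,
                        lr=5e-4,
                        weight_decay=1e-7,
                        momentum=0.9,
                        nesterov=True
                        )
# n_iters_all: the total number of training iterations.
# WarmupCosineLrScheduler is not part of torch; define or import it separately.
lr_schdlr = WarmupCosineLrScheduler(optim,
                                    n_iters_all,
                                    warmup_iter=0
                                    )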
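
WarmupCosineLrScheduler is not defined in the snippet above. As a rough sketch of what such a schedule usually computes, the code below implements linear warmup followed by cosine decay as a multiplier on the base learning rate. The function names are hypothetical, and the exact warmup shape and end value are assumptions; the scheduler actually used with this repo may differ.

```python
import math


def warmup_cosine_factor(step, n_iters_all, warmup_iter=0):
    """Multiplier for the base learning rate at iteration `step`.

    Assumed behaviour: linear warmup over `warmup_iter` iterations,
    then cosine decay from 1 down to 0 at `n_iters_all`.
    """
    if warmup_iter > 0 and step < warmup_iter:
        return (step + 1) / warmup_iter
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    progress = (step - warmup_iter) / max(1, n_iters_all - warmup_iter)
    progress = min(1.0, max(0.0, progress))
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def make_warmup_cosine_scheduler(optim, n_iters_all, warmup_iter=0):
    """Wrap the factor above in PyTorch's LambdaLR (hypothetical helper)."""
    # Imported here so the schedule function itself has no torch dependency.
    from torch.optim.lr_scheduler import LambdaLR

    return LambdaLR(
        optim,
        lr_lambda=lambda step: warmup_cosine_factor(step, n_iters_all, warmup_iter),
    )
```

With warmup_iter=0, as in the settings above, the learning rate starts at its full value of 5e-4 and decays along the cosine curve; the scheduler's step() would be called once per training iteration, after optim.step().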

Fine-tuning MLP-Mixer from the pre-trained weights can improve results substantially (e.g., +10% accuracy on a small dataset).

Note that patch_size can also be changed (e.g., patch_size=8) for inputs with different resolutions, but a smaller patch_size does not always improve performance.
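
To see what a different patch_size implies, it helps to count the resulting tokens. The sketch below assumes square images split into non-overlapping square patches, as in the MLP-Mixer paper; the helper name num_patches is hypothetical and not part of this repo.

```python
def num_patches(image_size, patch_size):
    """Number of tokens for a square image cut into non-overlapping patches.

    Hypothetical helper; assumes image_size is divisible by patch_size.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


print(num_patches(224, 16))  # 14 * 14 = 196 tokens
print(num_patches(224, 8))   # 28 * 28 = 784 tokens
```

In the MLP-Mixer architecture the token-mixing MLP operates across the patch dimension, so this count sets its input size. Halving patch_size at a fixed resolution quadruples the number of tokens, which is one reason a smaller patch_size is more expensive and may not match the shapes of the pre-trained token-mixing weights.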

Citation

@misc{tolstikhin2021mlpmixer,
      title={MLP-Mixer: An all-MLP Architecture for Vision}, 
      author={Ilya Tolstikhin and Neil Houlsby and Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Thomas Unterthiner and Jessica Yung and Daniel Keysers and Jakob Uszkoreit and Mario Lucic and Alexey Dosovitskiy},
      year={2021},
      eprint={2105.01601},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

The implementation is based on the original paper and the official JAX/Flax repo: https://github.com/google-research/vision_transformer.
It also draws on the PyTorch re-implementation: https://github.com/d-li14/mlp-mixer.pytorch.

Owner

Qiushi Yang
