MLP-Mixer-Pytorch

A PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" that supports loading the official ImageNet pre-trained parameters.

Usage

import torch
import numpy as np
from mlp_mixer import MlpMixer

pretrain_model = './pretrain_models/imagenet21k_Mixer-B_16.npz'

model = MlpMixer(num_classes=10, 
                 num_blocks=12, 
                 patch_size=16, 
                 hidden_dim=768, 
                 tokens_mlp_dim=384, 
                 channels_mlp_dim=3072, 
                 image_size=224
                 )

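# Load the official ImageNet pre-trained weights.
model.load_from(np.load(pretrain_model))
print('Finished loading the pre-trained model!')

num_param = sum(p.numel() for p in model.parameters()) / 1e6
print('Total params.: %f M' % num_param)

# Dummy input batch for illustration (assumed shape: N x 3 x 224 x 224); replace with real images.
img = torch.randn(1, 3, 224, 224)
pred = model(img)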
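
The parameter count printed above can be sanity-checked by hand. The sketch below estimates it from the constructor arguments, assuming the layer layout described in the MLP-Mixer paper: a per-patch linear embedding, two LayerNorms and two two-layer MLPs per block, a final LayerNorm, and a linear classification head. The function name `estimate_mixer_params` and the `in_channels` argument are hypothetical, and the actual `MlpMixer` class may differ in details, so treat the result as an approximation.

```python
def estimate_mixer_params(num_classes, num_blocks, patch_size, hidden_dim,
                          tokens_mlp_dim, channels_mlp_dim, image_size,
                          in_channels=3):
    """Estimate the parameter count of an MLP-Mixer (paper layout, assumed)."""
    # Number of non-overlapping patches (tokens) per image.
    num_tokens = (image_size // patch_size) ** 2

    def linear(n_in, n_out):
        # Weight matrix plus bias vector of a fully connected layer.
        return n_in * n_out + n_out

    layer_norm = 2 * hidden_dim  # scale and shift vectors

    # Per-patch linear embedding (equivalent to a strided convolution).
    stem = linear(in_channels * patch_size * patch_size, hidden_dim)
    # Token-mixing MLP acts across patches, channel-mixing MLP across features.
    token_mlp = linear(num_tokens, tokens_mlp_dim) + linear(tokens_mlp_dim, num_tokens)
    channel_mlp = linear(hidden_dim, channels_mlp_dim) + linear(channels_mlp_dim, hidden_dim)
    block = 2 * layer_norm + token_mlp + channel_mlp
    # Final LayerNorm followed by the classification head.
    head = layer_norm + linear(hidden_dim, num_classes)
    return stem + num_blocks * block + head


n = estimate_mixer_params(num_classes=10, num_blocks=12, patch_size=16,
                          hidden_dim=768, tokens_mlp_dim=384,
                          channels_mlp_dim=3072, image_size=224)
print('Estimated params.: %.2f M' % (n / 1e6))
```

If the estimate and the value reported by `model.parameters()` disagree noticeably, the implementation probably deviates from the layout assumed here.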

Fine-tuning

Download the official pre-trained models at https://console.cloud.google.com/storage/mixer_models/.

Hyperparameter settings for better fine-tuning:

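# param_list: the parameters (or parameter groups) to optimize, e.g. model.parameters().
optim = torch.optim.SGD(param_list,
                        lr=5e-4,
                        weight_decay=1e-7,
                        momentum=0.9,
                        nesterov=True
                        )
# n_iters_all: total number of training iterations (as the name suggests).
# WarmupCosineLrScheduler is not part of torch.optim; it must be defined or imported separately.
lr_schdlr = WarmupCosineLrScheduler(optim,
                                    n_iters_all,
                                    warmup_iter=0
                                    )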
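
For readers unfamiliar with this kind of schedule, here is a minimal sketch of a warmup-plus-cosine learning-rate multiplier. It is an assumption about the general technique, not the code behind `WarmupCosineLrScheduler`, whose exact behavior may differ; the function name `warmup_cosine_factor` is hypothetical.

```python
import math


def warmup_cosine_factor(it, n_iters_all, warmup_iter=0):
    """Multiplier applied to the base learning rate at iteration `it`."""
    if warmup_iter > 0 and it < warmup_iter:
        # Linear ramp up to 1 over the warmup iterations.
        return (it + 1) / warmup_iter
    # Cosine decay from 1 down to 0 over the remaining iterations.
    progress = (it - warmup_iter) / max(1, n_iters_all - warmup_iter)
    progress = min(max(progress, 0.0), 1.0)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


# With warmup_iter=0 (as in the settings above) the schedule is a plain cosine decay.
for it in (0, 250, 500, 750, 1000):
    print(it, round(5e-4 * warmup_cosine_factor(it, n_iters_all=1000), 6))
```

A function like this could be wrapped with `torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda=...)` and stepped once per iteration, which is one common way to get a comparable schedule without a custom scheduler class.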

Fine-tuning MLP-Mixer from the pre-trained weights can yield substantial improvements (e.g., +10% accuracy on a small dataset).

Note that patch_size can also be changed (e.g., patch_size=8) to suit inputs with different resolutions, although a smaller patch_size does not always improve performance.
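
To make the effect of patch_size concrete, the sketch below computes how many patches (tokens) an image is split into, assuming square images and non-overlapping patches that tile the image exactly. The helper name `num_tokens` is hypothetical. The token count matters because the token-mixing MLP in MLP-Mixer operates across patches, so its input size equals this number.

```python
def num_tokens(image_size, patch_size):
    """Number of non-overlapping patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError('image_size must be divisible by patch_size')
    per_side = image_size // patch_size
    return per_side * per_side


for image_size, patch_size in [(224, 16), (224, 8), (32, 8)]:
    print(image_size, patch_size, num_tokens(image_size, patch_size))
```

Halving patch_size at a fixed resolution quadruples the number of tokens, which increases compute and changes the shape of the token-mixing layers.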

Citation

@misc{tolstikhin2021mlpmixer,
      title={MLP-Mixer: An all-MLP Architecture for Vision}, 
      author={Ilya Tolstikhin and Neil Houlsby and Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Thomas Unterthiner and Jessica Yung and Daniel Keysers and Jakob Uszkoreit and Mario Lucic and Alexey Dosovitskiy},
      year={2021},
      eprint={2105.01601},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

The implementation is based on the original paper and the official JAX/Flax repo: https://github.com/google-research/vision_transformer.
It also draws on the re-implementation repo: https://github.com/d-li14/mlp-mixer.pytorch.

Owner

Qiushi Yang