Pytorch implementation of MLP-Mixer with loading pre-trained models.

Overview

MLP-Mixer-Pytorch

PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision with the function of loading official ImageNet pre-trained parameters.

Usage

import torch
import numpy as np
from mlp_mixer import MlpMixer

pretrain_model='./pretrain_models/imagenet21k_Mixer-B_16.npz'

model = MlpMixer(num_classes=10, 
                 num_blocks=12, 
                 patch_size=16, 
                 hidden_dim=768, 
                 tokens_mlp_dim=384, 
                 channels_mlp_dim=3072, 
                 image_size=224
                 )

# load official ImageNet pre-trained model:
model.load_from(np.load(pretrain_model))
print ('Finish loading the pre-trained model!')

num_param = sum(p.numel() for p in model.parameters()) / 1e6
print ('Total params.: %f M'%num_param)

pred = model(img)

Fine-tuning

Download the official pre-trained models at https://console.cloud.google.com/storage/mixer_models/.

Hypyer-parameters setting for better fine-tuning:

optim = torch.optim.SGD(param_list, 
                        lr=5e-4, 
                        weight_decay=1e-7,
                        momentum=0.9, 
                        nesterov=True
                        )
lr_schdlr = WarmupCosineLrScheduler(optim, 
                                    n_iters_all, 
                                    warmup_iter=0
                                    )

Using the pre-trained model to fine-tune MLP-Mixer can obtain remarkable improvements (e.g., +10% accuracy on a small dataset).

Note that we can also change the patch_size (e.g., patch_size=8) for inputs with different resolutions, but smaller patch_size may not always bring performance improvements.

Citation

@misc{tolstikhin2021mlpmixer,
      title={MLP-Mixer: An all-MLP Architecture for Vision}, 
      author={Ilya Tolstikhin and Neil Houlsby and Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Thomas Unterthiner and Jessica Yung and Daniel Keysers and Jakob Uszkoreit and Mario Lucic and Alexey Dosovitskiy},
      year={2021},
      eprint={2105.01601},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

  1. The implementation is based on the original paper and the official Tensorflow repo: https://github.com/google-research/vision_transformer.
  2. It also refers to the re-implementation repo: https://github.com/d-li14/mlp-mixer.pytorch.
Owner
Qiushi Yang
Student in CityU, Hong Kong. Interested in computer vision and machine learning.
Qiushi Yang
A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography

A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography

ICT.MIRACLE lab 75 Dec 26, 2022
Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion

Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion Preface This directory provides an implementation of the algori

Jean-Samuel Leboeuf 0 Nov 03, 2021
Multiple-Object Tracking with Transformer

TransTrack: Multiple-Object Tracking with Transformer Introduction TransTrack: Multiple-Object Tracking with Transformer Models Training data Training

Peize Sun 537 Jan 04, 2023
DRIFT is a tool for Diachronic Analysis of Scientific Literature.

About DRIFT is a tool for Diachronic Analysis of Scientific Literature. The application offers user-friendly and customizable utilities for two modes:

Rajaswa Patil 108 Dec 12, 2022
Supervised multi-SNE (S-multi-SNE): Multi-view visualisation and classification

S-multi-SNE Supervised multi-SNE (S-multi-SNE): Multi-view visualisation and classification A repository containing the code to reproduce the findings

Theodoulos Rodosthenous 3 Apr 15, 2022
A PyTorch implementation of "Graph Classification Using Structural Attention" (KDD 2018).

GAM ⠀⠀ A PyTorch implementation of Graph Classification Using Structural Attention (KDD 2018). Abstract Graph classification is a problem with practic

Benedek Rozemberczki 259 Dec 05, 2022
(CVPR 2022) Energy-based Latent Aligner for Incremental Learning

Energy-based Latent Aligner for Incremental Learning Accepted to CVPR 2022 We illustrate an Incremental Learning model trained on a continuum of tasks

Joseph K J 37 Jan 03, 2023
基于Paddlepaddle复现yolov5,支持PaddleDetection接口

PaddleDetection yolov5 https://github.com/Sharpiless/PaddleDetection-Yolov5 简介 PaddleDetection飞桨目标检测开发套件,旨在帮助开发者更快更好地完成检测模型的组建、训练、优化及部署等全开发流程。 PaddleD

36 Jan 07, 2023
A PyTorch implementation of the Transformer model in "Attention is All You Need".

Attention is all you need: A Pytorch Implementation This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish V

Yu-Hsiang Huang 7.1k Jan 04, 2023
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

peng gao 42 Nov 26, 2022
Deep Learning agent of Starcraft2, similar to AlphaStar of DeepMind except size of network.

Introduction This repository is for Deep Learning agent of Starcraft2. It is very similar to AlphaStar of DeepMind except size of network. I only test

Dohyeong Kim 136 Jan 04, 2023
Transfer style api - An API to use with Tranfer Style App, where you can use two image and transfer the style

Transfer Style API It's an API to use with Tranfer Style App, where you can use

Brian Alejandro 1 Feb 13, 2022
NP DRAW paper released code

NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation This repo contains the official implementation for the NP-DRAW paper.

ZENG Xiaohui 22 Mar 13, 2022
A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning

LABES This is the code for EMNLP 2020 paper "A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised L

17 Sep 28, 2022
QuadTree Attention for Vision Transformers (ICLR2022)

This repository contains codes for quadtree attention. This repo contains codes for feature matching, image classficiation, object detection and seman

tangshitao 222 Dec 28, 2022
Neural Network Libraries

Neural Network Libraries Neural Network Libraries is a deep learning framework that is intended to be used for research, development and production. W

Sony 2.6k Dec 30, 2022
In the AI for TSP competition we try to solve optimization problems using machine learning.

AI for TSP Competition Goal In the AI for TSP competition we try to solve optimization problems using machine learning. The competition will be hosted

Paulo da Costa 11 Nov 27, 2022
GANTheftAuto is a fork of the Nvidia's GameGAN

Description GANTheftAuto is a fork of the Nvidia's GameGAN, which is research focused on emulating dynamic game environments. The early research done

Harrison 801 Dec 27, 2022
Few-shot NLP benchmark for unified, rigorous eval

FLEX FLEX is a benchmark and framework for unified, rigorous few-shot NLP evaluation. FLEX enables: First-class NLP support Support for meta-training

AI2 85 Dec 03, 2022