PyTorch implementation of Masked Auto-Encoder

Last update: Dec 13, 2022


Overview

Masked Auto-Encoder (MAE)

PyTorch implementation of Masked Auto-Encoder:

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick. Masked Autoencoders Are Scalable Vision Learners. arXiv 2021.

Usage

Clone the repository locally.

> git clone https://github.com/liujiyuan13/MAE-code.git MAE-code

Install required packages.

> cd MAE-code
> pip install -r requirements.txt

Prepare datasets.

For CIFAR-10, CIFAR-100 and STL, skip this step; these datasets are prepared automatically.
For ImageNet1K, download the train and val sets and unzip them into ./data/ImageNet1K/train and ./data/ImageNet1K/val, respectively.
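Before launching a long run, it can help to confirm the ImageNet1K folders are in place. The sketch below is a hypothetical helper, not part of the repository; the assumption that each split holds one subfolder per class (the layout torchvision.datasets.ImageFolder expects) is mine and may not match what main_mae.py actually reads.

```python
from pathlib import Path


def missing_imagenet_splits(root="./data/ImageNet1K"):
    """Return the splits (train/val) that are absent or contain no class subfolders.

    Hypothetical pre-flight check; assumes a one-folder-per-class layout.
    """
    missing = []
    for split in ("train", "val"):
        split_dir = Path(root) / split
        # is_dir() short-circuits, so iterdir() is only called on existing folders
        has_class_dirs = split_dir.is_dir() and any(p.is_dir() for p in split_dir.iterdir())
        if not has_class_dirs:
            missing.append(split)
    return missing


if __name__ == "__main__":
    gaps = missing_imagenet_splits()
    print("ImageNet1K layout OK" if not gaps else f"missing or empty: {gaps}")
```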

Set parameters.

All parameters are set in the default_args() function of main_mae.py (or main_eval.py).

Run the code.

> python main_mae.py	# train MAE encoder
> python main_eval.py	# evaluate MAE encoder

Visualize the output.

> tensorboard --logdir=./log --port 8888

Detail

Project structure

...
+ ckpt				# checkpoint
+ data 				# data folder
+ img 				# store images for README.md
+ log 				# log files
.gitignore 			
lars.py 			# LARS optimizer
main_eval.py 			# main file for evaluation
main_mae.py  			# main file for MAE training
model.py 			# model definitions of MAE and EvalNet
README.md 
util.py 			# helper functions
vit.py 				# definition of vision transformer

Encoder setting

In the paper, ViT-Base, ViT-Large and ViT-Huge are used. You can switch between them by changing the corresponding parameters in default_args(). Their settings are listed in the following table.

Name	Layer Num.	Hidden Size	MLP Size	Head Num.
Arg	vit_depth	vit_dim	vit_mlp_dim	vit_heads
ViT-B	12	768	3072	12
ViT-L	24	1024	4096	16
ViT-H	32	1280	5120	16
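One way to keep these three configurations in one place is a preset table keyed by the argument names from the "Arg" row. This is a sketch under assumptions: VIT_PRESETS and apply_vit_preset are hypothetical names, and the Namespace below only stands in for whatever default_args() actually returns.

```python
from argparse import Namespace

# Encoder presets from the table above; keys mirror the argument names
# (vit_depth, vit_dim, vit_mlp_dim, vit_heads) listed in the "Arg" row.
VIT_PRESETS = {
    "ViT-B": dict(vit_depth=12, vit_dim=768, vit_mlp_dim=3072, vit_heads=12),
    "ViT-L": dict(vit_depth=24, vit_dim=1024, vit_mlp_dim=4096, vit_heads=16),
    "ViT-H": dict(vit_depth=32, vit_dim=1280, vit_mlp_dim=5120, vit_heads=16),
}


def apply_vit_preset(args, name):
    """Overwrite the encoder fields of an args object with a named preset (hypothetical helper)."""
    preset = VIT_PRESETS[name]
    # Each attention head gets vit_dim // vit_heads channels, so the split must be exact.
    assert preset["vit_dim"] % preset["vit_heads"] == 0
    for key, value in preset.items():
        setattr(args, key, value)
    return args


# Stand-in for the object returned by default_args() (assumed to start as ViT-B).
args = Namespace(vit_depth=12, vit_dim=768, vit_mlp_dim=3072, vit_heads=12)
args = apply_vit_preset(args, "ViT-L")
print(args.vit_depth, args.vit_dim // args.vit_heads)  # 24 64
```

In all three presets the MLP size is four times the hidden size, which is the usual ViT convention.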

Evaluation setting

I implement the four training strategies discussed in the paper:

pre-training trains the MAE encoder and is done in main_mae.py.
linear probing evaluates the MAE encoder. During training, the MAE encoder is frozen.
- args.n_partial = 0
partial fine-tuning evaluates the MAE encoder. During training, the MAE encoder is partially frozen.
- args.n_partial = 0.5 --> fine-tune the MLP sub-block with the transformer fixed
- 1<=args.n_partial<=args.vit_depth-1 --> fine-tune the MLP sub-block and the last layers of the transformer
end-to-end fine-tuning evaluates the MAE encoder. During training, the MAE encoder is fully trainable.
- args.n_partial = args.vit_depth

Note that the last three strategies are run in main_eval.py, where the parameter args.n_partial is set.
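The mapping from args.n_partial to what gets trained can be sketched as a small freeze plan. This is an illustration, not the code in main_eval.py: FreezePlan and freeze_plan are hypothetical names, reading 0.5 as "only the MLP sub-block of the last transformer block" follows the MAE paper's "0.5 block" convention, and treating the patch/position embeddings as trainable only in end-to-end fine-tuning is an assumption. The classifier head is assumed to be trainable in every case.

```python
from typing import List, NamedTuple


class FreezePlan(NamedTuple):
    strategy: str
    full_blocks: List[int]      # transformer blocks unfrozen entirely
    mlp_only_blocks: List[int]  # blocks where only the MLP sub-block is unfrozen
    train_embeddings: bool      # patch/position embeddings trainable (assumption)


def freeze_plan(n_partial, depth):
    """Map n_partial to the encoder parts to unfreeze; the head is always trained."""
    if n_partial == 0:
        # linear probing: encoder fully frozen
        return FreezePlan("linear probing", [], [], False)
    if n_partial == 0.5:
        # paper's "0.5 block": MLP sub-block of the last block only
        return FreezePlan("partial fine-tuning", [], [depth - 1], False)
    if n_partial == depth:
        # end-to-end fine-tuning: everything trainable
        return FreezePlan("end-to-end fine-tuning", list(range(depth)), [], True)
    if float(n_partial).is_integer() and 1 <= n_partial <= depth - 1:
        # partial fine-tuning: last k blocks trainable
        k = int(n_partial)
        return FreezePlan("partial fine-tuning", list(range(depth - k, depth)), [], False)
    raise ValueError(f"unsupported n_partial={n_partial} for depth={depth}")
```

With a PyTorch model, such a plan could be applied by first calling model.requires_grad_(False) and then requires_grad_(True) on the head and the listed modules; the actual attribute names depend on vit.py.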

I also follow the hyperparameter settings in the paper's appendix; partial fine-tuning and end-to-end fine-tuning share the same setting. However, I replace RandAug(9, 0.5) with RandomResizedCrop and leave mixup, cutmix and drop path for future implementation.

Result

Reproducing the experiments will take a long time, and unfortunately I am busy these days. If you obtain results and are willing to contribute, please reach me via email. Thanks!

By the way, I have run the code end to end and it works, so it should be free of major implementation errors. If you find any, please open an issue or email me.

Licence

This repository is licensed under GPL v3.

About

Thanks to the projects vit-pytorch, pytorch-lars and DeepLearningExamples, whose code contributed a lot to this repository!

Homepage: https://liujiyuan13.github.io

Email: [email protected]


Owner

Jiyuan
