Pytorch implementation of Masked Auto-Encoder

Related tags

Deep LearningMAE-code
Overview

Masked Auto-Encoder (MAE)

Pytorch implementation of Masked Auto-Encoder:

Usage

  1. Clone to the local.
> git clone https://github.com/liujiyuan13/MAE-code.git MAE-code
  1. Install required packages.
> cd MAE-code
> pip install requirements.txt
  1. Prepare datasets.
  • For Cifar10, Cifar100 and STL, skip this step for it will be done automatically;
  • For ImageNet1K, download and unzip the train(val) set into ./data/ImageNet1K/train(val).
  1. Set parameters.
  • All parameters are kept in default_args() function of main_mae(eval).py file.
  1. Run the code.
> python main_mae.py	# train MAE encoder
> python main_eval.py	# evaluate MAE encoder
  1. Visualize the ouput.
> tensorboard --logdir=./log --port 8888

Detail

Project structure

...
+ ckpt				# checkpoint
+ data 				# data folder
+ img 				# store images for README.md
+ log 				# log files
.gitignore 			
lars.py 			# LARS optimizer
main_eval.py 			# main file for evaluation
main_mae.py  			# main file for MAE training
model.py 			# model definitions of MAE and EvalNet
README.md 
util.py 			# helper functions
vit.py 				# definition of vision transformer

Encoder setting

In the paper, ViT-Base, ViT-Large and ViT-Huge are used. You can switch between them by simply changing the parameters in default_args(). Details can be found here and are listed in following table.

Name Layer Num. Hidden Size MLP Size Head Num.
Arg vit_depth vit_dim vit_mlp_dim vit_heads
ViT-B 12 768 3072 12
ViT-L 24 1024 4096 16
ViT-H 32 1280 5120 16

Evaluation setting

I implement four network training strategies concerned in the paper, including

  • pre-training is used to train MAE encoder and done in main_mae.py.
  • linear probing is used to evaluate MAE encoder. During training, MAE encoder is fixed.
    • args.n_partial = 0
  • partial fine-tuning is used to evaluate MAE encoder. During training, MAE encoder is partially fixed.
    • args.n_partial = 0.5 --> fine-tuning MLP sub-block with the transformer fixed
    • 1<=args.n_partial<=args.vit_depth-1 --> fine-tuning MLP sub-block and last layers of transformer
  • end-to-end fine-tuning is used to evaluate MAE encoder. During training, MAE encoder is fully trainable.
    • args.n_partial = args.vit_depth

Note that the last three strategies are done in main_eval.py where parameter args.n_partial is located.

At the same time, I follow the parameter settings in the paper appendix. Note that partial fine-tuning and end-to-end fine-tuning use the same setting. Nevertheless, I replace RandAug(9, 0.5) with RandomResizedCrop and leave mixup, cutmix and drop path techniques in further implementation.

Result

The experiment reproduce will takes a long time and I am unfortunately busy these days. If you get some results and are willing to contribute, please reach me via email. Thanks!

By the way, I have run the code from start to end. It works! So don't worry about the implementation errors. If you find any, please raise issues or email me.

Licence

This repository is under GPL V3.

About

Thanks project vit-pytorch, pytorch-lars and DeepLearningExamples for their codes contribute to this repository a lot!

Homepage: https://liujiyuan13.github.io

Email: [email protected]

Owner
Jiyuan
Jiyuan
Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project

Semantic Code Search Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project. The model

Chen Wu 24 Nov 29, 2022
Python lib to talk to pylontech lithium batteries (US2000, US3000, ...) using RS485

python-pylontech Python lib to talk to pylontech lithium batteries (US2000, US3000, ...) using RS485 What is this lib ? This lib is meant to talk to P

Frank 26 Dec 28, 2022
Copy Paste positive polyp using poisson image blending for medical image segmentation

Copy Paste positive polyp using poisson image blending for medical image segmentation According poisson image blending I've completely used it for bio

Phạm Vũ Hùng 2 Oct 19, 2021
Image-to-image regression with uncertainty quantification in PyTorch

Image-to-image regression with uncertainty quantification in PyTorch. Take any dataset and train a model to regress images to images with rigorous, distribution-free uncertainty quantification.

Anastasios Angelopoulos 25 Dec 26, 2022
Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

[ICLR 2022] Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity by Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar, Elen

VITA 18 Dec 31, 2022
[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation Getting Started Our codes are implemented and tested with pyth

ZiNiU WaN 176 Dec 15, 2022
Half Instance Normalization Network for Image Restoration

HINet Half Instance Normalization Network for Image Restoration, based on https://github.com/megvii-model/HINet. Dependencies NumPy PyTorch, preferabl

Holy Wu 4 Jun 06, 2022
LieTransformer: Equivariant Self-Attention for Lie Groups

LieTransformer This repository contains the implementation of the LieTransformer used for experiments in the paper LieTransformer: Equivariant Self-At

OxCSML (Oxford Computational Statistics and Machine Learning) 50 Dec 28, 2022
Dynamic Multi-scale Filters for Semantic Segmentation (DMNet ICCV'2019)

Dynamic Multi-scale Filters for Semantic Segmentation (DMNet ICCV'2019) Introduction Official implementation of Dynamic Multi-scale Filters for Semant

23 Oct 21, 2022
Codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Contrast and Mix (CoMix) The repository contains the codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Backgroun

Computer Vision and Intelligence Research (CVIR) 13 Dec 10, 2022
Bayesian dessert for Lasagne

Gelato Bayesian dessert for Lasagne Recent results in Bayesian statistics for constructing robust neural networks have proved that it is one of the be

Maxim Kochurov 84 May 11, 2020
PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

Yulun Zhang 1.2k Dec 26, 2022
This repository contains numerical implementation for the paper Intertemporal Pricing under Reference Effects: Integrating Reference Effects and Consumer Heterogeneity.

This repository contains numerical implementation for the paper Intertemporal Pricing under Reference Effects: Integrating Reference Effects and Consumer Heterogeneity.

Hansheng Jiang 6 Nov 18, 2022
Log4j JNDI inj. vuln scanner

Log-4-JAM - Log 4 Just Another Mess Log4j JNDI inj. vuln scanner Requirements pip3 install requests_toolbelt Usage # make sure target list has http/ht

Ashish Kunwar 66 Nov 09, 2022
CoINN: Correlated-informed neural networks: a new machine learning framework to predict pressure drop in micro-channels

CoINN: Correlated-informed neural networks: a new machine learning framework to predict pressure drop in micro-channels Accurate pressure drop estimat

Alejandro Montanez 0 Jan 21, 2022
An Ensemble of CNN (Python 3.5.1 Tensorflow 1.3 numpy 1.13)

An Ensemble of CNN (Python 3.5.1 Tensorflow 1.3 numpy 1.13)

0 May 06, 2022
Official implementation of "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers"

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers Figure 1: Performance of SegFormer-B0 to SegFormer-B5. Project page

NVIDIA Research Projects 1.4k Dec 31, 2022
An implementation of "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing" (ICML 2019).

MixHop and N-GCN ⠀ A PyTorch implementation of "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing" (ICML 2019)

Benedek Rozemberczki 393 Dec 13, 2022
NHS AI Lab Skunkworks project: Long Stayer Risk Stratification

NHS AI Lab Skunkworks project: Long Stayer Risk Stratification A pilot project for the NHS AI Lab Skunkworks team, Long Stayer Risk Stratification use

NHSX 21 Nov 14, 2022
An official PyTorch implementation of the TKDE paper "Self-Supervised Graph Representation Learning via Topology Transformations".

Self-Supervised Graph Representation Learning via Topology Transformations This repository is the official PyTorch implementation of the following pap

Hsiang Gao 2 Oct 31, 2022