Re-implementation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Last update: Dec 14, 2021

mae-repo

PyTorch re-implementation of "Masked Autoencoders Are Scalable Vision Learners". This repo borrows heavily from https://github.com/lucidrains/vit-pytorch (for the MAE architecture) and https://github.com/pengzhiliang/MAE-pytorch (for the training loop).

prepare ImageNet1K datasets

To train MAE, download the ImageNet ILSVRC2012 dataset and place the ILSVRC2012_*.tar files in ${datasets_path}. To reduce the overhead of the first run, you can manually extract the tar files into train and val directories as follows (adapted from https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).

mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..

mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
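
Before starting a long run, it can help to confirm that extraction produced the expected layout. The following is a minimal sketch, assuming the commands above leave one sub-folder per class under both train and val (ImageNet-1K has 1000 classes). The helper names summarize_split and check_imagenet_layout are hypothetical and not part of this repo.

```python
from pathlib import Path


def summarize_split(split_dir):
    """Return (number of class folders, number of files inside them) for one split."""
    split_dir = Path(split_dir)
    class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
    # Only files inside class folders are counted; stray top-level files are ignored.
    n_files = sum(1 for d in class_dirs for f in d.iterdir() if f.is_file())
    return len(class_dirs), n_files


def check_imagenet_layout(datasets_path, expected_classes=1000):
    """Report whether train/ and val/ each hold the expected number of class folders."""
    report = {}
    for split in ("train", "val"):
        split_dir = Path(datasets_path) / split
        if not split_dir.is_dir():
            report[split] = "missing"
            continue
        n_classes, n_files = summarize_split(split_dir)
        status = "ok" if n_classes == expected_classes else "unexpected class count"
        report[split] = f"{status}: {n_classes} classes, {n_files} files"
    return report
```

Calling check_imagenet_layout("/path/to/datasets") returns a small dictionary with one entry per split. The val commands above do not delete ILSVRC2012_img_val.tar, so the sketch deliberately ignores files that sit directly in the split folder.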

modify configuration file

To separate code from configuration, settings are kept in YAML files in the configs directory, such as imagenet1k-vit-base.yml. Modify the 'model' section, following the MAE and ViT papers, to set the architecture parameters for ViT-base, large, or huge.

Modify 'optim' for optimizer settings, and 'training' and 'data' for training settings. Note that 'training:batch_size' is the per-GPU batch size and should be set to fit the memory of a single GPU. The total batch size equals batch_size multiplied by the number of GPUs.
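
As a rough illustration of how these sections relate, here is a sketch of a configuration expressed as a Python dictionary, together with the total batch size calculation. Only the section names 'model', 'optim', 'training', and 'data' and the key 'batch_size' come from this README. Every other key name is a hypothetical placeholder, and the values are the commonly cited ViT-Base and MAE defaults rather than the contents of imagenet1k-vit-base.yml.

```python
# Hypothetical configuration layout. Key names other than 'batch_size' are
# illustrative assumptions and may not match the repo's actual YAML schema.
config = {
    "model": {
        "patch_size": 16,      # ViT-Base/16 patch size
        "dim": 768,            # ViT-Base encoder width
        "depth": 12,           # ViT-Base encoder depth
        "heads": 12,           # ViT-Base attention heads
        "decoder_dim": 512,    # assumed from the run name "dec8-512"
        "decoder_depth": 8,    # assumed from the run name "dec8-512"
        "masking_ratio": 0.75, # default masking ratio in the MAE paper
    },
    "optim": {"optimizer": "adamw"},
    "training": {"batch_size": 64},  # per-GPU batch size
    "data": {"image_size": 224},
}


def total_batch_size(per_gpu_batch_size, num_gpus):
    """Total batch size across all GPUs, as described in the text above."""
    if per_gpu_batch_size <= 0 or num_gpus <= 0:
        raise ValueError("batch size and GPU count must be positive")
    return per_gpu_batch_size * num_gpus


# Example: 64 samples per GPU on 8 GPUs.
effective = total_batch_size(config["training"]["batch_size"], num_gpus=8)
print(f"total batch size: {effective}")
```

With a per-GPU batch size of 64 and 8 GPUs, the total batch size is 512. If fewer GPUs are available, the total shrinks proportionally unless the per-GPU value is raised.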

train

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 mae_test.py \
--datasets_path ${datasets_path} \
--config imagenet1k-vit-base.yml \
--doc mae-vit-base16-dec8-512
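
For context on what the launcher does: torch.distributed.launch starts one process per GPU, by default appends a --local_rank argument to each process's command line, and sets environment variables such as RANK and WORLD_SIZE. The sketch below shows how an entry script could read these values using only the standard library. It is not the actual contents of mae_test.py; the functions parse_args and distributed_info are hypothetical, and the interpretation of --doc as a run name is an assumption.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags from the launch command, plus the launcher's --local_rank."""
    parser = argparse.ArgumentParser(description="Hypothetical MAE training entry point")
    parser.add_argument("--datasets_path", required=True, help="folder with train/ and val/")
    parser.add_argument("--config", required=True, help="YAML file name in configs/")
    parser.add_argument("--doc", required=True, help="run name (assumed meaning)")
    # torch.distributed.launch passes --local_rank=<index> to every process it spawns.
    parser.add_argument("--local_rank", type=int, default=0)
    return parser.parse_args(argv)


def distributed_info(args, environ=None):
    """Collect rank information, falling back to single-process defaults."""
    environ = os.environ if environ is None else environ
    return {
        "local_rank": args.local_rank,
        "rank": int(environ.get("RANK", 0)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
    }
```

In a real script, local_rank would typically select the CUDA device for the process, and world_size should match the number of GPUs listed in CUDA_VISIBLE_DEVICES and the value of --nproc_per_node.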

ToDo lists

add pretrain mode
add fine-tuning mode
support mixed precision training
support distributed training
verify the correctness of this re-implementation


Owner

Peng Qiao
