A simple, unofficial implementation of MAE using pytorch-lightning

Last update: Dec 03, 2022

Overview

Masked Autoencoders in PyTorch

A simple, unofficial implementation of MAE (Masked Autoencoders are Scalable Vision Learners) using pytorch-lightning.

Training is currently implemented for the CUB and StanfordCars datasets, and the code is easily extensible to any other image dataset.

Setup

# Clone the repository
git clone https://github.com/catalys1/mae-pytorch.git
cd mae-pytorch

# Install required libraries (inside a virtual environment preferably)
pip install -r requirements.txt

# Set up .env for path to data
echo "DATADIR=/path/to/data" > .env
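
The .env file written above holds a single KEY=VALUE line. The repository reads it through python-dotenv; the following is only a standard-library sketch of the same idea, so that it runs without extra packages. The helper name read_env_file is hypothetical and is not part of the repository.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo on a temporary file so nothing in the working directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("DATADIR=/path/to/data\n")
    env = read_env_file(env_path)

# Assumption of this sketch: a DATADIR already set in the process
# environment wins over the file (python-dotenv also does not override
# existing variables by default).
datadir = os.environ.get("DATADIR", env["DATADIR"])
print(datadir)
```

With python-dotenv installed, the equivalent is a call to load_dotenv() followed by os.environ["DATADIR"].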

Usage

MAE training

Training options are provided through configuration files, handled by LightningCLI. See configs/ for examples.

Train an MAE model on the CUB dataset:

python train.py fit --config=configs/mae.yaml --config=configs/data/cub_mae.yaml

Using multiple GPUs:

python train.py fit --config=configs/mae.yaml --config=configs/data/cub_mae.yaml --config=configs/multigpu.yaml
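
Several --config files can be passed because LightningCLI applies them in order, with values from later files overriding those from earlier ones. The sketch below imitates that layering with plain dictionaries. The keys and values are invented for illustration and are not the contents of the repository's YAML files, and merge_configs is a hypothetical helper rather than a LightningCLI function.

```python
import copy


def merge_configs(base, override):
    """Recursively merge `override` into a copy of `base`; later values win."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-ins for configs/mae.yaml, configs/data/cub_mae.yaml
# and configs/multigpu.yaml (the real files may look quite different).
model_cfg = {"trainer": {"max_epochs": 100, "devices": 1}, "model": {"name": "mae"}}
data_cfg = {"data": {"dataset": "cub"}}
multigpu_cfg = {"trainer": {"devices": 4, "strategy": "ddp"}}

config = {}
for cfg in (model_cfg, data_cfg, multigpu_cfg):
    config = merge_configs(config, cfg)

print(config["trainer"])
```

In this toy example the multi-GPU file only touches the trainer section, so the model and data settings from the first two files are left as they were.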

Fine-tuning

Not yet implemented.

Implementation

The default model uses ViT-Base for the encoder and a small ViT (depth=4, width=192) for the decoder. This decoder is smaller than the default decoder in the paper (depth=8, width=512).
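
For context, the core MAE step is to hide a random subset of image patches and pass only the visible ones to the encoder; the decoder then reconstructs the hidden patches. The sketch below shows only the index bookkeeping, in plain Python. The 224x224 input, 16x16 patches, and 75% mask ratio are common settings from the paper and are assumed here, not read from this repository's configs. random_patch_mask is a hypothetical helper, not the repository's implementation.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) by uniform random sampling."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    visible = sorted(order[:num_keep])  # indices fed to the encoder
    masked = sorted(order[num_keep:])   # indices the decoder must reconstruct
    return visible, masked


# Assumed settings: 224x224 image, 16x16 patches, 75% of patches masked.
image_size, patch_size = 224, 16
num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196
visible, masked = random_patch_mask(num_patches, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # prints "49 147"
```

Under these assumptions the encoder sees only 49 of the 196 patches, which is why a comparatively small decoder can be paired with a ViT-Base encoder.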

Dependencies

Configuration and training are handled entirely by pytorch-lightning.
The MAE model uses the VisionTransformer from timm.
The interface to FGVC datasets is provided by fgvcdata.
Environment variables are configured through python-dotenv.

Results

Image reconstructions of CUB validation set images after training with the following command:

python train.py fit --config=configs/mae.yaml --config=configs/data/cub_mae.yaml --config=configs/multigpu.yaml

Owner

Connor Anderson
