PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Last update: Jan 04, 2023

Related tags

Overview

Masked Autoencoders: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners:

@Article{MaskedAutoencoders2021,
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}

The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.
This repo is a modification on the DeiT repo. Installation and preparation follow that repo.
This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.

Catalog

Visualization demo
Pre-trained checkpoints + fine-tuning code
Pre-training code

Visualization demo

Run our interactive visualization demo using Colab notebook (no GPU needed):

Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

	ViT-Base	ViT-Large	ViT-Huge
pre-trained checkpoint	download	download	download
md5	`8cad7c`	`b8b06e`	`9bdbb0`

The fine-tuning instruction is in FINETUNE.md.

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

	ViT-B	ViT-L	ViT-H	ViT-H₄₄₈	prev best
ImageNet-1K (no external data)	83.6	85.9	86.9	87.8	87.1
following are evaluation of the same model weights (fine-tuned in original ImageNet-1K):
ImageNet-Corruption (error rate)	51.7	41.8	33.8	36.8	42.5
ImageNet-Adversarial	35.9	57.1	68.2	76.7	35.8
ImageNet-Rendition	48.3	59.9	64.4	66.5	48.7
ImageNet-Sketch	34.5	45.3	49.6	50.9	36.0
following are transfer learning by fine-tuning the pre-trained MAE on the target dataset:
iNaturalists 2017	70.5	75.7	79.3	83.4	75.4
iNaturalists 2018	75.4	80.1	83.0	86.8	81.2
iNaturalists 2019	80.5	83.4	85.7	88.3	84.1
Places205	63.9	65.8	65.9	66.8	66.0
Places365	57.9	59.4	59.8	60.3	58.0

Pre-training

The pre-training instruction is in PRETRAIN.md.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Related tags

Overview

Masked Autoencoders: A PyTorch Implementation

Catalog

Visualization demo

Fine-tuning with pre-trained checkpoints

Pre-training

License

Owner

Meta Research

Adaptive Denoising Training (ADT) for Recommendation.

Official implementation of the paper Label-Efficient Semantic Segmentation with Diffusion Models

【steal piano】GitHub偷情分析工具！

UI2I via StyleGAN2 - Unsupervised image-to-image translation method via pre-trained StyleGAN2 network

Official PyTorch implementation of Spatial Dependency Networks.

This is the official code for the paper "Ad2Attack: Adaptive Adversarial Attack for Real-Time UAV Tracking".

Python scripts for performing road segemtnation and car detection using the HybridNets multitask model in ONNX.

Using OpenAI's CLIP to upscale and enhance images

A curated list of awesome neural radiance fields papers

[CVPR 2019 Oral] Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

A python library for implementing a recommender system

Six - a Python 2 and 3 compatibility library

Detection of drones using their thermal signatures from thermal camera through YOLO-V3 based CNN with modifications to encapsulate drone motion

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

SBINN: Systems-biology informed neural network

Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

MAterial del programa Misión TIC 2022

Hepsiburada - Hepsiburada Urun Bilgisi Cekme

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

ICLR21 Tent: Fully Test-Time Adaptation by Entropy Minimization