Masked Autoencoders Are Scalable Vision Learners

A TensorFlow implementation of Masked Autoencoders Are Scalable Vision Learners [1]. The implementation lives in the mae-pretraining.ipynb notebook, which also includes evaluation with linear probing and can be run end to end on Google Colab. Our main objective is to present the core idea of the proposed method in a minimal and readable manner. We have also prepared a blog post to help you get started with Masked Autoencoders.

Source: Masked Autoencoders Are Scalable Vision Learners

With just 100 epochs of pre-training and a fairly lightweight, asymmetric autoencoder architecture, we achieve 49.33% accuracy with linear probing on the CIFAR-10 dataset. Our training logs and encoder weights are released in Weights and Logs. For comparison, we took the encoder architecture and trained it from scratch in a fully supervised manner (see regular-classification.ipynb). This gave us ~76% top-1 test accuracy.
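To make "linear probing" concrete: the pre-trained encoder is frozen and only a linear classifier on top of its features is trained. The sketch below illustrates that idea in plain NumPy on synthetic data. It is not the code from mae-pretraining.ipynb. The `frozen_encoder` stand-in, the toy inputs, and the learning rate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained encoder: a fixed projection + tanh.
# In the real setup this would be the MAE encoder with its weights frozen.
W_enc = rng.normal(size=(32, 16))
W_enc_before = W_enc.copy()

def frozen_encoder(x):
    """Map inputs to features. W_enc is never updated during probing."""
    return np.tanh(x @ W_enc)

def softmax_cross_entropy(features, labels, W, b):
    """Return (mean cross-entropy loss, class probabilities)."""
    logits = features @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    return loss, np.exp(log_probs)

def train_linear_probe(features, labels, num_classes, lr=0.1, steps=200):
    """Fit only a linear layer (W, b) on top of fixed features."""
    n, d = features.shape
    W, b = np.zeros((d, num_classes)), np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        _, probs = softmax_cross_entropy(features, labels, W, b)
        grad_logits = (probs - onehot) / n
        W -= lr * features.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b

# Synthetic two-class toy data (an assumption, not CIFAR-10).
x = rng.normal(size=(256, 32))
labels = (x[:, 0] > 0).astype(int)

features = frozen_encoder(x)
W, b = train_linear_probe(features, labels, num_classes=2)

initial_loss, _ = softmax_cross_entropy(
    features, labels, np.zeros_like(W), np.zeros_like(b)
)
final_loss, _ = softmax_cross_entropy(features, labels, W, b)
```

Because only `W` and `b` are updated, the resulting accuracy reflects how linearly separable the frozen features are, which is why linear probing is used to judge the quality of the pre-trained representation.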

We note that with further hyperparameter tuning and more epochs of pre-training, we can achieve better performance with linear probing. Below we present some more results:

Config	Masking proportion	LP performance	Encoder weights & logs
Encoder & decoder layers: 3 & 1; Batch size: 256	0.6	44.25%	Link
Same as above	0.75	46.84%	Link
Encoder & decoder layers: 6 & 2; Batch size: 256	0.75	48.16%	Link
Encoder & decoder layers: 9 & 3; Batch size: 256; Weight decay: 1e-5	0.75	49.33%	Link

LP denotes linear probing. Config is mostly based on what we define in the hyperparameters section of this notebook: mae-pretraining.ipynb.
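The masking proportion in the table is the fraction of image patches hidden from the encoder. The following NumPy sketch shows one way to split an image into patches and randomly mask a given proportion of them. It is an illustration, not the notebook's code. The 4×4 patch size and the helper names `patchify` and `random_patch_mask` are assumptions chosen for the example.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def random_patch_mask(num_patches, mask_proportion, rng):
    """Return (masked_indices, visible_indices) for one image."""
    num_mask = int(mask_proportion * num_patches)
    perm = rng.permutation(num_patches)
    return perm[:num_mask], perm[num_mask:]

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))           # CIFAR-10-sized dummy image
patches = patchify(image, patch_size=4)   # assumed patch size: 64 patches

mask_idx, keep_idx = random_patch_mask(len(patches), 0.75, rng)
visible_patches = patches[keep_idx]       # only these are fed to the encoder
```

With a masking proportion of 0.75 and 64 patches, 48 patches are masked and the encoder sees only the remaining 16. This is what makes the design asymmetric: the encoder processes a small subset of patches, while the decoder reconstructs the full image.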

Notes

This project received the Google OSS Expert Prize (March 2022).

Acknowledgements

Xinlei Chen (one of the authors of the original paper)
Google Developers Experts Program and JarvisLabs for providing credits to perform extensive experimentation on A100 GPUs.

References

[1] Masked Autoencoders Are Scalable Vision Learners; He et al.; arXiv 2021; https://arxiv.org/abs/2111.06377.
