Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Last update: Jan 04, 2023

Overview

This repository is built upon BEiT; many thanks to its authors!

Currently, we only implement the pretraining process described in the paper, and we cannot guarantee that the performance reported in the paper can be reproduced!

Difference

The shuffle and unshuffle operations described in the paper do not seem to be directly available in PyTorch, so we implement this process differently:

For shuffle, we use BEiT's method of randomly generating a 14x14 mask map. In the map, mask=0 means the token is kept and mask=1 means the token is dropped (it does not take part in the encoder computation). All visible tokens (mask=0) are then fed into the encoder network.
For unshuffle, we use the mask map to take the position embeddings of all masked tokens and add the shared mask token to them. We then concatenate the result with the visible tokens output by the encoder and feed the whole sequence into the decoder network for reconstruction.
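The two steps above can be sketched roughly as follows. This is a minimal sketch and rests on several assumptions:

- tokens are batch-first tensors of shape (B, N, D) on a 14x14 patch grid;
- every image masks the same number of tokens;
- the helper names random_mask, encoder_input and decoder_input are hypothetical;
- visible tokens also get their position embedding added before the decoder.

It is not the repository's actual code.

```python
import torch


def random_mask(batch_size: int, num_patches: int, mask_ratio: float) -> torch.Tensor:
    """Return a bool mask of shape (B, N): True = masked (dropped), False = visible.

    Every row masks the same number of tokens, so the visible tokens of a
    batch can be stacked into one dense (B, N_vis, D) tensor.
    """
    num_mask = int(mask_ratio * num_patches)
    # Rank each position by random noise; the num_mask lowest ranks are masked.
    ranks = torch.rand(batch_size, num_patches).argsort(dim=1).argsort(dim=1)
    return ranks < num_mask


def encoder_input(tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Keep only the visible tokens: (B, N, D) -> (B, N_vis, D), in original order."""
    B, _, D = tokens.shape
    return tokens[~mask].reshape(B, -1, D)


def decoder_input(encoded: torch.Tensor, mask: torch.Tensor,
                  pos_embed: torch.Tensor, mask_token: torch.Tensor) -> torch.Tensor:
    """Build the full decoder sequence from encoder outputs and mask tokens.

    encoded:    (B, N_vis, D) encoder outputs for the visible tokens
    pos_embed:  (1, N, D) decoder position embeddings for all N positions
    mask_token: (1, 1, D) one learnable token shared by every masked position
    """
    B, _, D = encoded.shape
    pos = pos_embed.expand(B, -1, -1)
    # Assumption: visible tokens also receive their position embedding here.
    vis = encoded + pos[~mask].reshape(B, -1, D)
    # Masked positions: shared mask token + the position embedding picked by the mask map.
    masked = mask_token + pos[mask].reshape(B, -1, D)
    # Visible tokens first, masked tokens last; positions are carried by the
    # embeddings, so no explicit unshuffle back to image order is needed.
    return torch.cat([vis, masked], dim=1)


torch.manual_seed(0)
B, N, D = 2, 14 * 14, 8                 # 14x14 patch grid, toy embedding width
mask = random_mask(B, N, mask_ratio=0.75)
tokens = torch.randn(B, N, D)
x_vis = encoder_input(tokens, mask)      # (2, 49, 8); the encoder would run on this
# In this toy example x_vis stands in for the encoder output.
x_full = decoder_input(x_vis, mask, torch.zeros(1, N, D), torch.zeros(1, 1, D))
print(x_vis.shape, x_full.shape)  # torch.Size([2, 49, 8]) torch.Size([2, 196, 8])
```

With this ordering, the decoder's last N_mask outputs correspond to the masked patches in ascending position order. That is where a reconstruction loss would be applied.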

TODO

implement the fine-tuning process
reuse the model in modeling_pretrain.py
calculate the normalized pixel target
add the cls token to the encoder
...
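The normalized pixel target in the TODO list is not implemented here yet. The MAE paper's variant normalizes each target patch by the mean and standard deviation of that patch's own pixels. What it might look like is sketched below. This minimal sketch assumes 16x16 patches and a (B, C, H, W) image batch, and the helper names patchify and normalized_pixel_target are hypothetical.

```python
import torch


def patchify(imgs: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """(B, C, H, W) -> (B, N, patch_size**2 * C), with N = (H // p) * (W // p)."""
    B, C, H, W = imgs.shape
    p = patch_size
    h, w = H // p, W // p
    x = imgs.reshape(B, C, h, p, w, p)
    x = x.permute(0, 2, 4, 3, 5, 1)  # (B, h, w, p, p, C)
    return x.reshape(B, h * w, p * p * C)


def normalized_pixel_target(imgs: torch.Tensor, patch_size: int = 16,
                            eps: float = 1e-6) -> torch.Tensor:
    """Normalize every patch by its own pixel mean and variance."""
    patches = patchify(imgs, patch_size)
    mean = patches.mean(dim=-1, keepdim=True)
    var = patches.var(dim=-1, keepdim=True)
    return (patches - mean) / (var + eps).sqrt()


torch.manual_seed(0)
imgs = torch.randn(2, 3, 224, 224)
target = normalized_pixel_target(imgs)
print(target.shape)  # torch.Size([2, 196, 768])
```

Only the rows of target selected by the mask map would then enter the reconstruction loss, for example a mean squared error against the decoder's predictions for the masked tokens.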

Setup

pip install -r requirements.txt

Run

# Set the path to save checkpoints
OUTPUT_DIR='output/'
# path to imagenet-1k train set
DATA_PATH='../ImageNet_ILSVRC2012/train'


OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_mae_pretraining.py \
        --data_path ${DATA_PATH} \
        --mask_ratio 0.75 \
        --model pretrain_mae_base_patch16_224 \
        --batch_size 128 \
        --opt_betas 0.9 0.95 \
        --warmup_epochs 40 \
        --epochs 1600 \
        --output_dir ${OUTPUT_DIR}
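For reference, the flags above determine how many tokens the encoder and decoder see and how large each optimizer step is. The quick calculation below makes two assumptions:

- --batch_size is per GPU, with no gradient accumulation;
- the number of masked tokens is rounded down with int(). The repository may round differently.

```python
# Values taken from the command above; per-GPU batch size is an assumption.
img_size, patch_size = 224, 16        # from --model pretrain_mae_base_patch16_224
mask_ratio = 0.75                     # --mask_ratio
nproc_per_node, batch_size = 8, 128   # --nproc_per_node, --batch_size

grid = img_size // patch_size                 # 14 patches per side (the 14x14 mask map)
num_patches = grid * grid                     # 196 tokens per image
num_masked = int(mask_ratio * num_patches)    # 147 tokens hidden from the encoder
num_visible = num_patches - num_masked        # 49 tokens actually encoded
global_batch = nproc_per_node * batch_size    # 1024 images per optimizer step

print(grid, num_patches, num_masked, num_visible, global_batch)
# 14 196 147 49 1024
```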

Note: the pretraining results are on the way.

Owner

Zhiliang Peng
