Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Overview

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

This repository is built upon BEiT, thanks very much!

Now, we only implement the pretrain process according to the paper, and can't guarantee the performance reported in the paper can be reproduced!

Difference

At the same time, shuffle and unshuffle operations don't seem to be directly accessible in pytorch, so we use another method to realize this process:

  • For shuffle, we used the method of randomly generating mask-map (14x14) in BEiT, where mask=0 illustrates keep the token, mask=1 denotes drop the token (not participating caculation in Encoder). Then all visible tokens (mask=0) are put into encoder network.
  • For unshuffle, we get the postion embeddings (with adding the shared mask token) of all mask tokens according to the mask-map and then concate them with the visible tokens (from encoder), and put them into the decoder network to recontrust.

TODO

  • implement the finetune process
  • reuse the model in modeling_pretrain.py
  • caculate the normalized pixels target
  • add the cls token in the encoder
  • ...

Setup

pip install -r requirements.txt

Run

# Set the path to save checkpoints
OUTPUT_DIR='output/'
# path to imagenet-1k train set
DATA_PATH='../ImageNet_ILSVRC2012/train'


OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_mae_pretraining.py \
        --data_path ${DATA_PATH} \
        --mask_ratio 0.75 \
        --model pretrain_mae_base_patch16_224 \
        --batch_size 128 \
        --opt_betas 0.9 0.95 \
        --warmup_epochs 40 \
        --epochs 1600 \
        --output_dir ${OUTPUT_DIR}

Note: the pretrain result is on the way ~

Owner
Zhiliang Peng
Zhiliang Peng
Codebase for Inducing Causal Structure for Interpretable Neural Networks

Interchange Intervention Training (IIT) Codebase for Inducing Causal Structure for Interpretable Neural Networks Release Notes 12/01/2021: Code and Pa

Zen 6 Oct 10, 2022
A tool to analyze leveraged liquidity mining and find optimal option combination for hedging.

LP-Option-Hedging Description A Python program to analyze leveraged liquidity farming/mining and find the optimal option combination for hedging imper

Aureliano 18 Dec 19, 2022
[CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

TorchSemiSeg [CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision by Xiaokang Chen1, Yuhui Yuan2, Gang Zeng1, Jingdong Wang

Chen XiaoKang 387 Jan 08, 2023
In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results from as little as 16 seconds of target data.

Neural Instrument Cloning In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results fro

Erland 127 Dec 23, 2022
DETReg: Unsupervised Pretraining with Region Priors for Object Detection

DETReg: Unsupervised Pretraining with Region Priors for Object Detection Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik

Amir Bar 283 Dec 27, 2022
CLIP (Contrastive Language–Image Pre-training) for Italian

Italian CLIP CLIP (Radford et al., 2021) is a multimodal model that can learn to represent images and text jointly in the same space. In this project,

Italian CLIP 114 Dec 29, 2022
Official repo for SemanticGAN https://nv-tlabs.github.io/semanticGAN/

SemanticGAN This is the official code for: Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalizat

151 Dec 28, 2022
Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).

Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).

Varun Nair 37 Dec 30, 2022
A setup script to generate ITK Python Wheels

ITK Python Package This project provides a setup.py script to build ITK Python binary packages and infrastructure to build ITK external module Python

Insight Software Consortium 59 Dec 14, 2022
Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX.

Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX. The repository combines a class agnostic object localizer to first detect the objects in the image

Ibai Gorordo 24 Nov 14, 2022
Versatile Generative Language Model

Versatile Generative Language Model This is the implementation of the paper: Exploring Versatile Generative Language Model Via Parameter-Efficient Tra

Zhaojiang Lin 17 Dec 02, 2022
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Welcome to AirSim AirSim is a simulator for drones, cars and more, built on Unreal Engine (we now also have an experimental Unity release). It is open

Microsoft 13.8k Jan 03, 2023
ICLR2021 (Under Review)

Self-Supervised Time Series Representation Learning by Inter-Intra Relational Reasoning This repository contains the official PyTorch implementation o

Haoyi Fan 58 Dec 30, 2022
Which Style Makes Me Attractive? Interpretable Control Discovery and Counterfactual Explanation on StyleGAN

Interpretable Control Exploration and Counterfactual Explanation (ICE) on StyleGAN Which Style Makes Me Attractive? Interpretable Control Discovery an

Bo Li 11 Dec 01, 2022
“英特尔创新大师杯”深度学习挑战赛 赛道3:CCKS2021中文NLP地址相关性任务

ccks2021-track3 CCKS2021中文NLP地址相关性任务-赛道三-冠军方案 团队:我的加菲鱼- wodejiafeiyu 初赛第二/复赛第一/决赛第一 前言 19年开始,陆陆续续参加了一些比赛,拿到过一些top,比较懒一直都没分享过,这次比较幸运又拿了top1,打算分享下 分类的任务

shaochenjie 131 Dec 31, 2022
MAVE: : A Product Dataset for Multi-source Attribute Value Extraction

MAVE: : A Product Dataset for Multi-source Attribute Value Extraction The dataset contains 3 million attribute-value annotations across 1257 unique ca

Google Research Datasets 89 Jan 08, 2023
I will implement Fastai in each projects present in this repository.

DEEP LEARNING FOR CODERS WITH FASTAI AND PYTORCH The repository contains a list of the projects which I have worked on while reading the book Deep Lea

Thinam Tamang 43 Dec 20, 2022
Six - a Python 2 and 3 compatibility library

Six is a Python 2 and 3 compatibility library. It provides utility functions for smoothing over the differences between the Python versions with the g

Benjamin Peterson 919 Dec 28, 2022
LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021 We propose a cross encoder model (LTR_CrossEncoder) for information retrieval, re-retrie

Xuan Hieu Duong 7 Jan 12, 2022
Python with OpenCV - MediaPip Framework Hand Detection

Python HandDetection Python with OpenCV - MediaPip Framework Hand Detection Explore the docs » Contact Me About The Project It is a Computer vision pa

2 Jan 07, 2022