An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners

Overview

An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners

This is a coarse version for MAE, only make the pretrain model, the finetune and linear is comming soon.

1. Introduction

This repo is the MAE-vit model which impelement with pytorch, no reference any reference code so this is a non-official version. Because of the limitation of time and machine, I only trained the vit-tiny model for encoder. mae

2. Enveriments

  • python 3.7+
  • pytorch 1.7.1
  • pillow
  • timm
  • opencv-python

3. Model Config

Pretrain Config

  • BaseConfig
    img_size = 224,
    patch_size = 16,
  • Encoder The encoder if follow the Vit-tiny model config
    encoder_dim = 192,
    encoder_depth = 12,
    encoder_heads = 3,
  • Decoder The decoder is followed the kaiming paper config.
    decoder_dim = 512,
    decoder_depth = 8,
    decoder_heads = 16, 
  • Mask
    1. We use the shuffle patch after Sin-Cos position embeeding for encoder.
    2. Mask the shuffle patch, keep the mask index.
    3. Unshuffle the mask patch and combine with the encoder embeeding before the position embeeding for decoder.
    4. Restruction decoder embeeidng by convtranspose.
    5. Build the mask map with mask index for cal the loss(only consider the mask patch).

Finetune Config

Wait for the results

TODO:

  • Finetune Trainig
  • Linear Training

4. Results

decoder Restruction the imagenet validation image from pretrain model, compare with the kaiming results, restruction quality is less than he. May be the encoder model is too small TT.

The Mae-Vit-tiny pretrain models is here, you can download to test the restruction result. Put the ckpt in weights folder.

5. Training & Inference

  • dataset prepare

    /data/home/imagenet/xxx.jpeg, 0
    /data/home/imagenet/xxx.jpeg, 1
    ...
    /data/home/imagenet/xxx.jpeg, 999
    
  • Training

    1. Pretrain

      #!/bin/bash
      OMP_NUM_THREADS=1
      MKL_NUM_THREADS=1
      export OMP_NUM_THREADS
      export MKL_NUM_THREADS
      cd MAE-Pytorch;
      CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train_mae.py \
      --batch_size 256 \
      --num_workers 32 \
      --lr 1.5e-4 \
      --optimizer_name "adamw" \
      --cosine 1 \
      --max_epochs 300 \
      --warmup_epochs 40 \
      --num-classes 1000 \
      --crop_size 224 \
      --patch_size 16 \
      --color_prob 0.0 \
      --calculate_val 0 \
      --weight_decay 5e-2 \
      --lars 0 \
      --mixup 0.0 \
      --smoothing 0.0 \
      --train_file $train_file \
      --val_file $val_file \
      --checkpoints-path $ckpt_folder \
      --log-dir $log_folder
    2. Finetune TODO:

      • training
    3. Linear TODO:

      • training
  • Inference

    1. pretrian
    python mae_test.py --test_image xxx.jpg --ckpt weights.pth
    1. classification TODO:
      • training

6. TODO

  • VIT-BASE model training.
  • SwinTransformers for MAE.
  • Finetune & Linear training.

Finetune is trainig, the weights may be comming soon.

Owner
FlyEgle
JOYY AI GROUP - Machine Learning Engineer(Computer Vision)
FlyEgle
A Python parser that takes the content of a text file and then reads it into variables.

Text-File-Parser A Python parser that takes the content of a text file and then reads into variables. Input.text File 1. What is your ***? 1. 18 -

Kelvin 0 Jul 26, 2021
PixelPyramids: Exact Inference Models from Lossless Image Pyramids (ICCV 2021)

PixelPyramids: Exact Inference Models from Lossless Image Pyramids This repository contains the PyTorch implementation of the paper PixelPyramids: Exa

Visual Inference Lab @TU Darmstadt 8 Dec 11, 2022
Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth

Instance segmentation by jointly optimizing spatial embeddings and clustering bandwidth This codebase implements the loss function described in: Insta

209 Dec 07, 2022
A unified 3D Transformer Pipeline for visual synthesis

Overview This is the official repo for the paper: "NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion". NÜWA is a unified multimodal

Microsoft 2.6k Jan 03, 2023
Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning :rocket:

MLJAR Automated Machine Learning Documentation: https://supervised.mljar.com/ Source Code: https://github.com/mljar/mljar-supervised Table of Contents

MLJAR 2.4k Dec 31, 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Paper Links: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentati

Hust Visual Learning Team 253 Dec 21, 2022
Distributed Evolutionary Algorithms in Python

DEAP DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data stru

Distributed Evolutionary Algorithms in Python 4.9k Jan 05, 2023
Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

Jungsoo Lee 16 Jun 30, 2022
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

Self-Supervised Vision Transformers with DINO PyTorch implementation and pretrained models for DINO. For details, see Emerging Properties in Self-Supe

Facebook Research 4.2k Jan 03, 2023
implementation for paper "ShelfNet for fast semantic segmentation"

ShelfNet-lightweight for paper (ShelfNet for fast semantic segmentation) This repo contains implementation of ShelfNet-lightweight models for real-tim

Juntang Zhuang 252 Sep 16, 2022
AgeGuesser: deep learning based age estimation system. Powered by EfficientNet and Yolov5

AgeGuesser AgeGuesser is an end-to-end, deep-learning based Age Estimation system, presented at the CAIP 2021 conference. You can find the related pap

5 Nov 10, 2022
Animate molecular orbital transitions using Psi4 and Blender

Molecular Orbital Transitions (MOT) Animate molecular orbital transitions using Psi4 and Blender Author: Maximilian Paradiz Dominguez, University of A

3 Feb 01, 2022
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition The official code of ABINet (CVPR 2021, Oral).

334 Dec 31, 2022
[ICML 2021] Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data

Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data This repo provides the source code & data of our paper: Break-It-Fix-It: Unsupervised

Michihiro Yasunaga 86 Nov 30, 2022
Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks

Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks Stable Neural ODE with Lyapunov-Stable Equilibrium

Kang Qiyu 8 Dec 12, 2022
Most popular metrics used to evaluate object detection algorithms.

Most popular metrics used to evaluate object detection algorithms.

Rafael Padilla 4.4k Dec 25, 2022
NaturalProofs: Mathematical Theorem Proving in Natural Language

NaturalProofs: Mathematical Theorem Proving in Natural Language NaturalProofs: Mathematical Theorem Proving in Natural Language Sean Welleck, Jiacheng

Sean Welleck 83 Jan 05, 2023
MERLOT: Multimodal Neural Script Knowledge Models

merlot MERLOT: Multimodal Neural Script Knowledge Models MERLOT is a model for learning what we are calling "neural script knowledge" -- representatio

Rowan Zellers 190 Dec 22, 2022
Simple helper library to convert a collection of numpy data to tfrecord, and build a tensorflow dataset from the tfrecord.

numpy2tfrecord Simple helper library to convert a collection of numpy data to tfrecord, and build a tensorflow dataset from the tfrecord. Installation

Ryo Yonetani 2 Jan 16, 2022
PyTorch implementations of neural network models for keyword spotting

Honk: CNNs for Keyword Spotting Honk is a PyTorch reimplementation of Google's TensorFlow convolutional neural networks for keyword spotting, which ac

Castorini 475 Dec 15, 2022