Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Related tags

Deep Learningmae-repo
Overview

mae-repo

PyTorch re-implememtation of "masked autoencoders are scalable vision learners". In this repo, it heavily borrows codes from codebase https://github.com/lucidrains/vit-pytorch (for MAE architectures) and https://github.com/pengzhiliang/MAE-pytorch (for training loop).

prepare ImageNet1K datasets

To train MAE, one should prepare ImageNet_ILSVRC2012 and place ILSVRC2012_*.tar in the ${datasets_path}. To shorten the overhead of first run, one can manually untar the tarfile into train and val directories, as follow (refered to https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).

mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..
mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash

modify configuration file

To separate code and config, we try to split configurations to yaml file, located in configs directory, such as imagenet1k-vit-base.yml. One can modify 'model' setting following MAE and ViT to configure model architecture parameters of ViT-base, large and huge.

One can modify 'optim' for optimizer settings. And modify 'training' and 'data' for training settings. Note that, modify 'training:batch_size' to fit the GPU memory of one GPU card. Total batch_size is equal to batch_size multiplied by number of GPU cards.

train

CUDA_VISIBLE_DEVICES=0,1,2,3,5,6,7 OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 mae_test.py
--datasets_path ${datasets_path}
--config imagenet1k-vit-base.yml
--doc mae-vit-base16-dec8-512

ToDo lists

  • add pretrain mode
  • add fine-tunning mode
  • support mixed precision training
  • support distributed training
  • verify the correctness of this re-implementation
Owner
Peng Qiao
Peng Qiao
ML powered analytics engine for outlier detection and root cause analysis.

Website • Docs • Blog • LinkedIn • Community Slack ML powered analytics engine for outlier detection and root cause analysis ✨ What is Chaos Genius? C

Chaos Genius 523 Jan 04, 2023
GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️

GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️ This repo contains a PyTorch implementation of the original GAT paper ( 🔗 Veličković et

Aleksa Gordić 1.9k Jan 09, 2023
Image Segmentation with U-Net Algorithm on Carvana Dataset using AWS Sagemaker

Image Segmentation with U-Net Algorithm on Carvana Dataset using AWS Sagemaker This is a full project of image segmentation using the model built with

Htin Aung Lu 1 Jan 04, 2022
Image Captioning using CNN ,LSTM and Attention

Image Captioning using CNN ,LSTM and Attention This is a deeplearning model which tries to summarize an image into a text . Installation Install this

ASUTOSH GHANTO 1 Dec 16, 2021
KE-Dialogue: Injecting knowledge graph into a fully end-to-end dialogue system.

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems This is the implementation of the paper: Learning Knowledge Bases with Par

CAiRE 42 Nov 10, 2022
Implements VQGAN+CLIP for image and video generation, and style transfers, based on text and image prompts. Emphasis on ease-of-use, documentation, and smooth video creation.

VQGAN-CLIP-GENERATOR Overview This is a package (with available notebook) for running VQGAN+CLIP locally, with a focus on ease of use, good documentat

Ryan Hamilton 98 Dec 30, 2022
Face-Recognition-based-Attendance-System - An implementation of Attendance System in python.

Face-Recognition-based-Attendance-System A real time implementation of Attendance System in python. Pre-requisites To understand the implentation of F

Muhammad Zain Ul Haque 1 Dec 31, 2021
Benchmark for Answering Existential First Order Queries with Single Free Variable

EFO-1-QA Benchmark for First Order Query Estimation on Knowledge Graphs This repository contains an entire pipeline for the EFO-1-QA benchmark. EFO-1

HKUST-KnowComp 14 Oct 24, 2022
Official Pytorch implementation of 6DRepNet: 6D Rotation representation for unconstrained head pose estimation.

6D Rotation Representation for Unconstrained Head Pose Estimation (Pytorch) Paper Thorsten Hempel and Ahmed A. Abdelrahman and Ayoub Al-Hamadi, "6D Ro

Thorsten Hempel 284 Dec 23, 2022
Our CIKM21 Paper "Incorporating Query Reformulating Behavior into Web Search Evaluation"

Reformulation-Aware-Metrics Introduction This codebase contains source-code of the Python-based implementation of our CIKM 2021 paper. Chen, Jia, et a

xuanyuan14 5 Mar 05, 2022
AntiFuzz: Impeding Fuzzing Audits of Binary Executables

AntiFuzz: Impeding Fuzzing Audits of Binary Executables Get the paper here: https://www.usenix.org/system/files/sec19-guler.pdf Usage: The python scri

Chair for Sys­tems Se­cu­ri­ty 88 Dec 21, 2022
yolov5目标检测模型的知识蒸馏(基于响应的蒸馏)

代码地址: https://github.com/Sharpiless/yolov5-knowledge-distillation 教师模型: python train.py --weights weights/yolov5m.pt \ --cfg models/yolov5m.ya

52 Dec 04, 2022
A best practice for tensorflow project template architecture.

A best practice for tensorflow project template architecture.

Mahmoud Gamal Salem 3.6k Dec 22, 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders

ConvMAE ConvMAE: Masked Convolution Meets Masked Autoencoders Peng Gao1, Teli Ma1, Hongsheng Li2, Jifeng Dai3, Yu Qiao1, 1 Shanghai AI Laboratory, 2 M

Alpha VL Team of Shanghai AI Lab 345 Jan 08, 2023
The official PyTorch code for 'DER: Dynamically Expandable Representation for Class Incremental Learning' accepted by CVPR2021

DER.ClassIL.Pytorch This repo is the official implementation of DER: Dynamically Expandable Representation for Class Incremental Learning (CVPR 2021)

rhyssiyan 108 Jan 01, 2023
A High-Quality Real Time Upscaler for Anime Video

Anime4K Anime4K is a set of open-source, high-quality real-time anime upscaling/denoising algorithms that can be implemented in any programming langua

15.7k Jan 06, 2023
DIT is a DTLS MitM proxy implemented in Python 3. It can intercept, manipulate and suppress datagrams between two DTLS endpoints and supports psk-based and certificate-based authentication schemes (RSA + ECC).

DIT - DTLS Interception Tool DIT is a MitM proxy tool to intercept DTLS traffic. It can intercept, manipulate and/or suppress DTLS datagrams between t

52 Nov 30, 2022
Meshed-Memory Transformer for Image Captioning. CVPR 2020

M²: Meshed-Memory Transformer This repository contains the reference code for the paper Meshed-Memory Transformer for Image Captioning (CVPR 2020). Pl

AImageLab 422 Dec 28, 2022
Deep Two-View Structure-from-Motion Revisited

Deep Two-View Structure-from-Motion Revisited This repository provides the code for our CVPR 2021 paper Deep Two-View Structure-from-Motion Revisited.

Jianyuan Wang 145 Jan 06, 2023
Offcial repository for the IEEE ICRA 2021 paper Auto-Tuned Sim-to-Real Transfer.

Offcial repository for the IEEE ICRA 2021 paper Auto-Tuned Sim-to-Real Transfer.

47 Jun 30, 2022