[ACM MM 2021] Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Last update: Nov 09, 2022


Installation

pip install -r requirements.txt

Dataset Preparation

Given a dataset, please list its image paths in file lists placed in a folder named after the dataset, with the following folder structure.

    flist/dataset_name
        ├── train.flist    # paths of training images
        ├── valid.flist    # paths of validation images
        └── test.flist     # paths of testing images
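
The file lists above could be generated with a short script like the one below. This is a minimal sketch, not part of the repository: it assumes each .flist file is plain text with one image path per line, and the helper name `write_flists`, the extension set, and the split fractions are all hypothetical. Datasets that ship with official splits should use those instead of a random split.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to the dataset at hand.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def write_flists(image_dir, flist_dir, valid_frac=0.05, test_frac=0.05, seed=0):
    """Write train/valid/test .flist files, one image path per line (assumed format)."""
    paths = sorted(
        str(p) for p in Path(image_dir).rglob("*") if p.suffix.lower() in IMAGE_EXTS
    )
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(paths)
    n_valid = int(len(paths) * valid_frac)
    n_test = int(len(paths) * test_frac)
    splits = {
        "valid": paths[:n_valid],
        "test": paths[n_valid:n_valid + n_test],
        "train": paths[n_valid + n_test:],
    }
    out = Path(flist_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        text = "\n".join(items) + ("\n" if items else "")
        (out / f"{name}.flist").write_text(text)
    return {name: len(items) for name, items in splits.items()}

# Example call (paths are placeholders):
# write_flists("/data/celebahq", "flist/celebahq")
```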

In this work, we use CelebA-HQ (download available here), Places2 (download available here), and Paris StreetView (requires the author's permission to download).

ImageNet K-means Cluster: The kmeans_centers.npy file is downloaded from image-gpt; it is used to quantize the low-resolution images.
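
To illustrate what quantizing with k-means centers means, the sketch below maps every pixel of a low-resolution image to the index of its nearest center. It is an illustration under stated assumptions rather than the repository's code: it assumes the centers form a (K, 3) array of RGB values and that the image is scaled to the same value range as the centers. The function names `quantize` and `dequantize` are hypothetical.

```python
import numpy as np


def quantize(pixels, centers):
    """Map each RGB pixel to the index of its nearest center.

    pixels:  (H, W, 3) float array, same value range as `centers` (assumed).
    centers: (K, 3) float array, e.g. np.load("kmeans_centers.npy") (assumed shape).
    """
    flat = pixels.reshape(-1, 3)
    # Squared Euclidean distance from every pixel to every center: (H*W, K).
    dist = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1).reshape(pixels.shape[:2])


def dequantize(tokens, centers):
    """Turn a (H, W) array of center indices back into an (H, W, 3) image."""
    return centers[tokens]
```

Each low-resolution image then becomes a grid of discrete tokens, which is the kind of sequence a transformer can model.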

Testing with Pre-trained Models

Download pre-trained models:

CelebA-HQ: BAT ; Upsampler
Places2: BAT ; Upsampler
Paris-StreetView: BAT ; Upsampler

Put the pre-trained model under the checkpoints folder, e.g.

    checkpoints
        ├── celebahq_bat_pretrain
            ├── latest_net_G.pth

Prepare the input images and masks to test.

python bat_sample.py --num_sample [1] --tran_model [bat name] --up_model [upsampler name] --input_dir [dir of input] --mask_dir [dir of mask] --save_dir [dir to save results]
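
When sampling for several input folders, it can help to assemble the command above programmatically and to check that the checkpoint folders exist first. The sketch below only uses the flags shown in the command above; the helper name `build_sample_cmd` is hypothetical, and it assumes the upsampler checkpoint sits in its own folder under checkpoints, like the BAT checkpoint does.

```python
import sys
from pathlib import Path


def build_sample_cmd(tran_model, up_model, input_dir, mask_dir, save_dir,
                     num_sample=1, checkpoints_dir="checkpoints"):
    """Return the argument list for bat_sample.py after checking the checkpoints."""
    for name in (tran_model, up_model):
        ckpt = Path(checkpoints_dir) / name
        if not ckpt.is_dir():
            raise FileNotFoundError(f"missing checkpoint folder: {ckpt}")
    return [
        sys.executable, "bat_sample.py",
        "--num_sample", str(num_sample),
        "--tran_model", tran_model,
        "--up_model", up_model,
        "--input_dir", str(input_dir),
        "--mask_dir", str(mask_dir),
        "--save_dir", str(save_dir),
    ]

# Example (the upsampler folder name is a placeholder):
# import subprocess
# cmd = build_sample_cmd("celebahq_bat_pretrain", "celebahq_up_pretrain",
#                        "inputs", "masks", "results", num_sample=4)
# subprocess.run(cmd, check=True)
```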

Training New Models

Pretrained VGG model: download it from here and move it to models/. This model is used to calculate the training loss for the upsampler.

New models can be trained with the following commands.

Prepare the dataset. Use the --dataroot option to point to the directory of file lists, e.g. ./flist, and specify the dataset name to train on with the --dataset_name option. Specify the mask type and mask ratio using the --mask_type and --pconv_level options.
Train the transformer.

# Edit the bash file to specify your own dataset or settings.
bash train_bat.sh

Please note that some of the transformer settings are defined in train_bat.py instead of options/. This script uses every available GPU for training, so please select the GPUs via CUDA_VISIBLE_DEVICES instead of --gpu_ids, which is used for the upsampler.
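
As a sketch of the GPU selection described above, the snippet below launches the transformer training script with CUDA_VISIBLE_DEVICES restricted to chosen devices. The helper names `gpu_env` and `launch_bat_training` are hypothetical; only train_bat.sh and the CUDA_VISIBLE_DEVICES variable come from this README.

```python
import os
import subprocess


def gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPU ids."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_bat_training(gpu_ids, script="train_bat.sh"):
    """Run the training script so that it only sees the selected GPUs."""
    return subprocess.run(["bash", script], env=gpu_env(gpu_ids), check=True)

# Example: train on GPUs 0 and 1 only.
# launch_bat_training([0, 1])
```

The shell equivalent is to prefix the command with the variable, e.g. CUDA_VISIBLE_DEVICES=0,1 bash train_bat.sh.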

Train the upsampler.

# Edit the bash file to specify your own dataset or settings.
bash train_up.sh

The upsampler is typically trained on the low-resolution ground truth. We find that using some samples from the trained BAT might help improve performance, as measured by PSNR and SSIM. However, the sampling process is quite time-consuming, and training with ground truth alone can also yield reasonable results.
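
For reference, PSNR can be computed with the standard definition, 10 * log10(MAX^2 / MSE). The sketch below is a generic NumPy version and not the evaluation code of this repository; it assumes 8-bit images, so `max_value` defaults to 255.

```python
import numpy as np


def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    res = np.asarray(result, dtype=np.float64)
    mse = np.mean((ref - res) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```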

Citation

If you find this code helpful for your research, please cite our paper.

@inproceedings{yu2021diverse,
  title={Diverse Image Inpainting with Bidirectional and Autoregressive Transformers},
  author={Yu, Yingchen and Zhan, Fangneng and Wu, Rongliang and Pan, Jianxiong and Cui, Kaiwen and Lu, Shijian and Ma, Feiying and Xie, Xuansong and Miao, Chunyan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgments

This code borrows heavily from SPADE and minGPT. We appreciate the authors for sharing their code.


Owner

Yingchen Yu
