Official implementation for "Style Transformer for Image Inversion and Editing" (CVPR 2022)

Last update: Dec 02, 2022

Overview

Style Transformer for Image Inversion and Editing (CVPR 2022)

Existing GAN inversion methods fail to provide latent codes that support reliable reconstruction and flexible editing at the same time. This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN that achieves lower distortion while retaining high quality and flexibility for editing. The proposed model employs a CNN encoder to provide multi-scale image features as keys and values, and treats the style codes to be determined for the different generator layers as queries. It first initializes the query tokens as learnable parameters and maps them into $W^+$ space. Multi-stage alternating self- and cross-attention then updates the queries so that the generator can reconstruct the input image from them. Moreover, based on the inverted code, we investigate reference- and label-based attribute editing through a pretrained latent classifier, and achieve flexible image-to-image translation with high-quality results. Extensive experiments show better performance on both inversion and editing tasks within StyleGAN.
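
The query/key/value roles described above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: it omits the learned projections, normalization, feed-forward layers and multi-head split a real transformer would have, and the names `invert_sketch`, `NUM_STYLES` and `DIM`, the choice of 14 style codes, and the random stand-in features are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(matrix):
    """Apply a numerically stable softmax to every row."""
    result = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(queries[0]))
    keys_t = [list(col) for col in zip(*keys)]
    scores = [[s / scale for s in row] for row in matmul(queries, keys_t)]
    return matmul(softmax_rows(scores), values)


def residual(x, update):
    """Add an attention update back onto the tokens it was computed for."""
    return [[a + b for a, b in zip(rx, ru)] for rx, ru in zip(x, update)]


def invert_sketch(queries, feature_levels):
    """Alternate self- and cross-attention once per feature scale."""
    for features in feature_levels:
        # Self-attention: the style-code queries exchange information.
        queries = residual(queries, attention(queries, queries, queries))
        # Cross-attention: queries read image features (keys and values).
        queries = residual(queries, attention(queries, features, features))
    return queries


rng = random.Random(0)
DIM = 8          # assumed token width, kept tiny for readability
NUM_STYLES = 14  # assumed number of style codes (one per generator layer)


def random_tokens(count):
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(count)]


queries = random_tokens(NUM_STYLES)                     # stand-in for learnable queries
feature_levels = [random_tokens(n) for n in (4, 16, 64)]  # stand-in for CNN features
codes = invert_sketch(queries, feature_levels)
print(len(codes), len(codes[0]))
```

Each stage keeps the number of tokens fixed, so the output holds one updated code per generator layer regardless of how many feature tokens each scale contributes.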

Our style transformer introduces a novel multi-stage transformer in $W^+$ space to invert images accurately. We divide image editing in StyleGAN into label-based and reference-based editing, and use a non-linear classifier to generate the editing vector.
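
One plausible reading of label-based editing with a non-linear classifier is to move a latent code along the gradient of the classifier's attribute logit. The sketch below illustrates that idea only; it is an assumption, not the method confirmed by this README, and the tiny random classifier, `label_edit`, and all other names are hypothetical.

```python
import math
import random


def hidden_units(w, weights, biases):
    """Hidden activations of a one-hidden-layer tanh classifier."""
    return [math.tanh(sum(a * x for a, x in zip(row, w)) + b)
            for row, b in zip(weights, biases)]


def attribute_logit(w, weights, biases, readout):
    """Scalar attribute score for latent code w."""
    return sum(r * h for r, h in zip(readout, hidden_units(w, weights, biases)))


def logit_gradient(w, weights, biases, readout):
    """Analytic gradient of the logit with respect to w."""
    scaled = [r * (1.0 - h * h)
              for r, h in zip(readout, hidden_units(w, weights, biases))]
    return [sum(s * row[k] for s, row in zip(scaled, weights))
            for k in range(len(w))]


def label_edit(w, weights, biases, readout, step):
    """Move w a distance `step` along the normalized logit gradient."""
    grad = logit_gradient(w, weights, biases, readout)
    norm = math.sqrt(sum(g * g for g in grad))
    return [x + step * g / norm for x, g in zip(w, grad)]


rng = random.Random(0)
DIM, HIDDEN = 8, 16  # assumed toy sizes, far smaller than a real latent code
weights = [[rng.gauss(0.0, 1.0) / math.sqrt(DIM) for _ in range(DIM)]
           for _ in range(HIDDEN)]
biases = [0.0] * HIDDEN
readout = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]
w = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # stand-in for an inverted code

edited = label_edit(w, weights, biases, readout, step=0.01)
print(attribute_logit(w, weights, biases, readout),
      attribute_logit(edited, weights, biases, readout))
```

Because the classifier is non-linear, the gradient depends on the code being edited, so the editing vector differs from image to image instead of being one fixed direction.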

Getting Started

Prerequisites

Ubuntu 16.04
NVIDIA GPU + CUDA CuDNN
Python 3

Pretrained Models

We provide pretrained inversion models for the face and car domains.

Training

Preparing Datasets

Update configs/paths_config.py with the necessary data paths and model paths for training and inference.

dataset_paths = {
    'train_data': '/path/to/train/data',
    'test_data': '/path/to/test/data',
}
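
Before launching a long training run, it can help to confirm that the configured paths exist. The helper below is a minimal sketch for that purpose; `missing_paths` is a hypothetical name and is not part of this repository.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of a paths dict whose target does not exist."""
    return {name: path for name, path in paths.items() if not Path(path).exists()}


# Same placeholder values as in configs/paths_config.py.
dataset_paths = {
    'train_data': '/path/to/train/data',
    'test_data': '/path/to/test/data',
}

for name, path in missing_paths(dataset_paths).items():
    print(f"{name}: {path} not found")
```

The same check can be applied to any model paths stored in that file.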

Preparing Generator

We use rosinality's StyleGAN2 implementation. You can download the 256px pretrained model from that project and place it in the pretrained_models/ directory.

Training Inversion Model

python scripts/train.py \
--dataset_type=ffhq_encode \
--exp_dir=results/train_style_transformer \
--batch_size=8 \
--test_batch_size=8 \
--val_interval=5000 \
--save_interval=10000 \
--stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt
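
To launch the same run from Python, for example when sweeping over settings, the command can be assembled from a dictionary. This is a minimal sketch: `build_command` is a hypothetical helper, and the flag names and values are copied from the command above.

```python
import shlex
import sys

train_flags = {
    "dataset_type": "ffhq_encode",
    "exp_dir": "results/train_style_transformer",
    "batch_size": 8,
    "test_batch_size": 8,
    "val_interval": 5000,
    "save_interval": 10000,
    "stylegan_weights": "pretrained_models/stylegan2-ffhq-config-f.pt",
}


def build_command(script, flags):
    """Turn a flags dict into an argument list for subprocess.run."""
    return [sys.executable, script] + [f"--{key}={value}" for key, value in flags.items()]


command = build_command("scripts/train.py", train_flags)
print(shlex.join(command))
# To start training from the repository root:
# import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.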

Inference

python scripts/inference.py \
--exp_dir=results/infer_style_transformer \
--checkpoint_path=results/train_style_transformer/checkpoints/best_model.pt \
--data_path=/test_data \
--test_batch_size=8

Citation

If you use this code for your research, please cite

@article{hu2022style,
  title={Style Transformer for Image Inversion and Editing},
  author={Hu, Xueqi and Huang, Qiusheng and Shi, Zhengyi and Li, Siyuan and Gao, Changxin and Sun, Li and Li, Qingli},
  journal={arXiv preprint arXiv:2203.07932},
  year={2022}
}


Owner

Xueqi Hu
