Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

Overview

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement

Recently, the power of unconditional image synthesis has significantly advanced through the use of Generative Adversarial Networks (GANs). The task of inverting an image into its corresponding latent code of the trained GAN is of utmost importance as it allows for the manipulation of real images, leveraging the rich semantics learned by the network. Recognizing the limitations of current inversion approaches, in this work we present a novel inversion scheme that extends current encoder-based inversion methods by introducing an iterative refinement mechanism. Instead of directly predicting the latent code of a given image using a single pass, the encoder is tasked with predicting a residual with respect to the current estimate of the inverted latent code in a self-correcting manner. Our residual-based encoder, named ReStyle, attains improved accuracy compared to current state-of-the-art encoder-based methods with a negligible increase in inference time. We analyze the behavior of ReStyle to gain valuable insights into its iterative nature. We then evaluate the performance of our residual encoder and analyze its robustness compared to optimization-based inversion and state-of-the-art encoders.


Different from conventional encoder-based inversion techniques, our residual-based ReStyle scheme incorporates an iterative refinement mechanism to progressively converge to an accurate inversion of real images. For each domain, we show the input image on the left followed by intermediate inversion outputs.

Description

Official Implementation of our ReStyle paper for both training and evaluation. ReStyle introduces an iterative refinement mechanism which can be applied over different StyleGAN encoders for solving the StyleGAN inversion task.

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
  • Python 3

Installation

  • Dependencies:
    We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in environment/restyle_env.yaml.

Pretrained Models

In this repository, we provide pretrained ReStyle encoders applied over the pSp and e4e encoders across various domains.

Please download the pretrained models from the following links.

ReStyle + pSp

Path Description
FFHQ - ReStyle + pSp ReStyle applied over pSp trained on the FFHQ dataset.
Stanford Cars - ReStyle + pSp ReStyle applied over pSp trained on the Stanford Cars dataset.
LSUN Church - ReStyle + pSp ReStyle applied over pSp trained on the LSUN Church dataset.
AFHQ Wild - ReStyle + pSp ReStyle applied over pSp trained on the AFHQ Wild dataset.

ReStyle + e4e

Path Description
FFHQ - ReStyle + e4e Coming Soon!
Stanford Cars - ReStyle + e4e Coming Soon!
LSUN Church - ReStyle + e4e Coming Soon!
AFHQ Wild - ReStyle + e4e Coming Soon!
LSUN Horse - ReStyle + e4e ReStyle applied over e4e trained on the LSUN Horse dataset.

Auxiliary Models

In addition, we provide various auxiliary models needed for training your own ReStyle models from scratch.
This includes the StyleGAN generators and pre-trained models used for loss computation.

Path Description
FFHQ StyleGAN StyleGAN2 model trained on FFHQ with 1024x1024 output resolution.
LSUN Car StyleGAN StyleGAN2 model trained on LSUN Car with 512x384 output resolution.
LSUN Church StyleGAN StyleGAN2 model trained on LSUN Church with 256x256 output resolution.
LSUN Horse StyleGAN StyleGAN2 model trained on LSUN Horse with 256x256 output resolution.
AFHQ Wild StyleGAN StyleGAN-ADA model trained on AFHQ Wild with 512x512 output resolution.
IR-SE50 Model Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss and encoder backbone on human facial domain.
ResNet-34 Model ResNet-34 model trained on ImageNet taken from torchvision for initializing our encoder backbone.
MoCov2 Model Pretrained ResNet-50 model trained using MOCOv2 for computing MoCo-based loss on non-facial domains. The model is taken from the official implementation.
CurricularFace Backbone Pretrained CurricularFace model taken from HuangYG123 for use in ID similarity metric computation.
MTCNN Weights for MTCNN model taken from TreB1eN for use in ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.)

Note: all StyleGAN models are converted from the official TensorFlow models to PyTorch using the conversion script from rosinality.

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/path_configs.py.

Training

Preparing your Data

In order to train ReStyle on your own data, you should perform the following steps:

  1. Update configs/paths_config.py with the necessary data paths and model paths for training and inference.
dataset_paths = {
    'train_data': '/path/to/train/data'
    'test_data': '/path/to/test/data',
}
  1. Configure a new dataset under the DATASETS variable defined in configs/data_configs.py. There, you should define the source/target data paths for the train and test sets as well as the transforms to be used for training and inference.
DATASETS = {
	'my_data_encode': {
		'transforms': transforms_config.EncodeTransforms,   # can define a custom transform, if desired
		'train_source_root': dataset_paths['train_data'],
		'train_target_root': dataset_paths['train_data'],
		'test_source_root': dataset_paths['test_data'],
		'test_target_root': dataset_paths['test_data'],
	}
}
  1. To train with your newly defined dataset, simply use the flag --dataset_type my_data_encode.

Preparing your Generator

In this work, we use rosinality's StyleGAN2 implementation. If you wish to use your own generator trained using NVIDIA's implementation there are a few options we recommend:

  1. Using NVIDIA's StyleGAN2 / StyleGAN-ADA TensorFlow implementation.
    You can then convert the TensorFlow .pkl checkpoints to the supported format using the conversion script found in rosinality's implementation.
  2. Using NVIDIA's StyleGAN-ADA PyTorch implementation.
    You can then convert the PyTorch .pkl checkpoints to the supported format using the conversion script created by Justin Pinkney found in dvschultz's fork.

Once you have the converted .pt files, you should be ready to use them in this repository.

Training ReStyle

The main training scripts can be found in scripts/train_restyle_psp.py and scripts/train_restyle_e4e.py. Each of the two scripts will run ReStyle applied over the corresponding base inversion method.
Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs.
Additionally, if you have tensorboard installed, you can visualize tensorboard logs in opts.exp_dir/logs.

We currently support applying ReStyle on the pSp encoder from Richardson et al. [2020] and the e4e encoder from Tov et al. [2021].

Training ReStyle with the settings used in the paper can be done by running the following commands.

  • ReStyle applied over pSp:
python scripts/train_restyle_psp.py \
--dataset_type=ffhq_encode \
--encoder_type=BackboneEncoder \
--exp_dir=experiment/restyle_psp_ffhq_encode \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=5000 \
--save_interval=10000 \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--w_norm_lambda=0 \
--id_lambda=0.1 \
--input_nc=6 \
--n_iters_per_batch=5 \
--output_size=1024 \
--stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt
  • ReStyle applied over e4e:
python scripts/train_restyle_e4e.py \
--dataset_type ffhq_encode \
--encoder_type ProgressiveBackboneEncoder \
--exp_dir=experiment/restyle_e4e_ffhq_encode \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--delta_norm_lambda 0.0002 \
--id_lambda 0.1 \
--use_w_pool \
--w_discriminator_lambda 0.1 \
--progressive_start 20000 \
--progressive_step_every 2000 \
--input_nc 6 \
--n_iters_per_batch=5 \
--output_size 1024 \
--stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt

Additional Notes:

  • Encoder backbones:
    • For the human facial domain (ffhq_encode), we use an IRSE-50 backbone using the flags:
      • --encoder_type=BackboneEncoder for pSp
      • --encoder_type=ProgressiveBackboneEncoder for e4e
    • For all other domains, we use a ResNet34 encoder backbone using the flags:
      • --encoder_type=ResNetBackboneEncoder for pSp
      • --encoder_type=ResNetProgressiveBackboneEncoder for e4e
  • ID/similarity losses:
    • For the human facial domain we also use a specialized ID loss which is set using the flag --id_lambda=0.1.
    • For all other domains, please set --id_lambda=0 and --moco_lambda=0.5 to use the MoCo-based similarity loss from Tov et al.
      • Note, you cannot set both id_lambda and moco_lambda to be active simultaneously.
  • You should also adjust the --output_size and --stylegan_weights flags according to your StyleGAN generator.
  • See options/train_options.py and options/e4e_train_options.py for all training-specific flags.

Inference Notebook

To help visualize the results of ReStyle we provide a Jupyter notebook found in notebooks/inference_playground.ipynb.
The notebook will download the pretrained models and run inference on the images found in notebooks/images or on images of your choosing. It is recommended to run this in Google Colab.

Testing

Inference

You can use scripts/inference_iterative.py to apply a trained model on a set of images:

python scripts/inference_iterative.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=5

This script will save each step's outputs in a separate sub-directory (e.g., the outputs of step i will be saved in /path/to/experiment/inference_results/i).

Notes:

  • By default, the images will be saved at their original output resolutions (e.g., 1024x1024 for faces, 512x384 for cars). If you wish to save outputs resized to resolutions of 256x256 (or 256x192 for cars), you can do so by adding the flag --resize_outputs.
  • This script will also save all the latents as an .npy file in a dictionary format as follows:
{
    "0.jpg": [latent_step_1, latent_step_2, ..., latent_step_N],
    "1.jpg": [latent_step_1, latent_step_2, ..., latent_step_N],
    ...
}

That is, the keys of the dictionary are the image file names and the values are lists of length N containing the output latent of each step where N is the number of inference steps. Each element in the list is of shape (Kx512) where K is the number of style inputs of the generator.

You can use the saved latents to perform latent space manipulations, for example.

Step-by-Step Inference


Visualizing the intermediate outputs. Here, the intermediate outputs are saved from left to right with the input image shown on the right-hand side.

Sometimes, you may wish to save each step's outputs side-by-side instead of in separate sub-folders. This would allow one to easily see the progression in the reconstruction with each step. To save the step-by-step outputs as a single image, you can run the following:

python scripts/inference_iterative_save_coupled.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=5

Computing Metrics

Given a trained model and generated outputs, we can compute the loss metrics on a given dataset.
These scripts receive the inference output directory and ground truth directory.

  • Calculating LPIPS loss:
python scripts/calc_losses_on_images.py \
--mode lpips
--output_path=/path/to/experiment/inference_results \
--gt_path=/path/to/test_images \
  • Calculating L2 loss:
python scripts/calc_losses_on_images.py \
--mode l2
--output_path=/path/to/experiment/inference_results \
--gt_path=/path/to/test_images \
  • Calculating the identity loss for the human facial domain:
python scripts/calc_id_loss_parallel.py \
--output_path=/path/to/experiment/inference_results \
--gt_path=/path/to/test_images \

These scripts will traverse through each sub-directory of output_path to compute the metrics on each step's output images.

Encoder Bootstrapping


Image toonification results using our proposed encoder bootstrapping technique.

In the paper, we introduce an encoder bootstrapping technique that can be used to solve the image toonification task by pairing an FFHQ-based encoder with a Toon-based encoder.
Below we provide the models used to generate the results in the paper:

Path Description
FFHQ - ReStyle + pSp Same FFHQ encoder as linked above.
Toonify - ReStyle + pSp ReStyle applied over pSp trained for the image toonification task.
Toonify Generator Toonify generator from Doron Adler and Justin Pinkney converted to Pytorch using rosinality's conversion script.

Note that the ReStyle toonify model is trained using only real images with no paired data. More details regarding the training parameters and settings of the toonify encoder can be found here.

If you wish to run inference using these two models and the bootstrapping technique you may run the following:

python scripts/encoder_bootstrapping_inference.py \
--exp_dir=/path/to/experiment \
--model_1_checkpoint_path=/path/to/restyle_psp_ffhq_encode.pt \
--model_2_checkpoint_path=/path/to/restyle_psp_toonify.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=1  # one step for each encoder is typically good

Here, we output the per-step outputs side-by-side with the inverted initialization real-image on the left and the original input image on the right.

Repository structure

Path Description
SAM Repository root folder
├  configs Folder containing configs defining model/data paths and data transforms
├  criteria Folder containing various loss criterias for training
├  datasets Folder with various dataset objects
├  docs Folder containing images displayed in the README
├  environment Folder containing Anaconda environment used in our experiments
├  licenses Folder containing licenses of the open source projects used in this repository
├ models Folder containing all the models and training objects
│  ├  e4e_modules Folder containing the latent discriminator implementation from encoder4editing
│  ├  encoders Folder containing various architecture implementations including our simplified encoder architectures
│  ├  mtcnn MTCNN implementation from TreB1eN
│  ├  stylegan2 StyleGAN2 model from rosinality
│  ├  psp.py Implementation of pSp encoder extended to work with ReStyle
│  └  e4e.py Implementation of e4e encoder extended to work with ReStyle
├  notebooks Folder with jupyter notebook containing ReStyle inference playground
├  options Folder with training and test command-line options
├  scripts Folder with running scripts for training, inference, and metric computations
├  training Folder with main training logic and Ranger implementation from lessw2020
├  utils Folder with various utility functions

Credits

StyleGAN2 model and implementation:
https://github.com/rosinality/stylegan2-pytorch
Copyright (c) 2019 Kim Seonghyeon
License (MIT) https://github.com/rosinality/stylegan2-pytorch/blob/master/LICENSE

IR-SE50 model and implementations:
https://github.com/TreB1eN/InsightFace_Pytorch
Copyright (c) 2018 TreB1eN
License (MIT) https://github.com/TreB1eN/InsightFace_Pytorch/blob/master/LICENSE

Ranger optimizer implementation:
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
License (Apache License 2.0) https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/LICENSE

LPIPS model and implementation:
https://github.com/S-aiueo32/lpips-pytorch
Copyright (c) 2020, Sou Uchida
License (BSD 2-Clause) https://github.com/S-aiueo32/lpips-pytorch/blob/master/LICENSE

pSp model and implementation:
https://github.com/eladrich/pixel2style2pixel
Copyright (c) 2020 Elad Richardson, Yuval Alaluf
License (MIT) https://github.com/eladrich/pixel2style2pixel/blob/master/LICENSE

e4e model and implementation:
https://github.com/omertov/encoder4editing Copyright (c) 2021 omertov
License (MIT) https://github.com/omertov/encoder4editing/blob/main/LICENSE

Please Note: The CUDA files under the StyleGAN2 ops directory are made available under the Nvidia Source Code License-NC

Acknowledgments

This code borrows heavily from pixel2style2pixel and encoder4editing.

Citation

If you use this code for your research, please cite the following works:

@misc{alaluf2021restyle,
      title={ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement}, 
      author={Yuval Alaluf and Or Patashnik and Daniel Cohen-Or},
      year={2021},
      eprint={2104.02699},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@article{tov2021designing,
  title={Designing an Encoder for StyleGAN Image Manipulation},
  author={Tov, Omer and Alaluf, Yuval and Nitzan, Yotam and Patashnik, Or and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2102.02766},
  year={2021}
}
@article{richardson2020encoding,
  title={Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation},
  author={Richardson, Elad and Alaluf, Yuval and Patashnik, Or and Nitzan, Yotam and Azar, Yaniv and Shapiro, Stav and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2008.00951},
  year={2020}
}
Owner
Computer vision research scientist and enthusiast
On Nonlinear Latent Transformations for GAN-based Image Editing - PyTorch implementation

On Nonlinear Latent Transformations for GAN-based Image Editing - PyTorch implementation On Nonlinear Latent Transformations for GAN-based Image Editi

Valentin Khrulkov 22 Oct 24, 2022
A list of Machine Learning Art Colabs

ML Visual Art Colabs A list of cool Colabs on Machine Learning Imagemaking or other artistic purposes 3D Ken Burns Effect Ken Burns Effect by Manuel R

Derrick Schultz (he/him) 789 Dec 12, 2022
Official implementation for paper: Feature-Style Encoder for Style-Based GAN Inversion

Feature-Style Encoder for Style-Based GAN Inversion Official implementation for paper: Feature-Style Encoder for Style-Based GAN Inversion. Code will

InterDigital 63 Jan 03, 2023
Vehicle Detection Using Deep Learning and YOLO Algorithm

VehicleDetection Vehicle Detection Using Deep Learning and YOLO Algorithm Dataset take or find vehicle images for create a special dataset for fine-tu

Maryam Boneh 96 Jan 05, 2023
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
Classification models 1D Zoo - Keras and TF.Keras

Classification models 1D Zoo - Keras and TF.Keras This repository contains 1D variants of popular CNN models for classification like ResNets, DenseNet

Roman Solovyev 12 Jan 06, 2023
Large scale PTM - PPI relation extraction

Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT The silver standard

1 Feb 25, 2022
Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

Qing-Long Zhang 199 Jan 08, 2023
4th place solution to datafactory challenge by Intermarché.

Solution to Datafactory challenge by Intermarché. 4th place solution to datafactory challenge by Intermarché. The objective of the challenge is to pre

Raphael Sourty 11 Mar 19, 2022
Graph Transformer Architecture. Source code for

Graph Transformer Architecture Source code for the paper "A Generalization of Transformer Networks to Graphs" by Vijay Prakash Dwivedi and Xavier Bres

NTU Graph Deep Learning Lab 561 Jan 08, 2023
Implementation of MeMOT - Multi-Object Tracking with Memory - in Pytorch

MeMOT - Pytorch (wip) Implementation of MeMOT - Multi-Object Tracking with Memory - in Pytorch. This paper is just one in a line of work, but importan

Phil Wang 15 May 09, 2022
A Web API for automatic background removal using Deep Learning. App is made using Flask and deployed on Heroku.

Automatic_Background_Remover A Web API for automatic background removal using Deep Learning. App is made using Flask and deployed on Heroku. 👉 https:

Gaurav 16 Oct 29, 2022
Medical Image Segmentation using Squeeze-and-Expansion Transformers

Medical Image Segmentation using Squeeze-and-Expansion Transformers Introduction This repository contains the code of the IJCAI'2021 paper 'Medical Im

askerlee 172 Dec 20, 2022
Semantic Segmentation Architectures Implemented in PyTorch

pytorch-semseg Semantic Segmentation Algorithms Implemented in PyTorch This repository aims at mirroring popular semantic segmentation architectures i

Meet Shah 3.3k Dec 29, 2022
Trax — Deep Learning with Clear Code and Speed

Trax — Deep Learning with Clear Code and Speed Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively us

Google 7.3k Dec 26, 2022
This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Subreddit Analysis This repo includes tools for Subreddit analysis, originally developed for our class project of PSYC 626 in USC, titled "Powered by

Georgios Chochlakis 1 Dec 17, 2021
Disturbing Target Values for Neural Network regularization: attacking the loss layer to prevent overfitting

Disturbing Target Values for Neural Network regularization: attacking the loss layer to prevent overfitting 1. Classification Task PyTorch implementat

Yongho Kim 0 Apr 24, 2022
Official PyTorch Code of GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection (CVPR 2021)

GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Mo

Abhinav Kumar 76 Jan 02, 2023
Instance-based label smoothing for improving deep neural networks generalization and calibration

Instance-based Label Smoothing for Neural Networks Pytorch Implementation of the algorithm. This repository includes a new proposed method for instanc

Mohamed Maher 1 Aug 13, 2022
Parameterising Simulated Annealing for the Travelling Salesman Problem

Parameterising Simulated Annealing for the Travelling Salesman Problem

Gary Sun 55 Jun 15, 2022