Official Implementation of "Designing an Encoder for StyleGAN Image Manipulation"

Overview

Designing an Encoder for StyleGAN Image Manipulation (SIGGRAPH 2021)

Open In Colab

Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.

Description

Official Implementation of "Designing an Encoder for StyleGAN Image Manipulation" paper for both training and evaluation. The e4e encoder is specifically designed to complement existing image manipulation techniques performed over StyleGAN's latent space.

Recent Updates

2021.08.17: Add single style code encoder (use --encoder_type SingleStyleCodeEncoder).
2021.03.25: Add pose editing direction.

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
  • Python 3

Installation

  • Clone the repository:
git clone https://github.com/omertov/encoder4editing.git
cd encoder4editing
  • Dependencies:
    We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in environment/e4e_env.yaml.

Inference Notebook

We provide a Jupyter notebook found in notebooks/inference_playground.ipynb that allows one to encode and perform several editings on real images using StyleGAN.

Pretrained Models

Please download the pre-trained models from the following links. Each e4e model contains the entire pSp framework architecture, including the encoder and decoder weights.

Path Description
FFHQ Inversion FFHQ e4e encoder.
Cars Inversion Cars e4e encoder.
Horse Inversion Horse e4e encoder.
Church Inversion Church e4e encoder.

If you wish to use one of the pretrained models for training or inference, you may do so using the flag --checkpoint_path.

In addition, we provide various auxiliary models needed for training your own e4e model from scratch.

Path Description
FFHQ StyleGAN StyleGAN model pretrained on FFHQ taken from rosinality with 1024x1024 output resolution.
IR-SE50 Model Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss during training.
MOCOv2 Model Pretrained ResNet-50 model trained using MOCOv2 for use in our simmilarity loss for domains other then human faces during training.

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/path_configs.py.

Training

To train the e4e encoder, make sure the paths to the required models, as well as training and testing data is configured in configs/path_configs.py and configs/data_configs.py.

Training the e4e Encoder

python scripts/train.py \
--dataset_type cars_encode \
--exp_dir new/experiment/directory \
--start_from_latent_avg \
--use_w_pool \
--w_discriminator_lambda 0.1 \
--progressive_start 20000 \
--id_lambda 0.5 \
--val_interval 10000 \
--max_steps 200000 \
--stylegan_size 512 \
--stylegan_weights path/to/pretrained/stylegan.pt \
--workers 8 \
--batch_size 8 \
--test_batch_size 4 \
--test_workers 4 

Training on your own dataset

In order to train the e4e encoder on a custom dataset, perform the following adjustments:

  1. Insert the paths to your train and test data into the dataset_paths variable defined in configs/paths_config.py:
dataset_paths = {
    'my_train_data': '/path/to/train/images/directory',
    'my_test_data': '/path/to/test/images/directory'
}
  1. Configure a new dataset under the DATASETS variable defined in configs/data_configs.py:
DATASETS = {
   'my_data_encode': {
        'transforms': transforms_config.EncodeTransforms,
        'train_source_root': dataset_paths['my_train_data'],
        'train_target_root': dataset_paths['my_train_data'],
        'test_source_root': dataset_paths['my_test_data'],
        'test_target_root': dataset_paths['my_test_data']
    }
}

Refer to configs/transforms_config.py for the transformations applied to the train and test images during training.

  1. Finally, run a training session with --dataset_type my_data_encode.

Inference

Having trained your model, you can use scripts/inference.py to apply the model on a set of images.
For example,

python scripts/inference.py \
--images_dir=/path/to/images/directory \
--save_dir=/path/to/saving/directory \
path/to/checkpoint.pt 

Latent Editing Consistency (LEC)

As described in the paper, we suggest a new metric, Latent Editing Consistency (LEC), for evaluating the encoder's performance. We provide an example for calculating the metric over the FFHQ StyleGAN using the aging editing direction in metrics/LEC.py.

To run the example:

cd metrics
python LEC.py \
--images_dir=/path/to/images/directory \
path/to/checkpoint.pt 

Acknowledgments

This code borrows heavily from pixel2style2pixel

Citation

If you use this code for your research, please cite our paper Designing an Encoder for StyleGAN Image Manipulation:

@article{tov2021designing,
  title={Designing an Encoder for StyleGAN Image Manipulation},
  author={Tov, Omer and Alaluf, Yuval and Nitzan, Yotam and Patashnik, Or and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2102.02766},
  year={2021}
}
Structural Constraints on Information Content in Human Brain States

Structural Constraints on Information Content in Human Brain States Code accompanying the paper "The information content of brain states is explained

Leon Weninger 3 Sep 07, 2022
A GridMixup augmentation, inspired by GridMask and CutMix

GridMixup A GridMixup augmentation, inspired by GridMask and CutMix Easy install pip install git+https://github.com/IlyaDobrynin/GridMixup.git Overvie

IlyaDo 42 Dec 28, 2022
Learning from Synthetic Data with Fine-grained Attributes for Person Re-Identification

Less is More: Learning from Synthetic Data with Fine-grained Attributes for Person Re-Identification Suncheng Xiang Shanghai Jiao Tong University Over

SunchengXiang 68 Dec 13, 2022
Cross Quality LFW: A database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments

Cross-Quality Labeled Faces in the Wild (XQLFW) Here, we release the database, evaluation protocol and code for the following paper: Cross Quality LFW

Martin Knoche 10 Dec 12, 2022
Entity-Based Knowledge Conflicts in Question Answering.

Entity-Based Knowledge Conflicts in Question Answering Run Instructions | Paper | Citation | License This repository provides the Substitution Framewo

Apple 35 Oct 19, 2022
[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"

TransFusion-Pose TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei

Haoyu Ma 29 Dec 23, 2022
PyTorch implementation of 1712.06087 "Zero-Shot" Super-Resolution using Deep Internal Learning

Unofficial PyTorch implementation of "Zero-Shot" Super-Resolution using Deep Internal Learning Unofficial Implementation of 1712.06087 "Zero-Shot" Sup

Jacob Gildenblat 196 Nov 27, 2022
Code for the paper "Location-aware Single Image Reflection Removal"

Location-aware Single Image Reflection Removal The shown images are provided by the datasets from IBCLN, ERRNet, SIR2 and the Internet images. The cod

72 Dec 08, 2022
torchbearer: A model fitting library for PyTorch

Note: We're moving to PyTorch Lightning! Read about the move here. From the end of February, torchbearer will no longer be actively maintained. We'll

631 Jan 04, 2023
The mini-AlphaStar (mini-AS, or mAS) - mini-scale version (non-official) of the AlphaStar (AS)

A mini-scale reproduction code of the AlphaStar program. Note: the original AlphaStar is the AI proposed by DeepMind to play StarCraft II.

Ruo-Ze Liu 216 Jan 04, 2023
ML model to classify between cats and dogs

Cats-and-dogs-classifier This is my first ML model which can classify between cats and dogs. Here the accuracy is around 75%, however , the accuracy c

Sharath V 4 Aug 20, 2021
Shitty gaze mouse controller

demo.mp4 shitty_gaze_mouse_cotroller install tensofflow, cv2 run the main.py and as it starts it will collect data so first raise your left eyebrow(bo

16 Aug 30, 2022
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

CLIP-Guided-Diffusion Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab. Original colab notebooks by Ka

Nerdy Rodent 336 Dec 09, 2022
This is the code for HOI Transformer

HOI Transformer Code for CVPR 2021 accepted paper End-to-End Human Object Interaction Detection with HOI Transformer. Reproduction We recomend you to

BigBangEpoch 124 Dec 29, 2022
Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

QAConv Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting This PyTorch code is proposed in

Shengcai Liao 166 Dec 28, 2022
Teaches a student network from the knowledge obtained via training of a larger teacher network

Distilling-the-knowledge-in-neural-network Teaches a student network from the knowledge obtained via training of a larger teacher network This is an i

Abhishek Sinha 146 Dec 11, 2022
Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix Tuning Files: . ├── gpt2 # Code for GPT2 style autoregressive LM │ ├── train_e2e.py # high-level script

530 Jan 04, 2023
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th place solution

Lyft Motion Prediction for Autonomous Vehicles Code for the 4th place solution of Lyft Motion Prediction for Autonomous Vehicles on Kaggle. Discussion

44 Jun 27, 2022
PyTorch Connectomics: segmentation toolbox for EM connectomics

Introduction The field of connectomics aims to reconstruct the wiring diagram of the brain by mapping the neural connections at the level of individua

Zudi Lin 132 Dec 26, 2022
[CVPR 2022] Structured Sparse R-CNN for Direct Scene Graph Generation

Structured Sparse R-CNN for Direct Scene Graph Generation Our paper Structured Sparse R-CNN for Direct Scene Graph Generation has been accepted by CVP

Multimedia Computing Group, Nanjing University 44 Dec 23, 2022