Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving

Last update: Dec 23, 2022

This is the source code for our paper Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving by Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Yixuan Li and Gao Huang. The code is adapted from Swapping Autoencoder, StarGAN v2, and Image2StyleGAN.

FDIT is a frequency-based image translation framework that improves both identity preservation and image realism. Our key idea is to decompose the image into low-frequency and high-frequency components, where the high-frequency component captures object structure, which is closely tied to identity. Our training objective encourages the preservation of frequency information in both pixel space and Fourier spectral space.
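To make the decomposition concrete, here is a minimal NumPy sketch. It is not the repository's implementation: it assumes a Gaussian low-pass filter (the kernel size and sigma are illustrative choices), takes the high-frequency part as the residual, and computes the amplitude and phase of the 2-D Fourier transform. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalized 2-D Gaussian kernel (size and sigma are illustrative)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def split_frequencies(image, size=5, sigma=1.0):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    The low-frequency part is a Gaussian blur of the input. The
    high-frequency part is the residual, so low + high gives back the input.
    """
    image = np.asarray(image, dtype=np.float64)
    kernel = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")
    h, w = image.shape
    low = np.zeros((h, w), dtype=np.float64)
    # Direct convolution: accumulate shifted copies weighted by the kernel.
    for i in range(size):
        for j in range(size):
            low += kernel[i, j] * padded[i:i + h, j:j + w]
    high = image - low
    return low, high


def fourier_amplitude_phase(image):
    """Return the amplitude and phase of the 2-D discrete Fourier transform."""
    spectrum = np.fft.fft2(image)
    return np.abs(spectrum), np.angle(spectrum)


# Usage on a random test image (a stand-in for a real grayscale photo).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
low, high = split_frequencies(image)
amplitude, phase = fourier_amplitude_phase(image)
```

A pixel-space objective in this spirit would compare the low or high parts of two images, and a spectral objective would compare their amplitude or phase. The exact losses used by FDIT are defined in the paper and the training code.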

1. Swapping Autoencoder

Dataset Preparation

You can download the following datasets:

Flickr Mountain (training set, validation set)
Flickr Waterfall (training set, validation set)
CelebA-HQ
LSUN Church
LSUN Bedroom

Then place the training data and validation data in ./swapping-autoencoder/dataset/.

Train the model

The training data can be in either LMDB or folder format. To train the FDIT-assisted Swapping Autoencoder, run:

cd swapping-autoencoder 
bash train.sh

Change the dataset location to match your own setup.

Evaluate the model

Generate image hybrids

Place the source images under ./sample_pair/source and the reference images under ./sample_pair/ref. Each source image and its reference image must share the same file name, such as 0.png, 1.png, ...
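Before running the evaluation, it can help to confirm that the two folders really contain matching file names. The following is a small standard-library sketch for that check. The helper name unpaired_names is hypothetical and not part of the repository.

```python
from pathlib import Path


def unpaired_names(source_dir, ref_dir):
    """Return the sorted file names that appear in only one of the two folders."""
    source = {p.name for p in Path(source_dir).iterdir() if p.is_file()}
    ref = {p.name for p in Path(ref_dir).iterdir() if p.is_file()}
    return sorted(source ^ ref)


if __name__ == "__main__":
    src, ref = Path("./sample_pair/source"), Path("./sample_pair/ref")
    if src.is_dir() and ref.is_dir():
        missing = unpaired_names(src, ref)
        if missing:
            print(f"Unpaired files: {missing}")
        else:
            print("All images are paired.")
    else:
        print("Create ./sample_pair/source and ./sample_pair/ref first.")
```

An empty result means every source image has a reference image with the same name.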

To generate the image hybrids according to the source and reference images, please run:

bash eval_pairs.sh

Evaluate the image quality

To evaluate the image quality using Fréchet Inception Distance (FID), please run:

bash eval.sh
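For reference, FID is the Fréchet distance between two Gaussians fitted to image features: the squared distance between the means plus Tr(C_a + C_b - 2 (C_a C_b)^(1/2)) for the covariances C_a and C_b. The sketch below computes that distance from feature arrays that are assumed to be already extracted (in standard FID they come from an Inception-v3 network). It is an illustration with a hypothetical function name, not the code that eval.sh runs.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim),
    with feature_dim >= 2 so that np.cov returns a matrix.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr(sqrt(cov_a @ cov_b)) is the sum of the square roots of the
    # eigenvalues of the product. For positive semi-definite inputs these
    # are real and non-negative in exact arithmetic, so drop the imaginary
    # part and clip small negative values caused by rounding.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Lower values mean the two feature distributions are closer. Identical feature sets give a distance of zero up to rounding error.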

The pretrained model is provided here.

2. Image2StyleGAN

Prepare the dataset

Place your own images or our official dataset under the folder ./Image2StlyleGAN/source_image. If you use our dataset, unzip it into that folder:

cd Image2StlyleGAN
unzip source_image.zip

Get the weight files

To download the pretrained StyleGAN weights, please run:

cd Image2StlyleGAN/weight_files/pytorch
wget https://pages.cs.wisc.edu/~mucai/fdit/karras2019stylegan-ffhq-1024x1024.pt

Run the GAN-inversion model

Single image inversion

Run the following command, replacing image_name with the name of the image:

python encode_image_freq.py --src_im image_name

Group images inversion

To invert a group of images, please run:

python encode_image_freq_batch.py

Quantitative Evaluation

To get image reconstruction metrics such as MSE, MAE, and PSNR, please run:

python eval.py
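As a reminder of what these metrics measure, here is a minimal NumPy sketch that computes them for one pair of images. It assumes 8-bit pixel values (max_value of 255). eval.py may use a different value range or average over the dataset differently, and the function name is hypothetical.

```python
import numpy as np


def reconstruction_metrics(original, reconstructed, max_value=255.0):
    """Return MSE, MAE and PSNR (in dB) between two images of equal shape."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(reconstructed, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    err = a - b
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    # PSNR is undefined for a perfect reconstruction; report infinity.
    if mse == 0:
        psnr = float("inf")
    else:
        psnr = float(10.0 * np.log10(max_value ** 2 / mse))
    return {"mse": mse, "mae": mae, "psnr": psnr}
```

Lower MSE and MAE and higher PSNR indicate a closer reconstruction.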

3. StarGAN v2

Prepare the dataset

Please download the CelebA-HQ-Smile dataset into ./StarGANv2/data.

Train the model

To train the model on a Tesla V100 GPU, please run:

cd StarGANv2
bash train.sh

Evaluation

To get the image translation samples and image quality measures like FID, please run:

bash eval.sh

Pretrained Model

The pretrained model can be found here.

Image Translation Results

FDIT achieves state-of-the-art performance when applied to several image translation models, and it also improves GAN-inversion models.

Citation

If you use our codebase or datasets, please cite our work:

@inproceedings{cai2021frequency,
title={Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving},
author={Cai, Mu and Zhang, Hong and Huang, Huijuan and Geng, Qichuan and Li, Yixuan and Huang, Gao},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2021}
}
