Official repository for ABC-GAN

Overview

ABC-GAN

The work in this repository is the result of a 14-week semester thesis on photo-realistic image generation using generative adversarial networks (GANs) at ETH Zurich.

Additional Experiments

There is a second branch called loose_encoder, which contains another experiment conducted during the thesis. Unfortunately, we didn't have enough time to analyze its results in depth. The loose encoder can be added to any other experiment with a simple flag, and on specific datasets such as CelebA it results in an astonishing improvement in generated image quality. To the best of our knowledge, no other GAN produces similarly realistic images on CelebA at a 64 by 64 output resolution. (Check the other branch for some samples.)

Prerequisites

The code has been built and tested using the following packages (and versions)

  • Pillow (4.0.0)
  • scikit-learn (0.18.1)
  • scipy (0.18.1)
  • numpy (1.12.1)
  • tensorflow-gpu (1.1.0rc1)

Usage

Make sure you have a working TensorFlow setup.

We added some special flags to keep a better overview of the different experiments. One of them is folder_suffix, which is appended to the names of all checkpoint, log and samples folders.
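As an illustration, the folder naming can be sketched in Python. The base folder names checkpoint, logs and samples and the helper name experiment_dirs are assumptions modeled on typical DCGAN setups, not taken from main.py.

```python
def experiment_dirs(folder_suffix, base_dirs=("checkpoint", "logs", "samples")):
    """Map each (assumed) base folder name to its per-experiment variant."""
    # The suffix is simply appended, so all outputs of one run share a tag.
    return {base: base + folder_suffix for base in base_dirs}


# Example: the suffix used in the CelebA command.
dirs = experiment_dirs("_abcgan_1_GpD_o64")
```

With the CelebA suffix, the checkpoint folder becomes checkpoint_abcgan_1_GpD_o64, so runs with different settings never overwrite each other.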

Here are some examples of training ABC-GAN on different datasets:

  • python main.py --dataset celebA --folder_suffix=_abcgan_1_GpD_o64 --input_height=128 --output_height=64 --GpD_ratio=1 --blur_strategy=3x3 --epoch=8 --train --crop True
  • python main.py --dataset lsun --folder_suffix=_abcgan_lsun_controller_blur_3x3_o128 --input_fname_pattern=*.webp --input_height=256 --output_height=128 --GpD_ratio=-1 --blur_strategy=3x3 --epoch=20 --batch_size=64 --train --crop True
  • python main.py --dataset cifar10 --folder_suffix=_abcgan_cifar_3GpD_regressive_hyperbolic_o32 --input_height=32 --input_fname_pattern=*.png --output_height=32 --blur_strategy=reg_hyp --epoch=100 --train --crop True

Datasets

The following datasets have been used:

  • CelebA (> 200'000 celebrity faces, 178 px on the shorter side)
  • CIFAR10 (60'000 pictures in 10 categories, 32x32 px)
  • LSUN (> 3 million images of bedrooms, 256 px on the shorter side)
  • ImageNet subset (64x64 px)

The easiest way to include a dataset is to put all of its images in one folder. To use such a dataset, set input_fname_pattern to match the file extension and pass the folder name with dataset. (The dataset folder has to be inside the data subfolder.)
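A minimal Python sketch of how such a flat dataset folder can be collected, assuming the images live in data/<dataset>/ and input_fname_pattern is a glob pattern such as *.png. The helper name list_dataset_images is hypothetical and not part of this repository.

```python
import glob
import os


def list_dataset_images(dataset, input_fname_pattern, data_root="./data"):
    """Return sorted paths of all files in data_root/dataset matching the pattern.

    Hypothetical helper mirroring the --dataset and --input_fname_pattern flags.
    """
    pattern = os.path.join(data_root, dataset, input_fname_pattern)
    return sorted(glob.glob(pattern))
```

For example, list_dataset_images("cifar10", "*.png") would list every PNG file in ./data/cifar10.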

Folder structure used for our experiments:

  • ABC-GAN
    • data
      • celebA
      • cifar10
      • lsun
      • train_64x64
    • download.py
    • LICENSE
    • main.py
    • model.py
    • ops.py
    • utils.py
    • README.md
    • report

train_64x64 refers to the ImageNet subset. We used the same one as in Improved Wasserstein GAN.

Special Case LSUN

Since the LSUN dataset comes in a hierarchical structure with many files, it makes sense to use a reference file containing the paths to the files instead. The best way to do that is:

  1. Inside the ABC-GAN/data folder, create a subfolder named lsun
  2. Extract the downloaded LSUN dataset there (we used bedroom_train_lmdb.zip)
  3. Make a list of all files in the extracted lsun_train folder
  4. Name this file lsun_images

The lsun_images file should have a structure such as:

lsun_train/f/f/f/f/f/b/fffffbb9225d069b7f47e464bdd75e6eff82b61c.webp
lsun_train/f/f/f/f/f/6/fffff6cd254f0ead6191f3003519f6805e1e6619.webp
lsun_train/f/f/f/f/f/5/fffff548f9109fc3be2d71088f8e202ea78ac620.webp
lsun_train/f/f/f/f/f/a/fffffa900959150cb53ac851b355ec4adbc22e4e.webp
lsun_train/f/f/f/f/8/0/ffff80a1dc7e7d790ccd46f2fdd4dcfca929d2c3.webp
...
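Steps 3 and 4 can be scripted. A minimal Python sketch, assuming the reference file lives next to the lsun_train folder (i.e. data/lsun/lsun_images) and stores paths relative to data/lsun with forward slashes, matching the sample lines above. write_lsun_index is a hypothetical helper name.

```python
import os


def write_lsun_index(lsun_dir, train_folder="lsun_train", index_name="lsun_images",
                     extension=".webp"):
    """Walk lsun_dir/train_folder and write one relative image path per line
    to lsun_dir/index_name. Returns the number of paths written."""
    root = os.path.join(lsun_dir, train_folder)
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extension):
                full = os.path.join(dirpath, name)
                # Store paths relative to lsun_dir, e.g. lsun_train/f/f/.../x.webp
                rel = os.path.relpath(full, lsun_dir)
                paths.append(rel.replace(os.sep, "/"))
    paths.sort()  # deterministic order; os.walk order is not guaranteed
    with open(os.path.join(lsun_dir, index_name), "w") as f:
        for p in paths:
            f.write(p + "\n")
    return len(paths)
```

Running write_lsun_index("data/lsun") once after extraction produces the reference file.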

To use LSUN, again change input_fname_pattern and set the dataset to lsun. We hard-coded LSUN as a special case, so the reference file is used to get the paths to the images.
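The hard-coded special case can be pictured roughly as follows. This is a sketch under assumptions, not the repository's actual code: the function names are hypothetical, lsun_images is assumed to sit in data/lsun, and other datasets are assumed to be globbed with input_fname_pattern. In this sketch the pattern is not used for LSUN at all.

```python
import glob
import os


def load_lsun_paths(lsun_dir, index_name="lsun_images"):
    """Read the reference file and turn each relative entry into a full path."""
    with open(os.path.join(lsun_dir, index_name)) as f:
        return [os.path.join(lsun_dir, line.strip()) for line in f if line.strip()]


def image_paths(dataset, input_fname_pattern, data_root="./data"):
    """Hypothetical dispatch: LSUN reads the reference file, others are globbed."""
    if dataset == "lsun":
        return load_lsun_paths(os.path.join(data_root, "lsun"))
    return sorted(glob.glob(os.path.join(data_root, dataset, input_fname_pattern)))
```

Reading a prepared file list avoids walking millions of nested directories at every start of training.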

Results

Adaptive Controller

One of the most important aspects of GAN training is balancing the discriminator against the generator. If one of the two dominates the other, mode collapse can occur. Many people have experimented with the ratio of discriminator to generator updates, while others use thresholds to decide whether to train the discriminator or the generator.

In our work, we implemented a simple controller to get rid of this manual tuning. The output of the controller is the probability of training either the discriminator or the generator in a given iteration. The controller gives you the following benefits:

  • Reduced training time (up to a factor of 5)
  • Reuse of the same network for different datasets (the controller automatically adapts to other datasets, so you no longer have to tune the ratio between D and G)
  • In some cases, the controller also improves stability during training

Controller

Controller architecture:

Note: The controller input is calculated using the two losses of the discriminator (loss for real and for fake images).

(Figure: controller architecture)

The controller tries to keep the average value of its input at a reference point. The output of the controller is the probability of training either the discriminator or the generator.
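The controller architecture itself is only available as a figure, so the following Python sketch is a guess at the general idea rather than the actual implementation. It assumes a proportional-integral (PI) controller that compares a batch-averaged measurement with the reference value (0.25, as this README quotes for the sample visualizations) and turns the error into the probability of a discriminator step. The class and function names, the gains kp and ki, the bias of 0.5 and the sign convention are all assumptions; how the measurement is computed from the real and fake discriminator losses is left out.

```python
import random


class TrainingController:
    """Hypothetical PI controller that turns a batch-averaged measurement into
    the probability of running a discriminator step in the next iteration."""

    def __init__(self, reference=0.25, kp=0.5, ki=0.05, bias=0.5):
        self.reference = reference  # target for the batch average (0.25 per the README)
        self.kp = kp                # proportional gain (assumed value)
        self.ki = ki                # integral gain (assumed value)
        self.bias = bias            # probability when the error is zero (assumed)
        self.integral = 0.0

    def update(self, measurement):
        """measurement: batch average of the per-image controller input."""
        # Assumed sign convention: above the reference -> train D more often.
        error = measurement - self.reference
        self.integral += error
        p_train_d = self.bias + self.kp * error + self.ki * self.integral
        # Clip so the output stays a valid probability.
        return min(1.0, max(0.0, p_train_d))


def choose_step(p_train_d, rng=random):
    """Sample which network to train in this iteration: 'D' or 'G'."""
    return "D" if rng.random() < p_train_d else "G"
```

In a training loop, one would call p = controller.update(m) each iteration and then run the discriminator or generator optimizer depending on choose_step(p). Clipping keeps the output a valid probability; a real implementation would likely also limit the integral term to avoid wind-up.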

Training behaviour

Training curve showing discriminator vs generator training iterations

Note: Without a controller, i.e. with a fixed ratio between discriminator and generator updates, we would see two straight lines.

Through the controller, the training of the two networks adapts itself. In the beginning, the generator is trained more often, but after around 5k steps the discriminator takes over. As known from GAN theory, we actually want the discriminator to dominate the generator, and without a controller this is very hard to achieve without changing the loss function (Wasserstein loss, Cramer loss, etc.).

Convergence speed comparison of DCGAN with different GpD ratios and our controller

Note: One iteration is either training once the discriminator or the generator

Comparison of convergence speed. GpD: generator training iterations per discriminator training iteration (e.g. 3 GpD means we train the generator 3 times per discriminator training iteration).
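For comparison with the controller, a fixed GpD ratio amounts to a deterministic update schedule. A minimal Python sketch, assuming each cycle is one discriminator step followed by GpD generator steps (the order within a cycle is an assumption, and the function names are hypothetical). The cumulative counts of D and G updates then grow linearly, which is why the fixed-ratio case would show up as two straight lines in the training curve.

```python
def fixed_ratio_schedule(gpd, n_iterations):
    """Return the sequence of updates ('D' or 'G') for a fixed GpD ratio.

    One iteration trains exactly one network. Each cycle is assumed to be one
    discriminator step followed by `gpd` generator steps.
    """
    if gpd < 1:
        raise ValueError("this sketch only covers gpd >= 1")
    cycle = ["D"] + ["G"] * gpd
    return [cycle[i % len(cycle)] for i in range(n_iterations)]


def cumulative_counts(schedule):
    """Running totals (D updates, G updates), i.e. the two training curves."""
    d = g = 0
    curve = []
    for step in schedule:
        if step == "D":
            d += 1
        else:
            g += 1
        curve.append((d, g))
    return curve
```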

Adaptive Blur

GANs still have trouble with stability, image quality and output resolution. We implemented an adaptive blur filter to assist the discriminator and thereby improve overall results. We found that the discriminator has issues with details in images. To overcome this, we blur all images before they reach the discriminator, so the discriminator sees either blurred images from the generator or blurred images from the dataset.

Using a fixed blur, such as the 3 by 3 Gaussian kernel we used in our experiments, has the side effect of additional noise in the output images: since the generator does not have to care about details (its output will be blurred anyway), it can add noise. To mitigate this effect, we added an adaptive blur that changes over training time. At the beginning, we apply a strong blur so that the GAN can focus on improving the base structure of the output. Toward the end of training, there is almost no blur, so the GAN can focus on the details.
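A minimal NumPy sketch of the idea, assuming single-channel images and a 3x3 Gaussian kernel with a hypothetical sigma of 1.0 (the sigma used in the experiments is not stated). The same blur is applied to real and generated images, so the discriminator never sees the original high-frequency detail. The helper names are hypothetical; in the TensorFlow model this would correspond to convolving the discriminator input with a fixed kernel.

```python
import numpy as np


def gaussian_kernel(size=3, sigma=1.0):
    """Normalized 2-D Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def blur(image, kernel):
    """Blur a single-channel HxW image with edge padding (same-size output).

    The kernel is symmetric, so correlation equals convolution here.
    """
    pad = kernel.shape[0] // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    h, w = image.shape
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


# Both discriminator inputs get the same blur before being scored.
kernel = gaussian_kernel(3, sigma=1.0)
real = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)  # stand-in real image
fake = np.full((8, 8), 0.5)                                 # stand-in generated image
d_input_real = blur(real, kernel)
d_input_fake = blur(fake, kernel)
```

For RGB images, the same blur would be applied to each channel separately.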

The blur gives you the following benefits:

  • Improved stability during training (several times, we encountered cases in which the network was not able to converge at all without blur)
  • Improved image quality (despite the noise, the output images look much more realistic)
  • Higher output resolution (we were able to use DCGAN to generate images at 256 by 256 pixels using CelebA or LSUN)

DCGAN vs DCGAN with Blur (DCGAN+B)

Image showing plain DCGAN without and with blur (DCGAN+B)

The resulting images look much better with blur than without. They have more details, but also more noise.

Comparison of different blurring strategies

We compare two regressive blur kernels (e.g. decreasing the sigma of a Gaussian blur during training).

The hyperbolically decreasing Gaussian blur is best at reducing the noise in the images.
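The exact schedules are not given here, so the following Python sketch uses assumed formulas: a linear decrease of sigma over training and a hyperbolic decrease sigma(t) = sigma_0 / (1 + k t). The function names, the starting sigma of 2.0 and the decay rate k are hypothetical.

```python
def linear_sigma(step, total_steps, sigma_start=2.0, sigma_end=0.0):
    """Sigma decreasing linearly from sigma_start to sigma_end over total_steps."""
    frac = min(step / float(total_steps), 1.0)
    return sigma_start + (sigma_end - sigma_start) * frac


def hyperbolic_sigma(step, sigma_start=2.0, decay=1e-3):
    """Sigma decreasing hyperbolically: sigma_start / (1 + decay * step)."""
    return sigma_start / (1.0 + decay * step)
```

The hyperbolic schedule drops quickly early on and then flattens out, keeping a small but non-zero blur for a long stretch of training, whereas the linear one decreases at a constant rate. Once sigma reaches zero, the blur step should simply be skipped, since a Gaussian with zero width is undefined.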

ABC-GAN

Combination of the Adaptive Blur and Controller GAN.

We conducted different experiments on various datasets such as LSUN, CIFAR10 and CelebA. Some of the resulting images have been downscaled by a factor of two to reduce noise, since the noise looks very distracting on some screens and in print.
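The method used for downscaling the samples is not stated. One simple option, shown here as an assumption, is averaging each 2x2 pixel block with NumPy, which halves the resolution and averages out pixel-level noise. downscale_by_two is a hypothetical helper name.

```python
import numpy as np


def downscale_by_two(image):
    """Average each non-overlapping 2x2 block of an HxW or HxWxC image.

    Odd trailing rows/columns are dropped.
    """
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    img = image[:h, :w].astype(float)
    return img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:]).mean(axis=(1, 3))
```

A 256x256x3 sample becomes 128x128x3, matching the downscaled LSUN samples.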

Note: In some samples, you will see a green bar at the top. The bar shows the actual input value of the controller for each individual image. If the reference value is, for example, 0.25 (as in our code), we want the green bar to be at around 25% on average over one batch.

LSUN bedrooms dataset

ABC-GAN with a fixed Gaussian Blur kernel of 3x3 and output resolution of 256 by 256 pixels

Randomly sampled batch. Downscaled to 128 by 128 pixels to reduce noise.

We seem to have reached the limit of DCGAN with this experiment. The same experiment without blur failed. The images do not look very realistic, but some of them are still recognizably close to bedrooms.

CIFAR10

Comparison of results gathered using CIFAR10


Comparison of our best results with other works


Publications

Acknowledgement

  • Thanks to Prof. Luc Van Gool for the semester thesis at the Computer Vision Lab, D-ITET, ETH Zurich
  • Thanks to Eirikur Agustsson and Radu Timofte for supervising this thesis
  • This work has been based on the DCGAN implementation found on GitHub

Author

Igor Susmelj / @igorsusmelj
