CLADE - Efficient Semantic Image Synthesis via Class-Adaptive Normalization (TPAMI 2021)

Related tags

Deep LearningCLADE
Overview

Efficient Semantic Image Synthesis via Class-Adaptive Normalization (Accepted by TPAMI)

Architecture

ArXiv Paper

Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu

Abstract

Spatially-adaptive normalization SPADE is remarkably successful recently in conditional semantic image synthesis, which modulates the normalized activation with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away. Despite its impressive performance, a more thorough understanding of the advantages inside the box is still highly demanded to help reduce the significant computation and parameter overhead introduced by this novel structure. In this paper, from a return-on-investment point of view, we conduct an in-depth analysis of the effectiveness of this spatially-adaptive normalization and observe that its modulation parameters benefit more from semantic-awareness rather than spatial-adaptiveness, especially for high-resolution input masks. Inspired by this observation, we propose class-adaptive normalization (CLADE), a lightweight but equally-effective variant that is only adaptive to semantic class. In order to further improve spatial-adaptiveness, we introduce intra-class positional map encoding calculated from semantic layouts to modulate the normalization parameters of CLADE and propose a truly spatially-adaptive variant of CLADE, namely CLADE-ICPE. %Benefiting from this design, CLADE greatly reduces the computation cost while being able to preserve the semantic information in the generation. Through extensive experiments on multiple challenging datasets, we demonstrate that the proposed CLADE can be generalized to different SPADE-based methods while achieving comparable generation quality compared to SPADE, but it is much more efficient with fewer extra parameters and lower computational cost.

Installation

Clone this repo.

git clone https://github.com/tzt101/CLADE.git
cd CLADE/

This code requires PyTorch 1.6 and python 3+. Please install dependencies by

pip install -r requirements.txt

Dataset Preparation

The Cityscapes, COCO-Stuff and ADE20K dataset can be download and prepared following SPADE. We provide the ADE20K-outdoor dataset selected by ourselves in OneDrive.

To make the distance mask which called intra-class positional encoding map in the paper, you can use the following commands:

python uitl/cal_dist_masks.py --path [Path_to_dataset] --dataset [ade20k | coco | cityscapes]

By default, the distance mask is normalized. If you do not want it, please set --norm no.

Generating Images Using Pretrained Model

Once the dataset is ready, the result images can be generated using pretrained models.

  1. Download the pretrained models from the OneDrive, save it in checkpoints/. The structure is as follows:
./checkpoints/
    ade20k/
        best_net_G.pth
    ade20k_dist/
        best_net_G.pth
    ade20k_outdoor/
        best_net_G.pth
    ade20k_outdoor_dist/
        best_net_G.pth
    cityscapes/
        best_net_G.pth
    cityscapes_dist/
        best_net_G.pth
    coco/
        best_net_G.pth
    coco_dist/
        best_net_G.pth

_dist means that the model use the additional positional encoding, called CLADE-ICPE in the paper.

  1. Generate the images on the test dataset.
python test.py --name [model_name] --norm_mode clade --batchSize 1 --gpu_ids 0 --which_epoch best --dataset_mode [dataset] --dataroot [Path_to_dataset]

[model_name] is the directory name of the checkpoint file downloaded in Step 1, such as ade20k and coco. [dataset] can be on of ade20k, ade20koutdoor, cityscapes and coco. [Path_to_dataset] is the path to the dataset. If you want to test CALDE-ICPE, the command is as follows:

python test.py --name [model_name] --norm_mode clade --batchSize 1 --gpu_ids 0 --which_epoch best --dataset_mode [dataset] --dataroot [Path_to_dataset] --add_dist

Training New Models

You can train your own model with the following command:

# To train CLADE and CLADE-ICPE.
python train.py --name [experiment_name] --dataset_mode [dataset] --norm_mode clade --dataroot [Path_to_dataset]
python train.py --name [experiment_name] --dataset_mode [dataset] --norm_mode clade --dataroot [Path_to_dataset] --add_dist

If you want to test the model during the training step, please set --train_eval. By default, the model every 10 epoch will be test in terms of FID. Finally, the model with best FID score will be saved as best_net_G.pth.

Calculate FID

We provide the code to calculate the FID which is based on rpo. We have pre-calculated the distribution of real images (all images are resized to 256×256 except cityscapes is 512×256) in training set of each dataset and saved them in ./datasets/train_mu_si/. You can run the following command:

python fid_score.py [Path_to_real_image] [Path_to_fake_image] --batch-size 1 --gpu 0 --load_np_name [dataset] --resize [Size]

The provided [dataset] are: ade20k, ade20koutdoor, cityscapes and coco. You can save the new dataset by replacing --load_np_name [dataset] with --save_np_name [dataset].

New Useful Options

The new options are as follows:

  • --use_amp: if specified, use AMP training mode.
  • --train_eval: if sepcified, evaluate the model during training.
  • --eval_dims: the default setting is 2048, Dimensionality of Inception features to use.
  • --eval_epoch_freq: the default setting is 10, frequency of calculate fid score at the end of epochs.

Code Structure

  • train.py, test.py: the entry point for training and testing.
  • trainers/pix2pix_trainer.py: harnesses and reports the progress of training.
  • models/pix2pix_model.py: creates the networks, and compute the losses
  • models/networks/: defines the architecture of all models
  • options/: creates option lists using argparse package. More individuals are dynamically added in other files as well. Please see the section below.
  • data/: defines the class for loading images and label maps.

Citation

If you use this code for your research, please cite our papers.

@article{tan2021efficient,
  title={Efficient Semantic Image Synthesis via Class-Adaptive Normalization},
  author={Tan, Zhentao and Chen, Dongdong and Chu, Qi and Chai, Menglei and Liao, Jing and He, Mingming and Yuan, Lu and Hua, Gang and Yu, Nenghai},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2021},
  publisher={IEEE}
}
@article{tan2020rethinking,
  title={Rethinking Spatially-Adaptive Normalization},
  author={Tan, Zhentao and Chen, Dongdong and Chu, Qi and Chai, Menglei and Liao, Jing and He, Mingming and Yuan, Lu and Yu, Nenghai},
  journal={arXiv preprint arXiv:2004.02867},
  year={2020}
}
@article{tan2020semantic,
  title={Semantic Image Synthesis via Efficient Class-Adaptive Normalization},
  author={Tan, Zhentao and Chen, Dongdong and Chu, Qi and Chai, Menglei and Liao, Jing and He, Mingming and Yuan, Lu and Gang Hua and Yu, Nenghai},
  journal={arXiv preprint arXiv:2012.04644},
  year={2020}
}

Acknowledgments

This code borrows heavily from SPADE.

Official implementation of VQ-Diffusion

Vector Quantized Diffusion Model for Text-to-Image Synthesis Overview This is the official repo for the paper: [Vector Quantized Diffusion Model for T

Microsoft 592 Jan 03, 2023
Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Uniformer - Pytorch Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification ta

Phil Wang 90 Nov 24, 2022
Stacked Generative Adversarial Networks

Stacked Generative Adversarial Networks This repository contains code for the paper "Stacked Generative Adversarial Networks", CVPR 2017. Part of the

Xun Huang 241 May 07, 2022
[CVPR'22] Official PyTorch Implementation of Collaborative Transformers for Grounded Situation Recognition

[CVPR'22] Collaborative Transformers for Grounded Situation Recognition Paper | Model Checkpoint This is the official PyTorch implementation of Collab

Junhyeong Cho 29 Dec 10, 2022
RLHive: a framework designed to facilitate research in reinforcement learning.

RLHive is a framework designed to facilitate research in reinforcement learning. It provides the components necessary to run a full RL experiment, for both single agent and multi agent environments.

88 Jan 05, 2023
《Deep Single Portrait Image Relighting》(ICCV 2019)

Ratio Image Based Rendering for Deep Single-Image Portrait Relighting [Project Page] This is part of the Deep Portrait Relighting project. If you find

62 Dec 21, 2022
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

Awesome production machine learning This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, versi

The Institute for Ethical Machine Learning 12.9k Jan 04, 2023
September-Assistant - Open-source Windows Voice Assistant

September - Windows Assistant September is an open-source Windows personal assis

The Nithin Balaji 9 Nov 22, 2022
Official implementation for the paper: Generating Smooth Pose Sequences for Diverse Human Motion Prediction

Generating Smooth Pose Sequences for Diverse Human Motion Prediction This is official implementation for the paper Generating Smooth Pose Sequences fo

Wei Mao 28 Dec 10, 2022
I tried to apply the CAM algorithm to YOLOv4 and it worked.

YOLOV4:You Only Look Once目标检测模型在pytorch当中的实现 2021年2月7日更新: 加入letterbox_image的选项,关闭letterbox_image后网络的map得到大幅度提升。 目录 性能情况 Performance 实现的内容 Achievement

55 Dec 05, 2022
A motion detection system with RaspberryPi, OpenCV, Python

Human Detection System using Raspberry Pi Functionality Activates a relay on detecting motion. You may need following components to get the expected R

Omal Perera 55 Dec 04, 2022
pytorch, hand(object) detect ,yolo v5,手检测

YOLO V5 物体检测,包括手部检测。 项目介绍 手部检测 手部检测示例如下 : 视频示例: 项目配置 作者开发环境: Python 3.7 PyTorch = 1.5.1 数据集 手部检测数据集 该项目数据集采用 TV-Hand 和 COCO-Hand (COCO-Hand-Big 部分) 进

Eric.Lee 11 Dec 20, 2022
Plaything for Autistic Children (demo for PaddlePaddle/Wechaty/Mixlab project)

星星的孩子 - 一款为孤独症孩子设计的聊天机器人游戏 孤独症儿童是目前常常被忽视的一类群体。他们有着类似性格内向的特征,实际却受着广泛性发育障碍的折磨。 项目背景 这类儿童在与人交往时存在着沟通障碍,其特点表现在: 社交交流差,互动障碍明显 认知能力有限,被动认知 兴趣狭窄,重复刻板,缺乏变化和想象

Tianyi Pan 35 Nov 24, 2022
External Attention Network

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks paper : https://arxiv.org/abs/2105.02358 Jittor code will come soon

MenghaoGuo 357 Dec 11, 2022
PyTorch version of the paper 'Enhanced Deep Residual Networks for Single Image Super-Resolution' (CVPRW 2017)

About PyTorch 1.2.0 Now the master branch supports PyTorch 1.2.0 by default. Due to the serious version problem (especially torch.utils.data.dataloade

Sanghyun Son 2.1k Dec 27, 2022
Official Implementation for Fast Training of Neural Lumigraph Representations using Meta Learning.

Fast Training of Neural Lumigraph Representations using Meta Learning Project Page | Paper | Data Alexander W. Bergman, Petr Kellnhofer, Gordon Wetzst

Alex 39 Oct 08, 2022
Training Cifar-10 Classifier Using VGG16

opevcvdl-hw3 This project uses pytorch and Qt to achieve the requirements. Version Python 3.6 opencv-contrib-python 3.4.2.17 Matplotlib 3.1.1 pyqt5 5.

Kenny Cheng 3 Aug 17, 2022
Train CNNs for the fruits360 data set in NTOU CS「Machine Vision」class.

CNNs fruits360 Train CNNs for the fruits360 data set in NTOU CS「Machine Vision」class. CNN on a pretrained model Build a CNN on a pretrained model, Res

Ricky Chuang 1 Mar 07, 2022
Implementation of the famous Image Manipulation\Forgery Detector "ManTraNet" in Pytorch

Who has never met a forged picture on the web ? No one ! Everyday we are constantly facing fake pictures touched up in Photoshop but it is not always

Rony Abecidan 77 Dec 16, 2022
Code for Transformer Hawkes Process, ICML 2020.

Transformer Hawkes Process Source code for Transformer Hawkes Process (ICML 2020). Run the code Dependencies Python 3.7. Anaconda contains all the req

Simiao Zuo 111 Dec 26, 2022