Zero-shot Synthesis with Group-Supervised Learning (ICLR 2021 paper)

Overview

GSL - Zero-shot Synthesis with Group-Supervised Learning

image Figure: Zero-shot synthesis performance of our method with different dataset (iLab-20M, RaFD, and Fonts). Bottom: training images (attributes are known). Top: Test image (attributes are a query).

Zero-shot Synthesis with Group-Supervised Learning
Yunhao Ge, Sami Abu-El-Haija, Gan Xin, Laurent Itti
International Conference on Learning Representations (ICLR), 2021

[Paper] [Project Page] [Fonts dataset]

To aid neural networks to envision objects with different attributes, we propose a family of objective functions, expressed on groups of examples, as a novel learning framework that we term Group-Supervised Learning (GSL). GSL allows us to decompose inputs into a disentangled representation with swappable components, that can be recombined to synthesize new samples. (i.e., images of red boats & blue cars can be decomposed and recombined to synthesize novel images of red cars.

[We are actively updating the code]

Getting Started

Installation

  • Dependencies
python 3.6.4
pytorch 0.3.1.post2
visdom
tqdm

  • Clone this repo:
git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
cd pytorch-CycleGAN-and-pix2pix

Datasets

  • iLab-20M, is an attributed dataset containing images of toy vehicles placed on a turntable using 11 cameras at different viewing points. There are 3 attribute classes: vehicle identity: 15 categories, each having 25-160 instances; pose; and backgrounds: over 14 for each identity: projecting vehicles in relevant contexts. You can download a subset of iLab-20M that we used in our paper here: iLab-6pose [http://ilab.usc.edu/datasets/iLab_6pose.zip]

  • Fonts, is a computer-generated RGB image datasets. Each image, with 128 * 128 pixels, contains an alphabet letter rendered using 5 independent generating attributes: letter identity, size, font color, background color and font. you can download the fonts dataset at here: Fonts [http://ilab.usc.edu/datasets/fonts].

  • RaFD contains pictures of 67 models displaying 8 emotional expressions taken by 5 different camera angles simultaneously. There are 3 attributes: identity, camera position (pose), and expression. To download the RaFD dataset, you must request access to the dataset from the Radboud Faces Database website.

  • dSprites, is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite. you can download the dSprites dataset here dSprites

Datasets Preprocess

To efficiently access the dataset in a manner of Group-Supervised Learning, some dataset need preprocess.

  • For iLab-20M dataset, after downloading iLab-6pose subset, please run python3 ./utils/ilab_data_preprocess.py
  • For RaFD dataset, after downloading, please run python3 ./utils/rafd_data_preprocess.py
  • For desprites dataset, after downloading, please run python3 ./utils/desprites_data_preprocess.py
  • For Fonts dataset, no preprocess needed.

After preprocess, please update the dataset path in '--dataset_path' parameter

Synthesis with pretrained model

You can download the pretrained models of ilab-20M, Fonts, RaFD and dsprites here pretrained models (http://ilab.usc.edu/datasets/GSL_pretrained_models.zip) and put them to ./checkpoints/pretrained_models The sample test images are in the ./checkpoints/test_imgs You can use the following sample commands to synthesize zero-shot images with our pretrained models:

  • For Fonts
python3 main.py --train False --dataset Fonts --pretrain_model_path YOUR_LOCAL_PATH_OF_PRETRAINED_MODEL --test_img_path './checkpoints/test_imgs/fonts' --viz_name fonts
  • For iLab-20M
python3 main.py --train False --dataset ilab-20M --pretrain_model_path YOUR_LOCAL_PATH_OF_PRETRAINED_MODEL --test_img_path './checkpoints/test_imgs/ilab_20M' --viz_name ilab-20m
  • For RaFD
python3 main.py --train False --dataset RaFD --pretrain_model_path YOUR_LOCAL_PATH_OF_PRETRAINED_MODEL --test_img_path './checkpoints/test_imgs/rafd' --viz_name rafd
  • For dsprites
python3 main.py --train False --dataset dsprites--pretrain_model_path YOUR_LOCAL_PATH_OF_PRETRAINED_MODEL --test_img_path './checkpoints/test_imgs/dsprites' --viz_name dsprites

Train GZS-Net on datasets used in paper

Group-Supervised Zero-shot Synthesis Network (GZS-Net) is an implemetation of Group-Supervised Learning with only reconstruction loss. If you want to train GZS-Net with the 4 datasets used in paper (Fonts, iLab-20M, RaFD, dSprites), please use 'train.py' with the dataset name, dataset path and visualize pannel name in Visdom. Note: you can also set the hyperparameter of lr, batchsize, backbone structure in train.py Here are some examples:

  • For Fonts
python3 main.py --train True --dataset Fonts --dataset_path YOUR_LOCAL_PATH_OF_FONTS --viz_name fonts
  • For iLab-20M
python3 main.py --train True --dataset ilab-20M --dataset_path YOUR_LOCAL_PATH_OF_ILAB --viz_name ilab-20m
  • For RaFD
python3 main.py --train True --dataset RaFD --dataset_path YOUR_LOCAL_PATH_OF_RaFD --viz_name rafd
  • For dsprites
python3 main.py --train True --dataset dsprites--dataset_path YOUR_LOCAL_PATH_OF_DSPRITES --viz_name dsprites

Train GZS-Net on your own dataset

To use our GZS-Net on you own dataset, before training, please refer the admissible dataset description in our paper. Note: The high level training strategy of the 4 dataset that paper used (Fonts, iLab-20M, RaFD, dSprites) is shown in Figure.3 in our paper. However, to make our method more general and compatale with more dataset, we propose a easier way to train our GZS-Net, we called 'sample edge strategy' to achieve 'One-Overlap Attribute Swap': In each training step, we sample n different edges (each edge corresponding to a specific attribute), and we release the two requirement of edge sample: (1) the two samples connected by an edge with attribute A should have same attribute A value but do not need to have different attribute values of other attributes (e.g. attribute B and C value can be the same). (2) we do not need center image x to keep showing in all edges, which means the connected images between edges can be totally different.

We train ilab-20M with the new training strategy and you can cgange our example code of ilab_20M_custom to your custom dataset.

  • Take ilab_20M_custom dataset as an example
python3 train.py  --dataset ilab_20M_custom --dataset_path YOUR_LOCAL_PATH_OF_CUSTOM_DATASET --viz_name ilab_20M_custom

Citation

If you use this code for your research, please cite our papers.

@inproceedings{ge2021zeroshot,
  title={Zero-shot Synthesis with Group-Supervised Learning},
  author={Yunhao Ge and Sami Abu-El-Haija and Gan Xin and Laurent Itti},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=8wqCDnBmnrT}
}

Acknowledgments

Our code is inspired by Beta-VAE.

Owner
Andy_Ge
Ph.D. Student in Computer Vision, Machine Learning, and Baby Learning
Andy_Ge
The official PyTorch code implementation of "Human Trajectory Prediction via Counterfactual Analysis" in ICCV 2021.

Human Trajectory Prediction via Counterfactual Analysis (CausalHTP) The official PyTorch code implementation of "Human Trajectory Prediction via Count

46 Dec 03, 2022
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking We revisit and address issues with Oxford 5k and Paris 6k image retrieval benchm

Filip Radenovic 188 Dec 17, 2022
Franka Emika Panda manipulator kinematics&dynamics simulation

pybullet_sim_panda Pybullet simulation environment for Franka Emika Panda Dependency pybullet, numpy, spatial_math_mini Simple example (please check s

0 Jan 20, 2022
[ICCV' 21] "Unsupervised Point Cloud Pre-training via Occlusion Completion"

OcCo: Unsupervised Point Cloud Pre-training via Occlusion Completion This repository is the official implementation of paper: "Unsupervised Point Clou

Hanchen 204 Dec 24, 2022
OCRA (Object-Centric Recurrent Attention) source code

OCRA (Object-Centric Recurrent Attention) source code Hossein Adeli and Seoyoung Ahn Please cite this article if you find this repository useful: For

Hossein Adeli 2 Jun 18, 2022
Spectrum is an AI that uses machine learning to generate Rap song lyrics

Spectrum Spectrum is an AI that uses deep learning to generate rap song lyrics. View Demo Report Bug Request Feature Open In Colab About The Project S

39 Dec 16, 2022
[SIGGRAPH 2020] Attribute2Font: Creating Fonts You Want From Attributes

Attr2Font Introduction This is the official PyTorch implementation of the Attribute2Font: Creating Fonts You Want From Attributes. Paper: arXiv | Rese

Yue Gao 200 Dec 15, 2022
《Truly shift-invariant convolutional neural networks》(2021)

Truly shift-invariant convolutional neural networks [Paper] Authors: Anadi Chaman and Ivan Dokmanić Convolutional neural networks were always assumed

Anadi Chaman 46 Dec 19, 2022
MohammadReza Sharifi 27 Dec 13, 2022
Python TFLite scripts for detecting objects of any class in an image without knowing their label.

Python TFLite scripts for detecting objects of any class in an image without knowing their label.

Ibai Gorordo 42 Oct 07, 2022
Official PyTorch implementation of Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval.

Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval PyTorch This is the PyTorch implementation of Retrieve in Style: Unsupervised Fa

60 Oct 12, 2022
Pyramid Grafting Network for One-Stage High Resolution Saliency Detection. CVPR 2022

PGNet Pyramid Grafting Network for One-Stage High Resolution Saliency Detection. CVPR 2022, CVPR 2022 (arXiv 2204.05041) Abstract Recent salient objec

CVTEAM 109 Dec 05, 2022
Official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition" in AAAI2022.

AimCLR This is an official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Reco

Gty 44 Dec 17, 2022
Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks

Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks Requirements python 0.10+ rdkit 2020.03.3.0 biopython 1.78 openbabel 2.4

Neeraj Kumar 3 Nov 23, 2022
A simple algorithm for extracting tree height in sparse scene from point cloud data.

TREE HEIGHT EXTRACTION IN SPARSE SCENES BASED ON UAV REMOTE SENSING This is the offical python implementation of the paper "Tree Height Extraction in

6 Oct 28, 2022
Ready-to-use code and tutorial notebooks to boost your way into few-shot image classification.

Easy Few-Shot Learning Ready-to-use code and tutorial notebooks to boost your way into few-shot image classification. This repository is made for you

Sicara 399 Jan 08, 2023
💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena.

Heidelberg-NLP 17 Nov 07, 2022
A High-Quality Real Time Upscaler for Anime Video

Anime4K Anime4K is a set of open-source, high-quality real-time anime upscaling/denoising algorithms that can be implemented in any programming langua

15.7k Jan 06, 2023
A computational optimization project towards the goal of gerrymandering the results of a hypothetical election in the UK.

A computational optimization project towards the goal of gerrymandering the results of a hypothetical election in the UK.

Emma 1 Jan 18, 2022
Puzzle-CAM: Improved localization via matching partial and full features.

Puzzle-CAM The official implementation of "Puzzle-CAM: Improved localization via matching partial and full features".

Sanghyun Jo 150 Nov 14, 2022