DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

Overview

DatasetGAN

This is the official code and data release for:

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

Yuxuan Zhang*, Huan Ling*, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler

CVPR'21, Oral [paper] [supplementary] [Project Page]

News

  • Benchmark Challenge - A benchmark with diversed testing images is coming soon -- stay tuned!

  • Generated dataset for downstream tasks is coming soon -- stay tuned!

License

For any code dependency related to Stylegan, the license is under the Creative Commons BY-NC 4.0 license by NVIDIA Corporation. To view a copy of this license, visit LICENSE.

The code of DatasetGAN is released under the MIT license. See LICENSE for additional details.

The dataset of DatasetGAN is released under the Creative Commons BY-NC 4.0 license by NVIDIA Corporation. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicating any changes that you've made.

Requirements

  • Python 3.6 or 3.7 are supported.
  • Pytorch 1.4.0 + is recommended.
  • This code is tested with CUDA 10.2 toolkit and CuDNN 7.5.
  • Please check the python package requirement from requirements.txt, and install using
pip install -r requirements.txt

Download Dataset from google drive and put it in the folder of ./datasetGAN/dataset_release. Please be aware that the dataset of DatasetGAN is released under the Creative Commons BY-NC 4.0 license by NVIDIA Corporation.

Download pretrained checkpoint from Stylegan and convert the tensorflow checkpoint to pytorch. Put checkpoints in the folder of ./datasetGAN/dataset_release/stylegan_pretrain. Please be aware that the any code dependency and checkpoint related to Stylegan, the license is under the Creative Commons BY-NC 4.0 license by NVIDIA Corporation.

Note: a good example of converting stylegan tensorlow checkpoint to pytorch is available this Link.

Training

To reproduce paper DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort:

cd datasetGAN
  1. Run Step1: Interpreter training.
  2. Run Step2: Sampling to generate massive annotation-image dataset.
  3. Run Step3: Train Downstream Task.

1. Interpreter Training

python train_interpreter.py --exp experiments/.json 

Note: Training time for 16 images is around one hour. 160G RAM is required to run 16 images training. One can cache the data returned from prepare_data function to disk but it will increase trianing time due to I/O burden.

Example of annotation schema for Face class. Please refer to paper for other classes.

img

2. Run GAN Sampling

python train_interpreter.py \
--generate_data True --exp experiments/.json  \
--resume [path-to-trained-interpreter in step3] \
--num_sample [num-samples]

To run sampling processes in parallel

sh datasetGAN/script/generate_face_dataset.sh

Example of sampling images and annotation:

img

3. Train Downstream Task

python train_deeplab.py \
--data_path [path-to-generated-dataset in step4] \
--exp experiments/.json

Inference

img

python test_deeplab_cross_validation.py --exp experiments/face_34.json\
--resume [path-to-downstream task checkpoint] --cross_validate True

June 21st Update:

For training interpreter, we change the upsampling method from nearnest upsampling to bilinar upsampling in line and update results in Table 1. The table reports mIOU.

Citations

Please ue the following citation if you use our data or code:

@inproceedings{zhang2021datasetgan,
  title={Datasetgan: Efficient labeled data factory with minimal human effort},
  author={Zhang, Yuxuan and Ling, Huan and Gao, Jun and Yin, Kangxue and Lafleche, Jean-Francois and Barriuso, Adela and Torralba, Antonio and Fidler, Sanja},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10145--10155},
  year={2021}
}
A PyTorch implementation of "SelfGNN: Self-supervised Graph Neural Networks without explicit negative sampling"

SelfGNN A PyTorch implementation of "SelfGNN: Self-supervised Graph Neural Networks without explicit negative sampling" paper, which will appear in Th

Zekarias Tilahun 24 Jun 21, 2022
Source code for paper "Deep Diffusion Models for Robust Channel Estimation", TBA.

diffusion-channels Source code for paper "Deep Diffusion Models for Robust Channel Estimation". Generic flow: Use 'matlab/main.mat' to generate traini

The University of Texas Computational Sensing and Imaging Lab 15 Dec 22, 2022
Tutorial in Python targeted at Epidemiologists. Will discuss the basics of analysis in Python 3

Python-for-Epidemiologists This repository is an introduction to epidemiology analyses in Python. Additionally, the tutorials for my library zEpid are

Paul Zivich 120 Nov 17, 2022
Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data"

Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data" You can download the pretrained

Bountos Nikos 3 May 07, 2022
Naszilla is a Python library for neural architecture search (NAS)

A repository to compare many popular NAS algorithms seamlessly across three popular benchmarks (NASBench 101, 201, and 301). You can implement your ow

270 Jan 03, 2023
Official repo for our 3DV 2021 paper "Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements".

Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy Paper. Pr

Yu Rong 41 Dec 13, 2022
The repo of Feedback Networks, CVPR17

Feedback Networks http://feedbacknet.stanford.edu/ Paper: Feedback Networks, CVPR 2017. Amir R. Zamir*,Te-Lin Wu*, Lin Sun, William B. Shen, Bertram E

Stanford Vision and Learning Lab 87 Nov 19, 2022
Easy-to-use micro-wrappers for Gym and PettingZoo based RL Environments

SuperSuit introduces a collection of small functions which can wrap reinforcement learning environments to do preprocessing ('microwrappers'). We supp

Farama Foundation 357 Jan 06, 2023
Semantically Contrastive Learning for Low-light Image Enhancement

Semantically Contrastive Learning for Low-light Image Enhancement Here, we propose an effective semantically contrastive learning paradigm for Low-lig

48 Dec 16, 2022
PyTorch implementation of an end-to-end Handwritten Text Recognition (HTR) system based on attention encoder-decoder networks

AttentionHTR PyTorch implementation of an end-to-end Handwritten Text Recognition (HTR) system based on attention encoder-decoder networks. Scene Text

Dmitrijs Kass 31 Dec 22, 2022
How to Become More Salient? Surfacing Representation Biases of the Saliency Prediction Model

How to Become More Salient? Surfacing Representation Biases of the Saliency Prediction Model

Bogdan Kulynych 49 Nov 05, 2022
[ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction

REval Table of Contents Introduction Overview Requirements Installation Probing Usage Citation License 🎓 Introduction REval is a simple framework for

13 Jan 06, 2023
YOLOX-CondInst - Implement CondInst which is a instances segmentation method on YOLOX

YOLOX CondInst -- YOLOX 实例分割 前言 本项目是自己学习实例分割时,复现的代码. 通过自己编程,让自己对实例分割有更进一步的了解。 若想

DDGRCF 16 Nov 18, 2022
Classify the disease status of a plant given an image of a passion fruit

Passion Fruit Disease Detection I tried to create an accurate machine learning models capable of localizing and identifying multiple Passion Fruits in

3 Nov 09, 2021
Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning

structshot Code and data for paper "Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning", Yi Yang and Arz

ASAPP Research 47 Dec 27, 2022
Shōgun

The SHOGUN machine learning toolbox Unified and efficient Machine Learning since 1999. Latest release: Cite Shogun: Develop branch build status: Donat

Shōgun ML 2.9k Jan 04, 2023
This is a code repository for the paper "Graph Auto-Encoders for Financial Clustering".

Repository for the paper "Graph Auto-Encoders for Financial Clustering" Requirements Python 3.6 torch torch_geometric Instructions This is a simple c

Edward Turner 1 Dec 02, 2021
State-to-Distribution (STD) Model

State-to-Distribution (STD) Model In this repository we provide exemplary code on how to construct and evaluate a state-to-distribution (STD) model fo

<a href=[email protected]"> 2 Apr 07, 2022
Bu repo SAHI uygulamasını mantığını öğreniyoruz.

SAHI-Learn: SAHI'den Beraber Kodlamak İster Misiniz Herkese merhabalar ben Kadir Nar. SAHI kütüphanesine gönüllü geliştiriciyim. Bu repo SAHI kütüphan

Kadir Nar 11 Aug 22, 2022
Code for "Learning Graph Cellular Automata"

Learning Graph Cellular Automata This code implements the experiments from the NeurIPS 2021 paper: "Learning Graph Cellular Automata" Daniele Grattaro

Daniele Grattarola 37 Oct 26, 2022