StarGAN2 for practice

Last update: Sep 24, 2022

Overview

This version of StarGAN2 (dubbed 'Post-modern Style Transfer') is intended mostly for fellow artists, who rarely look at scientific metrics but need a working creative tool. At least, that is how I use it myself, nearly daily.
Here are a few pieces made with it: Terminal Blink, Occurro, etc.
Tested on PyTorch 1.4-1.8. Sequence-to-video conversion requires FFmpeg. For more details, refer to the original implementation.

Features

streamlined workflow, focused on practical tasks [TBA]
cleaned up and simplified code for better readability
stricter memory management to fit bigger batches on consumer GPUs
model mixing (SWA) for better stability

NB: For now there is only training code and some basic inference (processing). More methods and use cases may be added later.

Presumed file structure

stargan2	root
├ _in	input data for processing
├ _out	generation output (sequences & videos)
├ data	datasets for training
│ └ afhq	[example] some dataset
│   ├ cats	[example] images for training
│   │ └ test	[example] images for validation
│   ├ dogs	[example] images for training
│   │ └ test	[example] images for validation
│   └ ⋯
├ models	trained models for inference/processing
│ └ afhq-256-5-100.pkl	[example] trained model file
├ src	source code
└ train	training folders
  └ afhq..	[example] auto-created training folder

Training

Prepare your multi-domain dataset as shown above. The main directory should contain one folder of images per domain (e.g. cats, dogs, ..); every such folder must contain a test subfolder with the validation subset. This structure makes it easy to recombine data for experiments. The images may be of any size (they will be randomly cropped during training), but not smaller than the img_size specified for training (default is 256).
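Before starting a long run, it may be worth checking the layout described above. Here is a minimal sketch that is not part of this repo: the function names, the list of image extensions and the "at least 2 domains" rule are my assumptions. It checks only the folder structure; the image size check is left out, since it would need an imaging library.

```python
from pathlib import Path

# Assumption: which file extensions count as images
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(folder):
    """Count image files directly inside `folder` (non-recursive)."""
    return sum(1 for p in Path(folder).iterdir()
               if p.is_file() and p.suffix.lower() in IMAGE_EXTS)

def check_dataset(data_dir):
    """Return a list of human-readable problems found in the dataset layout."""
    problems = []
    # Every subfolder of the main directory is treated as one domain
    domains = sorted(d for d in Path(data_dir).iterdir() if d.is_dir())
    if len(domains) < 2:  # assumption: "multi-domain" means two or more
        problems.append(f"expected at least 2 domain folders, found {len(domains)}")
    for domain in domains:
        if count_images(domain) == 0:
            problems.append(f"{domain.name}: no training images")
        test_dir = domain / "test"
        if not test_dir.is_dir():
            problems.append(f"{domain.name}: missing 'test' subfolder")
        elif count_images(test_dir) == 0:
            problems.append(f"{domain.name}/test: no validation images")
    return problems
```

Calling `check_dataset("data/afhq")` returns an empty list if every domain folder has images and a non-empty test subfolder.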
Train StarGAN2 on the prepared dataset (e.g. afhq):

 python src/train.py --data_dir data/afhq --model_dir train/afhq --img_size 256 --batch 8

This will run the training process according to the settings in src/train.py (check and explore those!). Models are saved under train/afhq and named as dataset-size-domaincount-kimgs, e.g. afhq-256-5-100.ckpt (required for resuming).
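The naming scheme above is easy to take apart when sorting through many saved models. A small sketch follows; `ModelName` and `parse_model_name` are hypothetical helpers of mine, not functions from src/, and they assume the file name follows the dataset-size-domaincount-kimgs pattern exactly.

```python
from pathlib import Path
from typing import NamedTuple

class ModelName(NamedTuple):
    dataset: str
    img_size: int
    domains: int
    kimgs: int

def parse_model_name(filename):
    """Split 'dataset-size-domaincount-kimgs.ext' into its four parts."""
    stem = Path(filename).stem  # drop the directory and the extension
    # Split from the right, so hyphens inside the dataset name survive
    dataset, size, domains, kimgs = stem.rsplit("-", 3)
    return ModelName(dataset, int(size), int(domains), int(kimgs))
```

For example, `parse_model_name("train/afhq/afhq-256-5-100.ckpt")` gives dataset `afhq`, size 256, 5 domains and 100 kimgs. A name that does not fit the pattern raises `ValueError`.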

Resume training on the same dataset from iteration 50 (in thousands), presuming there is a corresponding complete set of 3 models (with nets and optims) in train/afhq:

 python src/train.py --data_dir data/afhq --model_dir train/afhq --img_size 256 --batch 8 --resume 50

Make an averaged model (for generation only) from a directory of selected models, e.g. train/select:

 python src/swa.py -i train/select
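To give an idea of what such mixing means, here is a rough sketch of plain weight averaging. It is an illustration under my own assumptions, not the actual code of src/swa.py: each model is taken to be a dict of named weights, and the values only need to support `+` and `/` (plain floats here; NumPy arrays or PyTorch tensors behave the same way).

```python
def average_weights(state_dicts):
    """Return the element-wise mean of several {name: value} weight dicts."""
    if not state_dicts:
        raise ValueError("need at least one model to average")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("models have different parameter names")
    averaged = {}
    for key in keys:
        total = state_dicts[0][key]
        for sd in state_dicts[1:]:
            total = total + sd[key]  # not in-place, so the inputs stay untouched
        averaged[key] = total / len(state_dicts)
    return averaged
```

Every model contributes equally here; only models with identical parameter names (i.e. the same architecture) can be mixed this way.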

A few personal findings

Batch size is crucial for this network! The official setting is batch=8 for size 256, if you have enough GPU RAM. One can fit a batch of 3 or 4 on an 11 GB GPU; those results are interesting, but less impressive. Batches of 2 or 1 are for the brave only. Size is better kept at 256; the network scales its layer count automatically, but I didn't manage to get comparable results for size 512 with batches up to 7 (the maximum for 32 GB).
Model weights may oscillate seriously during training, especially with small batches (typical for Cycle- or Star-GANs), so it is better to save models frequently (there may be jewels among them). The best selected models can be mixed together with the swa.py script for better stability. By default, the Generator network is saved every 1000 iterations, and the full set every 5000 iterations. 100k iterations (a few days on a single GPU) may be enough; 200-250k would give a pretty nice overfit.
The lambda coefficients lambda_ds (diversity), lambda_cyc (reconstruction) and lambda_sty (style) may be increased for smaller batches, especially if the goal is stylization rather than photo-realistic transformation. The videos above, for instance, were made with all three lambdas set to 3. Reference-based generation is nearly lost with such settings, but latent-based generation can make nice art.
The order of domains in the training set matters a lot! I usually put some photos first (as they will be the main source imagery), and the domain closest to photoreal second; but other approaches may work well too (your mileage may vary).
I particularly love this network for its failures. Even the flawed results (when the batches are small, the lambdas are wrong, etc.) are usually highly expressive and "inventive", just the kind of "AI's own art" that is so much talked about. Experimenting with such aesthetics is great fun.
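For reference, this is roughly how the three lambdas mentioned above enter the generator objective in the original StarGAN v2 formulation: the style and cycle terms are added, while the diversity term is subtracted (the generator is rewarded for diverse outputs). The function below is only a sketch with scalar stand-ins for the loss terms; its name and the default values of 1.0 are placeholders of mine, so check src/train.py for the real settings.

```python
def generator_loss(adv, sty, ds, cyc,
                   lambda_sty=1.0, lambda_ds=1.0, lambda_cyc=1.0):
    """Combine the four generator loss terms into one weighted objective.

    adv: adversarial loss, sty: style reconstruction loss,
    ds: diversity term (subtracted, so more diversity lowers the loss),
    cyc: cycle-consistency (reconstruction) loss.
    """
    return adv + lambda_sty * sty - lambda_ds * ds + lambda_cyc * cyc

# All three lambdas set to 3, as for the videos mentioned above
total = generator_loss(1.0, 0.5, 0.25, 0.5,
                       lambda_sty=3, lambda_ds=3, lambda_cyc=3)
print(total)  # 3.25
```

Raising the lambdas shifts the balance away from the adversarial term, which fits the observation that results become more stylized and less photo-realistic.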

Generation

Transform the image test.jpg with the AFHQ model (can be downloaded here):

python src/test.py --source test.jpg --model models/100000_nets_ema.ckpt

This will produce 3 images (one per trained domain in the model) in the _out directory.
If source is a directory, every image in it will be processed accordingly.

Generate output for the domain(s) referenced by number(s):

python src/test.py --source test.jpg --model models/100000_nets_ema.ckpt --ref 2

Generate output with a reference image for domain 1 (the ref filename must start with that number):

python src/test.py --source test.jpg --model models/100000_nets_ema.ckpt --ref 1-ref.jpg
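The two forms of --ref shown above (a bare domain number, or a file whose name starts with that number) could be told apart along these lines. This is a hypothetical helper for illustration, not the parsing code of src/test.py; it handles a single value only and assumes the number consists of the leading digits of the file name.

```python
import re
from pathlib import Path

def domain_from_ref(ref):
    """Return the domain number encoded in `ref`.

    `ref` is either a bare number ("2") or a path to a reference image
    whose file name starts with the domain number ("refs/1-ref.jpg").
    """
    name = Path(ref).name  # ignore any leading directories
    match = re.match(r"\d+", name)
    if match is None:
        raise ValueError(f"reference '{ref}' does not start with a domain number")
    return int(match.group())
```

With this, `domain_from_ref("2")` gives 2 and `domain_from_ref("1-ref.jpg")` gives 1, while a file named `ref.jpg` is rejected.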

To be continued..

Owner

vadim epstein