StarGAN2 for practice

This version of StarGAN2 (dubbed 'Post-modern Style Transfer') is intended mostly for fellow artists, who rarely look at scientific metrics but need a working creative tool. At least, this is what I use nearly daily myself.
Here are a few pieces made with it: Terminal Blink, Occurro, etc.
Tested on PyTorch 1.4-1.8. Sequence-to-video conversion requires FFmpeg. For more details, refer to the original implementation.

Features

  • streamlined workflow, focused on practical tasks [TBA]
  • cleaned up and simplified code for better readability
  • stricter memory management to fit bigger batches on consumer GPUs
  • model mixing (SWA) for better stability

NB: For now, this repository contains only the training code and some basic inference (processing). More methods and use cases may be added later.

Presumed file structure

stargan2 root
├  _in input data for processing
├  _out generation output (sequences & videos)
├  data datasets for training
│  └  afhq [example] some dataset
│     ├  cats [example] images for training
│     │  └  test [example] images for validation
│     ├  dogs [example] images for training
│     │  └  test [example] images for validation
│     └  ⋯
├  models trained models for inference/processing
│  └  afhq-256-5-100.pkl [example] trained model file
├  src source code
└  train training folders
   └  afhq.. [example] auto-created training folder
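
The dataset part of this layout is easy to get wrong, so here is a minimal sketch of a checker, assuming only what the tree above shows: one folder per domain, each with a test subfolder. The function names and the list of image extensions are my own assumptions, not part of the repository.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the data loader accepts.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def count_images(folder: Path) -> int:
    """Count image files directly inside `folder` (non-recursive)."""
    return sum(
        1 for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )

def check_dataset(root) -> dict:
    """Return {domain: (train_count, test_count)} for a dataset directory.

    Raises FileNotFoundError if any domain folder lacks a `test` subfolder.
    """
    root = Path(root)
    report = {}
    for domain in sorted(p for p in root.iterdir() if p.is_dir()):
        test_dir = domain / "test"
        if not test_dir.is_dir():
            raise FileNotFoundError(f"{domain.name}: missing 'test' subfolder")
        report[domain.name] = (count_images(domain), count_images(test_dir))
    return report
```

Calling check_dataset("data/afhq") before training returns the per-domain image counts, or fails early if a validation subfolder is missing.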

Training

  • Prepare your multi-domain dataset as shown above. The main directory should contain one folder of images per domain (e.g. cats, dogs, ..); every such folder must contain a test subfolder with the validation subset. This structure makes it easy to recombine data for experiments. The images may be of any size (they are randomly cropped during training), but not smaller than the img_size specified for training (default is 256).

  • Train StarGAN2 on the prepared dataset (e.g. afhq):

 python src/train.py --data_dir data/afhq --model_dir train/afhq --img_size 256 --batch 8

This will run the training process according to the settings in src/train.py (check and explore those!). Models are saved under train/afhq and named dataset-size-domaincount-kimgs, e.g. afhq-256-5-100.ckpt (required for resuming).

  • Resume training on the same dataset from iteration 50 (in thousands), presuming there is a corresponding complete set of 3 models (with nets and optims) in train/afhq:
 python src/train.py --data_dir data/afhq --model_dir train/afhq --img_size 256 --batch 8 --resume 50
  • Make an averaged model (for generation only) from a directory of models, e.g. train/select:
 python src/swa.py -i train/select
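
To illustrate what model averaging means here, below is a minimal sketch of uniform weight averaging over several checkpoints. It is not the code of src/swa.py, which may differ (e.g. in weighting or in which networks it touches). Plain lists of floats stand in for tensors, and average_weights is a hypothetical name.

```python
def average_weights(state_dicts):
    """Uniformly average checkpoints that share the same parameter names.

    Each checkpoint is a dict {param_name: list_of_floats}; with real models
    the values would be tensors, averaged element-wise in the same way.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("checkpoints have different parameter names")
    n = len(state_dicts)
    return {
        k: [sum(vals) / n for vals in zip(*(sd[k] for sd in state_dicts))]
        for k in keys
    }

# Toy checkpoints standing in for two saved generators.
a = {"conv.weight": [1.0, 2.0], "conv.bias": [0.0]}
b = {"conv.weight": [3.0, 6.0], "conv.bias": [1.0]}
averaged = average_weights([a, b])
# averaged == {"conv.weight": [2.0, 4.0], "conv.bias": [0.5]}
```

Averaging only makes sense for checkpoints of the same architecture, typically from the same training run, which is why the sketch rejects mismatched parameter names.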

A few personal findings

  1. Batch size is crucial for this network! The official settings are batch=8 for size 256, if you have plenty of GPU RAM. One can fit a batch of 3 or 4 on an 11 GB GPU; those results are interesting, but less impressive. Batches of 2 or 1 are for the brave only. Size is better kept at 256; the network scales its layer count automatically, but I didn't manage to get comparable results for size 512 with batches up to 7 (the max for 32 GB).
  2. Model weights may oscillate seriously during training, especially with small batches (typical for Cycle- or Star-GANs), so it's better to save models frequently (there may be jewels). The best selected models can be mixed together with the swa.py script for better stability. By default, the Generator network is saved every 1000 iterations, and the full set every 5000 iterations. 100k iterations (a few days on a single GPU) may be enough; 200-250k would give a pretty nice overfit.
  3. The lambda coefficients lambda_ds (diversity), lambda_cyc (reconstruction) and lambda_sty (style) may be increased for smaller batches, especially if the goal is stylization rather than photo-realistic transformation. The videos above, for instance, were made with these lambdas equal to 3. Reference-based generation is nearly lost with such settings, but the latent-based one can make nice art.
  4. The order of domains in the training set matters a lot! I usually put some photos first (as they will be the main source imagery), and the closest to photoreal second; but other approaches may work well too (and your mileage may vary).
  5. I particularly love this network for its failures. Even the flawed results (when the batches are small, the lambdas are wrong, etc.) are usually highly expressive and "inventive", just the kind of "AI's own art" that is so much talked about. Experimenting with such aesthetics is great fun.
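
For reference, the lambdas from item 3 weight the terms of the full objective given in the original StarGAN v2 paper. The mapping of the lambda_ds, lambda_cyc and lambda_sty options to the symbols below is my reading of their names, not something checked against src/train.py:

```latex
\min_{G,F,E}\ \max_{D}\quad
\mathcal{L}_{adv}
+ \lambda_{sty}\,\mathcal{L}_{sty}
- \lambda_{ds}\,\mathcal{L}_{ds}
+ \lambda_{cyc}\,\mathcal{L}_{cyc}
```

The diversity term enters with a minus sign because the generator maximizes it. Raising any lambda increases the weight of its term relative to the adversarial loss, which fits the observation that higher values push results toward stylization and away from photo-realism.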

Generation

  • Transform the image test.jpg with the AFHQ model (can be downloaded here):
python src/test.py --source test.jpg --model models/100000_nets_ema.ckpt

This will produce 3 images (one per trained domain in the model) in the _out directory.
If source is a directory, every image in it will be processed accordingly.

  • Generate output for the domain(s), referenced by number(s):
python src/test.py --source test.jpg --model models/100000_nets_ema.ckpt --ref 2
  • Generate output with a reference image for domain 1 (the ref filename must start with that number):
python src/test.py --source test.jpg --model models/100000_nets_ema.ckpt --ref 1-ref.jpg
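
The two forms of --ref above (a bare domain number, or a filename that starts with one) can be told apart as in the sketch below. This is only an illustration of the naming convention; parse_ref is a hypothetical helper, not a function from src/test.py, and it handles a single value only, since the syntax for several numbers is not described here.

```python
import re
from pathlib import Path

def parse_ref(ref: str):
    """Return (domain_index, reference_path_or_None) for a single --ref value."""
    if ref.isdigit():
        # Bare number: domain index only, no reference image.
        return int(ref), None
    # Otherwise treat it as a path whose filename starts with the domain number.
    match = re.match(r"(\d+)", Path(ref).name)
    if match is None:
        raise ValueError(f"reference filename must start with a domain number: {ref}")
    return int(match.group(1)), Path(ref)
```

For example, parse_ref("2") gives domain 2 with no reference image, while parse_ref("1-ref.jpg") gives domain 1 together with the path to the reference file.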

To be continued..

Credits

StarGAN2
Copyright © 2020, NAVER Corp. All rights reserved.
Made available under Creative Commons BY-NC 4.0 license.
Original paper: https://arxiv.org/abs/1912.01865

Owner
vadim epstein