StyleGAN2-ada for practice

Overview

StyleGAN2-ada for practice

Open In Colab

This version of the newest PyTorch-based StyleGAN2-ada is intended mostly for fellow artists, who rarely look at scientific metrics, but rather need a working creative tool. Tested on Python 3.7 + PyTorch 1.7.1, requires FFMPEG for sequence-to-video conversions. For more explicit details refer to the original implementations.

Here is previous Tensorflow-based version, which produces compatible models (but not vice versa).
I still prefer it for few-shot training (~100 imgs), and for model surgery tricks (not ported here yet).

Features

  • inference (image generation) in arbitrary resolution (finally with proper padding on both TF and Torch)
  • multi-latent inference with split-frame or masked blending
  • non-square aspect ratio support (auto-picked from dataset; resolution must be divisible by 2**n, such as 512x256, 1280x768, etc.)
  • transparency (alpha channel) support (auto-picked from dataset)
  • using plain image subfolders as conditional datasets
  • funky "digression" inference technique, ported from Aydao

Few operation formats ::

  • Windows batch-files, described below (if you're on Windows with powerful GPU)
  • local Jupyter notebook (for non-Windows platforms)
  • Colab notebook (max ease of use, requires Google drive)

Just in case, original StyleGAN2-ada charms:

  • claimed to be up to 30% faster than original StyleGAN2
  • has greatly improved training (requires 10+ times fewer samples)
  • has lots of adjustable internal training settings
  • works with plain image folders or zip archives (instead of custom datasets)
  • should be easier to tweak/debug

Training

  • Put your images in data as subfolder or zip archive. Ensure they all have the same color channels (monochrome, RGB or RGBA).
    If needed, first crop square fragments from source video or directory with images (feasible method, if you work with patterns or shapes, rather than compostions):
 multicrop.bat source 512 256 

This will cut every source image (or video frame) into 512x512px fragments, overlapped with 256px shift by X and Y. Result will be in directory source-sub, rename it as you wish. If you edit the images yourself (e.g. for non-square aspect ratios), ensure their correct size. For conditional model split the data by subfolders (mydata/1, mydata/2, ..).

  • Train StyleGAN2-ada on the prepared dataset (image folder or zip archive):
 train.bat mydata

This will run training process, according to the settings in src/train.py (check and explore those!!). Results (models and samples) are saved under train directory, similar to original Nvidia approach. For conditional model add --cond option.

Please note: we save both compact models (containing only Gs network for inference) as -...pkl (e.g. mydata-512-0360.pkl), and full models (containing G/D/Gs networks for further training) as snapshot-...pkl. The naming is for convenience only.

Length of the training is defined by --lod_kimg X argument (training duration per layer/LOD). Network with base resolution 1024px will be trained for 20 such steps, for 512px - 18 steps, et cetera. Reasonable lod_kimg value for full training from scratch is 300-600, while for finetuning 20-40 is sufficient. One can override this approach, setting total duration directly with --kimg X.

If you have troubles with custom cuda ops, try removing their cached version (C:\Users\eps\AppData\Local\torch_extensions on Windows).

  • Resume training on mydata dataset from the last saved model at train/000-mydata-512-.. directory:
 train_resume.bat mydata 000-mydata-512-..
  • Uptrain (finetune) well-trained model ffhq-512.pkl on new data:
 train_resume.bat newdata ffhq-512.pkl

No need to count exact steps in this case, just stop when you're ok with the results (it's better to set low lod_kimg to follow the progress).

Generation

Generated results are saved as sequences and videos (by default, under _out directory).

  • Test the model in its native resolution:
 gen.bat ffhq-1024.pkl
  • Generate custom animation between random latent points (in z space):
 gen.bat ffhq-1024 1920-1080 100-20

This will load ffhq-1024.pkl from models directory and make a 1920x1080 px looped video of 100 frames, with interpolation step of 20 frames between keypoints. Please note: omitting .pkl extension would load custom network, effectively enabling arbitrary resolution, multi-latent blending, etc. Using filename with extension will load original network from PKL (useful to test foreign downloaded models). There are --cubic and --gauss options for animation smoothing, and few --scale_type choices. Add --save_lat option to save all traversed dlatent w points as Numpy array in *.npy file (useful for further curating).

  • Generate more various imagery:
 gen.bat ffhq-1024 3072-1024 100-20 -n 3-1

This will produce animated composition of 3 independent frames, blended together horizontally (similar to the image in the repo header). Argument --splitfine X controls boundary fineness (0 = smoothest).

Instead of simple frame splitting, one can load external mask(s) from b/w image file (or folder with file sequence):

 gen.bat ffhq-1024 1024-1024 100-20 --latmask _in/mask.jpg

Arguments --digress X would add some animated funky displacements with X strength (by tweaking initial const layer params). Arguments --trunc X controls truncation psi parameter, as usual.

NB: Windows batch-files support only 9 command arguments; if you need more options, you have to edit batch-file itself.

  • Project external images onto StyleGAN2 model dlatent points (in w space):
 project.bat ffhq-1024.pkl photo

The result (found dlatent points as Numpy arrays in *.npy files, and video/still previews) will be saved to _out/proj directory.

  • Generate smooth animation between saved dlatent points (in w space):
 play_dlatents.bat ffhq-1024 dlats 25 1920-1080

This will load saved dlatent points from _in/dlats and produce a smooth looped animation between them (with resolution 1920x1080 and interpolation step of 25 frames). dlats may be a file or a directory with *.npy or *.npz files. To select only few frames from a sequence somename.npy, create text file with comma-delimited frame numbers and save it as somename.txt in the same directory (check examples for FFHQ model). You can also "style" the result: setting --style_dlat blonde458.npy will load dlatent from blonde458.npy and apply it to higher layers, producing some visual similarity. --cubic smoothing and --digress X displacements are also applicable here.

  • Generate animation from saved point and feature directions (say, aging/smiling/etc for FFHQ model) in dlatent w space:
 play_vectors.bat ffhq-1024.pkl blonde458.npy vectors_ffhq

This will load base dlatent point from _in/blonde458.npy and move it along direction vectors from _in/vectors_ffhq, one by one. Result is saved as looped video.

Credits

StyleGAN2: Copyright © 2021, NVIDIA Corporation. All rights reserved.
Made available under the Nvidia Source Code License-NC
Original paper: https://arxiv.org/abs/2006.06676

Owner
vadim epstein
vadim epstein
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

Auto-Seg-Loss By Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai This is the official implementation of the ICLR 2021 paper Auto

61 Dec 21, 2022
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation This repository contains the source code of our paper, ESPNet (acc

Sachin Mehta 515 Dec 13, 2022
Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation

PLOP: Learning without Forgetting for Continual Semantic Segmentation This repository contains all of our code. It is a modified version of Cermelli e

Arthur Douillard 116 Dec 14, 2022
Hand gesture recognition model that can be used as a remote control for a smart tv.

Gesture_recognition The training data consists of a few hundred videos categorised into one of the five classes. Each video (typically 2-3 seconds lon

Pratyush Negi 1 Aug 11, 2022
Generating Band-Limited Adversarial Surfaces Using Neural Networks

Generating Band-Limited Adversarial Surfaces Using Neural Networks This is the official repository of the technical report that was published on arXiv

3 Jul 26, 2022
Boostcamp AI Tech 3rd / Basic Paper reading w.r.t Embedding

Boostcamp AI Tech 3rd : Basic Paper Reading w.r.t Embedding TL;DR 1992년부터 2018년도까지 이루어진 word/sentence embedding의 중요한 줄기를 이루는 기초 논문 스터디를 진행하고자 합니다. 논

Soyeon Kim 14 Nov 14, 2022
CSPML (crystal structure prediction with machine learning-based element substitution)

CSPML (crystal structure prediction with machine learning-based element substitution) CSPML is a unique methodology for the crystal structure predicti

8 Dec 20, 2022
This code implements constituency parse tree aggregation

README This code implements constituency parse tree aggregation. Folder details code: This folder contains the code that implements constituency parse

Adithya Kulkarni 0 Oct 11, 2021
GazeScroller - Using Facial Movements to perform Hands-free Gesture on the system

GazeScroller Using Facial Movements to perform Hands-free Gesture on the system

2 Jan 05, 2022
Yet Another Robotics and Reinforcement (YARR) learning framework for PyTorch.

Yet Another Robotics and Reinforcement (YARR) learning framework for PyTorch.

Stephen James 51 Dec 27, 2022
General Vision Benchmark, a project from OpenGVLab

Introduction We build GV-B(General Vision Benchmark) on Classification, Detection, Segmentation and Depth Estimation including 26 datasets for model e

174 Dec 27, 2022
DiffStride: Learning strides in convolutional neural networks

DiffStride is a pooling layer with learnable strides. Unlike strided convolutions, average pooling or max-pooling that require cross-validating stride values at each layer, DiffStride can be initiali

Google Research 113 Dec 13, 2022
MvtecAD unsupervised Anomaly Detection

MvtecAD unsupervised Anomaly Detection This respository is the unofficial implementations of DFR: Deep Feature Reconstruction for Unsupervised Anomaly

0 Feb 25, 2022
A PaddlePaddle implementation of STGCN with a few modifications in the model architecture in order to forecast traffic jam.

About This repository contains the code of a PaddlePaddle implementation of STGCN based on the paper Spatio-Temporal Graph Convolutional Networks: A D

Tianjian Li 1 Jan 11, 2022
Semantic-aware Grad-GAN for Virtual-to-Real Urban Scene Adaption

SG-GAN TensorFlow implementation of SG-GAN. Prerequisites TensorFlow (implemented in v1.3) numpy scipy pillow Getting Started Train Prepare dataset. W

lplcor 61 Jun 07, 2022
A light weight data augmentation tool for training CNNs and Viola Jones detectors

hey-daug A light weight data augmentation tool for training CNNs and Viola Jones detectors (Haar Cascades). This tool inflates your data by up to six

Jaiyam Sharma 2 Nov 23, 2019
This is a library for training and applying sparse fine-tunings with torch and transformers.

This is a library for training and applying sparse fine-tunings with torch and transformers. Please refer to our paper Composable Sparse Fine-Tuning f

Cambridge Language Technology Lab 37 Dec 30, 2022
This program can detect your face and add an Christams hat on the top of your head

Auto_Christmas This program can detect your face and add a Christmas hat to the top of your head. just run the Auto_Christmas.py, then you can see the

3 Dec 22, 2021
Script for getting information in discord

User-info.py Script for getting information in https://discord.com/ Instalação: apt-get update -y apt-get upgrade -y apt-get install git pkg install

Moleey 1 Dec 18, 2021
EgGateWayGetShell py脚本

EgGateWayGetShell_py 免责声明 由于传播、利用此文所提供的信息而造成的任何直接或者间接的后果及损失,均由使用者本人负责,作者不为此承担任何责任。 使用 python3 eg.py urls.txt 目标 title:锐捷网络-EWEB网管系统 port:4430 漏洞成因 ?p

榆木 61 Nov 09, 2022