Multi-Content GAN for Few-Shot Font Style Transfer at CVPR 2018

Related tags

Deep LearningMC-GAN
Overview

MC-GAN in PyTorch

This is the implementation of the Multi-Content GAN for Few-Shot Font Style Transfer. The code was written by Samaneh Azadi. If you use this code or our collected font dataset for your research, please cite:

Multi-Content GAN for Few-Shot Font Style Transfer; Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, Trevor Darrell, in arXiv, 2017.

Prerequisites:

  • Linux or macOS
  • Python 2.7
  • CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

  • Install PyTorch and dependencies from http://pytorch.org
  • Install Torch vision from the source.
git clone https://github.com/pytorch/vision
cd vision
python setup.py install
pip install visdom
pip install dominate
pip install scikit-image
  • Clone this repo:
mkdir FontTransfer
cd FontTransfer
git clone https://github.com/azadis/MC-GAN
cd MC-GAN

MC-GAN train/test

  • Download our gray-scale 10K font data set:

./datasets/download_font_dataset.sh Capitals64

../datasets/Capitals64/test_dict/dict.pkl makes observed random glyphs be similar at different test runs on Capitals64 dataset. It is a dictionary with font names as keys and random arrays containing indices from 0 to 26 as their values. Lengths of the arrays are equal to the number of non-observed glyphs in each font.

../datasets/Capitals64/BASE/Code New Roman.0.0.png is a fixed simple font used for training the conditional GAN in the End-to-End model.

./datasets/download_font_dataset.sh public_web_fonts

Given a few letters of font ${DATA} for examples 5 letters {T,O,W,E,R}, training directory ${DATA}/A should contain 5 images each with dimension 64x(64x26)x3 where 5 - 1 = 4 letters are given and the rest are zeroed out. Each image should be saved as ${DATA}_${IND}.png where ${IND} is the index (in [0,26) ) of the letter omitted from the observed set. Training directory ${DATA}/B contains images each with dimension 64x64x3 where only the omitted letter is given. Image names are similar to the ones in ${DATA}/A though. ${DATA}/A/test/${DATA}.png contains all 5 given letters as a 64x(64x26)x3-dimensional image. Structure of the directories for above real-world fonts (including only a few observed letters) is as follows. One can refer to the examples in ../datasets/public_web_fonts for more information.

../datasets/public_web_fonts
                      └── ${DATA}/
                          ├── A/
                          │  ├──train/${DATA}_${IND}.png
                          │  └──test/${DATA}.png
                          └── B/
                             ├──train/${DATA}_${IND}.png
                             └──test/${DATA}.png
  • (Optional) Download our synthetic color gradient font data set:

./datasets/download_font_dataset.sh Capitals_colorGrad64
  • Train Glyph Network:
./scripts/train_cGAN.sh Capitals64

Model parameters will be saved under ./checkpoints/GlyphNet_pretrain.

  • Test Glyph Network after specific numbers of epochs (e.g. 400 by setting EPOCH=400 in ./scripts/test_cGAN.sh):
./scripts/test_cGAN.sh Capitals64
  • (Optional) View the generated images (e.g. after 400 epochs):
cd ./results/GlyphNet_pretrain/test_400/

If you are running the code in your local machine, open index.html. If you are running remotely via ssh, on your remote machine run:

python -m SimpleHTTPServer 8881

Then on your local machine, start an SSH tunnel: ssh -N -f -L localhost:8881:localhost:8881 [email protected]_host Now open your browser on the local machine and type in the address bar:

localhost:8881
  • (Optional) Plot loss functions values during training, from MC-GAN directory:
python util/plot_loss.py --logRoot ./checkpoints/GlyphNet_pretrain/
  • Train End-to-End network (e.g. on DATA=ft37_1): You can train Glyph Network following instructions above or download our pre-trained model by running:
./pretrained_models/download_cGAN_models.sh

Now, you can train the full model:

./scripts/train_StackGAN.sh ${DATA}
  • Test End-to-End network:
./scripts/test_StackGAN.sh ${DATA}

results will be saved under ./results/${DATA}_MCGAN_train.

  • (Optional) Make a video from your results in different training epochs:

First, train your model and save model weights in every epoch by setting opt.save_epoch_freq=1 in scripts/train_StackGAN.sh. Then test in different epochs and make the video by:

./scripts/make_video.sh ${DATA}

Follow the previous steps to visualize generated images and training curves where you replace GlyphNet_train with ${DATA}_StackGAN_train.

Training/test Details

  • Flags: see options/train_options.py, options/base_options.py and options/test_options.py for explanations on each flag.

  • Baselines: if you want to use this code to get results of Image Translation baseline or want to try tiling glyphs rather than stacking, refer to the end of scripts/train_cGAN.sh . If you only want to train OrnaNet on top of clean glyphs, refer to the end of scripts/train_StackGAN.sh.

  • Image Dimension: We have tried this network only on 64x64 images of letters. We do not scale and crop images since we set both opt.FineSize and opt.LoadSize to 64.

Citation

If you use this code or the provided dataset for your research, please cite our paper:

@inproceedings{azadi2018multi,
  title={Multi-content gan for few-shot font style transfer},
  author={Azadi, Samaneh and Fisher, Matthew and Kim, Vladimir and Wang, Zhaowen and Shechtman, Eli and Darrell, Trevor},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  volume={11},
  pages={13},
  year={2018}
}

Acknowledgements

We thank Elena Sizikova for downloading all fonts used in the 10K font data set.

Code is inspired by pytorch-CycleGAN-and-pix2pix.

Owner
Samaneh Azadi
CS PhD student at UC Berkeley
Samaneh Azadi
LLVIP: A Visible-infrared Paired Dataset for Low-light Vision

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision Project | Arxiv | Abstract It is very challenging for various visual tasks such as image

CVSM Group - email: <a href=[email protected]"> 377 Jan 07, 2023
Selfplay In MultiPlayer Environments

This project allows you to train AI agents on custom-built multiplayer environments, through self-play reinforcement learning.

200 Jan 08, 2023
The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

EMANet News The bug in loading the pretrained model is now fixed. I have updated the .pth. To use it, download it again. EMANet-101 gets 80.99 on the

Xia Li 李夏 663 Nov 30, 2022
Certifiable Outlier-Robust Geometric Perception

Certifiable Outlier-Robust Geometric Perception About This repository holds the implementation for certifiably solving outlier-robust geometric percep

83 Dec 31, 2022
Visyerres sgdf woob - Modules Woob pour l'intranet et autres sites Scouts et Guides de France

Vis'Yerres SGDF - Modules Woob Vous avez le sentiment que l'intranet des Scouts

Thomas Touhey (pas un pseudonyme) 3 Dec 24, 2022
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Here is deepparse. Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning. Use deepparse to Use the pr

GRAAL/GRAIL 192 Dec 20, 2022
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frede

Edresson Casanova 92 Dec 09, 2022
Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies

Pgn2Latex (WIP) A simple script to make pdf from pgn files and studies. It's sti

12 Jul 23, 2022
Repository sharing code and the model for the paper "Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes"

Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes Setup virtualenv -p python3 venv source venv/bin/activate pip instal

Planet AI GmbH 9 May 20, 2022
TensorFlow-based implementation of "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".

ICNet_tensorflow This repo provides a TensorFlow-based implementation of paper "ICNet for Real-Time Semantic Segmentation on High-Resolution Images,"

HsuanKung Yang 406 Nov 27, 2022
Material related to the Principles of Cloud Computing course.

CloudComputingCourse Material related to the Principles of Cloud Computing course. This repository comprises material that I use to teach my Principle

Aniruddha Gokhale 15 Dec 02, 2022
Exponential Graph is Provably Efficient for Decentralized Deep Training

Exponential Graph is Provably Efficient for Decentralized Deep Training This code repository is for the paper Exponential Graph is Provably Efficient

3 Apr 20, 2022
Code for the paper "Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds" (ICCV 2021)

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds This is the official code implementation for the paper "Spatio-temporal Se

Hesper 63 Jan 05, 2023
This code uses generative adversarial networks to generate diverse task allocation plans for Multi-agent teams.

Mutli-agent task allocation This code uses generative adversarial networks to generate diverse task allocation plans for Multi-agent teams. To change

Biorobotics Lab 5 Oct 12, 2022
Source codes for the paper "Local Additivity Based Data Augmentation for Semi-supervised NER"

LADA This repo contains codes for the following paper: Jiaao Chen*, Zhenghui Wang*, Ran Tian, Zichao Yang, Diyi Yang: Local Additivity Based Data Augm

GT-SALT 36 Dec 02, 2022
Source code for the paper "Periodic Traveling Waves in an Integro-Difference Equation With Non-Monotonic Growth and Strong Allee Effect"

Source code for the paper "Periodic Traveling Waves in an Integro-Difference Equation With Non-Monotonic Growth and Strong Allee Effect" by Michael Ne

M Nestor 1 Apr 19, 2022
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features | paper | Official PyTorch implementation for Mul

48 Dec 28, 2022
PromptDet: Expand Your Detector Vocabulary with Uncurated Images

PromptDet: Expand Your Detector Vocabulary with Uncurated Images Paper Website Introduction The goal of this work is to establish a scalable pipeline

103 Dec 20, 2022
The undersampled DWI image using Slice-Interleaved Diffusion Encoding (SIDE) method can be reconstructed by the UNet network.

UNet-SIDE The undersampled DWI image using Slice-Interleaved Diffusion Encoding (SIDE) method can be reconstructed by the UNet network. For Super Reso

TIANTIAN XU 1 Jan 13, 2022
Deep Multimodal Neural Architecture Search

MMNas: Deep Multimodal Neural Architecture Search This repository corresponds to the PyTorch implementation of the MMnas for visual question answering

Vision and Language Group@ MIL 23 Dec 21, 2022