Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning

Last update: Aug 09, 2022

Code for the paper Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning (TMM 2021).

Introduction

Automatic typography helps designers avoid highly repetitive tasks and helps amateur users achieve high-quality textual layout designs. However, it typically involves many parameters and complicated aesthetic rules that must be tuned. In this paper, we propose an efficient deep aesthetics learning approach for generating harmonious textual layouts over natural images. The approach has two stages: saliency-aware text region proposal and aesthetics-based textual layout selection. It incorporates both semantic features and visual perception principles. First, a semantic visual saliency detection network is combined with a text region proposal algorithm to generate candidate text anchors at various positions and sizes. Second, a discriminative deep aesthetics scoring model assesses the aesthetic quality of the candidate textual layouts. The results demonstrate that our method generates harmonious textual layouts with better performance across a variety of real-world scenarios.
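
The two stages can be illustrated with a toy sketch. This is not the authors' implementation: the saliency map is a hand-written grid rather than a network output, and `propose_regions`, `aesthetic_score`, and the scoring rule (prefer larger, horizontally centered boxes) are hypothetical stand-ins for the region proposal algorithm and the learned aesthetics scoring model.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (row, col, height, width) in grid cells


def mean_saliency(saliency: List[List[float]], box: Box) -> float:
    """Average saliency of the cells covered by a box."""
    r, c, h, w = box
    cells = [saliency[i][j] for i in range(r, r + h) for j in range(c, c + w)]
    return sum(cells) / len(cells)


def propose_regions(saliency, sizes, max_saliency=0.2) -> List[Box]:
    """Stage 1 (toy): slide boxes of several sizes, keep low-saliency ones."""
    rows, cols = len(saliency), len(saliency[0])
    candidates = []
    for h, w in sizes:
        for r in range(rows - h + 1):
            for c in range(cols - w + 1):
                box = (r, c, h, w)
                if mean_saliency(saliency, box) <= max_saliency:
                    candidates.append(box)
    return candidates


def aesthetic_score(box: Box, cols: int) -> float:
    """Stage 2 (toy): hypothetical stand-in for a learned scoring model.

    Rewards larger boxes and penalizes horizontal offset from the center.
    """
    r, c, h, w = box
    center_offset = abs((c + w / 2) - cols / 2)
    return h * w - center_offset


def select_layout(saliency, sizes) -> Box:
    """Pick the highest-scoring candidate among the proposed regions."""
    candidates = propose_regions(saliency, sizes)
    if not candidates:
        raise ValueError("no low-saliency region found")
    cols = len(saliency[0])
    return max(candidates, key=lambda b: aesthetic_score(b, cols))


# A 4x4 grid with a salient object in the middle (values are made up).
saliency_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0],
    [0.0, 0.9, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
best = select_layout(saliency_map, sizes=[(1, 2), (1, 4)])
print("selected text box:", best)
```

In the sketch, candidates that overlap the salient object are discarded before scoring, which mirrors the order of the two stages: proposal first, selection second.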

Dependencies and Installation

Python 3
PyTorch >= 1.0
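
A quick way to confirm these requirements before building is sketched below. It uses only the standard library; `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helper names, and the check assumes PyTorch is installed under the distribution name `torch`.

```python
import re
import sys
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Extract leading numeric components, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Compare a version string against a minimum such as (1, 0)."""
    found = parse_version(version)
    # Pad short versions with zeros so that '1' compares equal to (1, 0).
    found = found + (0,) * (len(minimum) - len(found))
    return found >= minimum


def check_environment() -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch_version, (1, 0)):
            problems.append(f"PyTorch {torch_version} is older than 1.0")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```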

Notes on compilation

For Python 3 users: before building the source code and installing the packages, specify your GPU architecture and the CUDA_HOME path in both ./roi_align/make.sh and ./rod_align/make.sh.
Then build and install by running:
```
bash make_all.sh
```
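
If you are unsure which architecture to specify, the sketch below shows one way to look it up. It assumes the make.sh scripts accept the architecture as an nvcc `-gencode` flag; check the scripts themselves, since their exact variable names are not shown here. `gencode_flag` and `detect_capability` are hypothetical helpers, and the GPU query only runs if PyTorch with CUDA support is already installed.

```python
import os


def gencode_flag(major: int, minor: int) -> str:
    """Format a compute capability such as (7, 5) as an nvcc -gencode flag."""
    cc = f"{major}{minor}"
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


def detect_capability():
    """Return (major, minor) for GPU 0, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    capability = detect_capability()
    if capability is None:
        print("No CUDA device visible; set the architecture by hand.")
    else:
        print("nvcc flag:", gencode_flag(*capability))
    # Report CUDA_HOME as seen by this process; it may be unset.
    print("CUDA_HOME:", os.environ.get("CUDA_HOME", "not set"))
```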

Usage

Download the source code and the pretrained models: gdi-basnet and SMT.
Make sure your device is CUDA-enabled, then build and install the source code of roi_align_api and rod_align_api.
Run SmartText_demo.py to test the pretrained models on your own images.
```
python SmartText_demo.py -opt test_opt.yml
```
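
To call the demo from another script, a thin wrapper such as the following may be convenient. It is a sketch only: `build_demo_command` and `run_demo` are hypothetical helpers, and the only thing taken from this repository is the command line shown above. The contents of the option file are not assumed.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_command(opt_path: str, script: str = "SmartText_demo.py") -> list:
    """Build the argument list for the demo, mirroring the command above."""
    return [sys.executable, script, "-opt", opt_path]


def run_demo(opt_path: str) -> int:
    """Run the demo if the option file exists; return the process exit code."""
    if not Path(opt_path).is_file():
        raise FileNotFoundError(f"option file not found: {opt_path}")
    return subprocess.run(build_demo_command(opt_path)).returncode


# Show the command that would be executed, without running it.
print(" ".join(build_demo_command("test_opt.yml")))
```

Checking for the option file first gives a clearer error than letting the demo script fail partway through start-up.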

Acknowledgement

This work extends our conference version (ICME 2020). Some of the code in this repository is adapted from BASNet and GAIC. Thanks for their excellent work!

Citation

If you find this work useful, please cite our paper:

@article{li2021harmonious,
    title     = {Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning},
    author    = {Li, Chenhui and Zhang, Peiying and Wang, Changbo},
    journal   = {IEEE Transactions on Multimedia},
    year      = {2021},
    publisher = {IEEE}
}

Contact

If you have any questions, contact us by email at [email protected].
