Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Last update: Dec 28, 2022

Overview

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop)

(Pronounced as "strog")

Paper

Arxiv

Why it matters?

Scene Text Recognition (STR) requires data augmentation functions that are different from object recognition. STRAug is data augmentation designed for STR. It offers 36 data augmentation functions that are sorted into 8 groups. Each function supports 3 levels or magnitudes of severity or intensity.

Given a source image:

it can be transformed as follows:

warp.py - to generate Curve, Distort, Stretch (or Elastic) deformations

`Curve`	`Distort`	`Stretch`

geometry.py - to generate Perspective, Rotation, Shrink deformations

`Perspective`	`Rotation`	`Shrink`

pattern.py - to create different grids: Grid, VGrid, HGrid, RectGrid, EllipseGrid

`Grid`	`VGrid`	`HGrid`	`RectGrid`	`EllipseGrid`

blur.py - to generate synthetic blur: GaussianBlur, DefocusBlur, MotionBlur, GlassBlur, ZoomBlur

`GaussianBlur`	`DefocusBlur`	`MotionBlur`	`GlassBlur`	`ZoomBlur`

noise.py - to add noise: GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise

`GaussianNoise`	`ShotNoise`	`ImpulseNoise`	`SpeckleNoise`

weather.py - to simulate certain weather conditions: Fog, Snow, Frost, Rain, Shadow

`Fog`	`Snow`	`Frost`	`Rain`	`Shadow`

camera.py - to simulate camera sensor tuning and image compression/resizing: Contrast, Brightness, JpegCompression, Pixelate

`Contrast`	`Brightness`	`JpegCompression`	`Pixelate`

process.py - all other image processing issues: Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color

`Posterize`	`Solarize`	`Invert`	`Equalize`

`AutoContrast`	`Sharpness`	`Color`

Pip install

pip3 install straug

How to use

Command line (e.g. input image is nokia.png):

>>> from straug.warp import Curve
>>> from PIL import Image
>>> img = Image.open("nokia.png")
>>> img = Curve()(img, mag=3)
>>> img.save("curved_nokia.png")

Python script (see test.py):

python3 test.py --image=<target image>

For example:

python3 test.py --image=images/telekom.png

The corrupted images are in results directory.

Reference

Image corruptions (eg blur, noise, camera effects, fog, frost, etc) are based on the work of Hendrycks et al.

Citation

If you find this work useful, please cite:

@inproceedings{atienza2021data,
  title={Data Augmentation for Scene Text Recognition},
  author={Atienza, Rowel},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2021},
  pubstate={published},
  tppubtype={inproceedings}
}

Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Related tags

Overview

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop)

Paper

Why it matters?

Pip install

How to use

Reference

Citation

Owner

Rowel Atienza

NLP made easy

Diverse Object-Scene Compositions For Zero-Shot Action Recognition

Generative Query Network (GQN) in PyTorch as described in "Neural Scene Representation and Rendering"

E2e music remastering system - End-to-end Music Remastering System Using Self-supervised and Adversarial Training

4th place solution for the SIGIR 2021 challenge.

RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

⚾🤖⚾ Automatic baseball pitching overlay in realtime

Implementation of the paper "Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning"

Pytorch implementation of set transformer

Bunch of different tools which helps visualizing and annotating images for semantic/instance segmentation tasks

Illuminated3D This project participates in the Nasa Space Apps Challenge 2021.

Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research

Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

The official implementation of our CVPR 2021 paper - Hybrid Rotation Averaging: A Fast and Robust Rotation Averaging Approach

This is the repository for Learning to Generate Piano Music With Sustain Pedals

Code and project page for ICCV 2021 paper "DisUnknown: Distilling Unknown Factors for Disentanglement Learning"

An algorithm study of the 6th iOS 10 set of Boost Camp Web Mobile

[CVPR 2022] Official Pytorch code for OW-DETR: Open-world Detection Transformer

End-to-end beat and downbeat tracking in the time domain.

The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf