Scene text recognition

Last update: Jan 09, 2023

Related tags

Overview

AttentionOCR for Arbitrary-Shaped Scene Text Recognition

Introduction

This is the ranked No.1 tensorflow based scene text spotting algorithm on ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (Latin Only, Latin and Chinese), futhermore, the algorithm is also adopted in ICDAR2019 Robust Reading Challenge on Large-scale Street View Text with Partial Labeling and ICDAR2019 Robust Reading Challenge on Reading Chinese Text on Signboard.

Scene text detection algorithm is modified from Tensorpack FasterRCNN, and we only open source code in this repository for scene text recognition. I upload ICDAR2019 ArT competition model to docker hub, please refer to Docker. For more details, please refer to our arXiv technical report.

Our text recognition algorithm not only recognizes Latin and Non-Latin characters, but also supports horizontal and vertical text recognition in one model. It is convenient for multi-lingual arbitrary-shaped text recognition.

Note that the competition model in docker container as described in our technical report is slightly different from the recognition model trained from this updated repository.

Dependencies

python 3
tensorflow-gpu 1.14
tensorpack 0.9.8
pycocotools

Usage

First download and extract multiple text datasets in base text dir, please refer to dataset.py for dataset preprocess and multiple datasets.

Multiple Datasets

$(base_dir)/lsvt
$(base_dir)/art
$(base_dir)/rects
$(base_dir)/icdar2017rctw

You can also synthesize text recognition data for data augmentation, please refer to TextRecognitionDataGenerator. It is helpful for long text recognition and attention-based language model because you can directly synthesize text images from NLP corpus. Then you should rewrite dataset.py for synthetic text dataset.

$(base_dir)/synthetic_text

Train

First, download pretrained inception v4 checkpoint and put it in ./pretrain folder. Then you can modify your gpu lists in config.py for specified gpus and then run:

python train.py

You can visualize your training steps via tensorboard:

tensorboard --logdir='./checkpoint'

Use ICDAR2019-LSVT, ICDAR2019-ArT, ICDAR2019-ReCTS for default training, you can change it with your own training data.

Evaluation

python eval.py --checkpoint_path=$(Your model path)

Use ICDAR2017RCTW for default evaluation with Normalized Edit Distance metric(1-N.E.D specifically), you can change it with your own evaluation data.

Export

Export checkpoint to tensorflow pb model for inference.

python export.py --pb_path=$(Your tensorflow pb model save path) --checkpoint_path=$(Your trained model path)

Test

Load tensorflow pb model for text recognition.

python test.py --pb_path=$(Your tensorflow pb model save path) --img_folder=$(Your test img folder)

Default use ICDAR2019-ArT for test, you can change it with your own test data.

Visualization

Scene text detection and recognition result:

Scene text recognition attention maps:

To learn more about attention mechanism, please refer to Attention Mechanism in Deep Learning.

Docker

I upload ICDAR2019 scene text recognition model include text detection and recognition to Docker Hub.

After nvidia-docker installed, run:

docker pull zhang0jhon/demo:ocr
docker run -it -p 5000:5000 --gpus all zhang0jhon/demo:ocr bash
cd /ocr/ocr
python flaskapp.py

Then you can test with your data via browser:

$(localhost or remote server ip address):5000

Scene text recognition

Related tags

Overview

AttentionOCR for Arbitrary-Shaped Scene Text Recognition

Introduction

Dependencies

Usage

Multiple Datasets

Train

Evaluation

Export

Test

Visualization

Docker

Owner

A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

TextBoxes++: A Single-Shot Oriented Scene Text Detector

Generate a list of papers with publicly available source code in the daily arxiv

Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

Neural search engine for AI papers

YOLOv5 in DOTA with CSL_label.(Oriented Object Detection)（Rotation Detection）（Rotated BBox）

A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.

Official PyTorch implementation for "Mixed supervision for surface-defect detection: from weakly to fully supervised learning"

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

This is a GUI program which consist of 4 OpenCV projects

Automatically fishes for you while you are afk :)

Semantic-based Patch Detection for Binary Programs

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.

Solution for Problem 1 by team codesquad for AIDL 2020. Uses ML Kit for OCR and OpenCV for image processing

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

The official code for the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates".

Convert scans of handwritten notes to beautiful, compact PDFs

Text layer for bio-image annotation.

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition