Scene text detection and recognition based on Extremal Region (ER)

Overview

Scene text recognition

A real-time scene text recognition algorithm. Our system is able to recognize text against unconstrained backgrounds.
This algorithm is based on several papers (see References) and is implemented in C/C++.

Environment and dependencies

  1. OpenCV 3.1 or above
  2. CMake 3.10 or above
  3. Visual Studio 2017 Community or above (Windows-only)

How to build?

Windows

  1. Install OpenCV; put the opencv directory into C:\tools
    • You can install it manually from its GitHub repo, or
    • You can install it via Chocolatey: choco install opencv, or
    • If you already have OpenCV, edit CMakeLists.txt and change WIN_OPENCV_CONFIG_PATH to where you have it
  2. Use CMake to generate the project files
    cd Scene-text-recognition
    mkdir build-win
    cd build-win
    cmake .. -G "Visual Studio 15 2017 Win64"
  3. Use CMake to build the project
    cmake --build . --config Release
  4. Find the binaries in the root directory
    cd ..
    dir | findstr scene
  5. To execute the scene_text_recognition.exe binary, use its wrapper script; for example:
    .\scene_text_recognition.bat -i res\ICDAR2015_test\img_6.jpg

Linux

  1. Install OpenCV; refer to OpenCV Installation in Linux
  2. Use CMake to generate the project files
    cd Scene-text-recognition
    mkdir build-linux
    cd build-linux
    cmake ..
  3. Use CMake to build the project
    cmake --build .
  4. Find the binaries in the root directory
    cd ..
    ls | grep scene
  5. To execute the binaries, run them as-is; for example:
    ./scene_text_recognition -i res/ICDAR2015_test/img_6.jpg

Usage

The executable file scene_text_recognition must ultimately exist in the project root directory (i.e., next to classifier/, dictionary/, etc.).

./scene_text_recognition -v:            take the default webcam as input  
./scene_text_recognition -v [video]:    take a video as input  
./scene_text_recognition -i [image]:    take an image as input  
./scene_text_recognition -i [path]:     take a folder of images as input  
./scene_text_recognition -l [image]:    demonstrate the "Linear Time MSER" algorithm  
./scene_text_recognition -t detection:  train the text detection classifier  
./scene_text_recognition -t ocr:        train the text recognition (OCR) classifier

Train your own classifier

Text detection

  1. Put your text data in res/pos and your non-text data in res/neg
  2. Name the files numerically, e.g. 1.jpg, 2.jpg, 3.jpg, and so on
  3. Make sure the training folder exists
  4. Run ./scene_text_recognition -t detection
    mkdir training
    ./scene_text_recognition -t detection
  5. The text detection classifier can then be found in the training folder
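Step 2 above asks for numerically named files. As a minimal sketch, assuming the samples are .jpg files as in the example names, the following C++ helper checks whether a file name follows that pattern. The function is_numeric_name is hypothetical and is not part of the project; it only illustrates the naming rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the project: returns true when `filename`
// looks like "<digits>.jpg", e.g. "1.jpg" or "42.jpg".
bool is_numeric_name(const std::string& filename) {
    const std::string ext = ".jpg";  // assumption: samples are stored as .jpg
    if (filename.size() <= ext.size()) return false;
    const std::size_t stem_len = filename.size() - ext.size();
    // The name must end with the ".jpg" extension.
    if (filename.compare(stem_len, ext.size(), ext) != 0) return false;
    // Every character before the extension must be a decimal digit.
    for (std::size_t i = 0; i < stem_len; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(filename[i]))) return false;
    }
    return true;
}
```

Running such a check over res/pos and res/neg before training is one way to catch misnamed samples early.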

Text recognition (OCR)

  1. Put your training data in res/ocr_training_data/
  2. Arrange the data as [Font Name]/[Font Type]/[Category]/[Character.jpg], for instance Time_New_Roman/Bold/lower/a.jpg. You can refer to res/ocr_training_data.zip
  3. Make sure the training folder exists, and put svm-train in the root folder (svm-train is built by the build system and should be found in build/)
  4. Run ./scene_text_recognition -t ocr
    mkdir training
    mv svm-train scene-text-recognition/
    ./scene_text_recognition -t ocr
  5. The text recognition (OCR) classifier can then be found in the training folder
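The directory layout in step 2 can be read as four path components. The sketch below shows one way to split such a relative path in C++. The struct OcrSample, its field names, and the function parse_ocr_sample_path are assumptions made for illustration; the project's own loader may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one training sample; field names are assumptions.
struct OcrSample {
    std::string font_name;  // e.g. "Time_New_Roman"
    std::string font_type;  // e.g. "Bold"
    std::string category;   // e.g. "lower"
    std::string character;  // file name without extension, e.g. "a"
};

// Splits a relative path of the form
// [Font Name]/[Font Type]/[Category]/[Character.jpg] into its four parts.
// Returns false if the path does not have exactly four non-empty components
// or if the file name has no extension.
bool parse_ocr_sample_path(const std::string& path, OcrSample& out) {
    std::vector<std::string> parts;
    std::stringstream ss(path);
    std::string item;
    while (std::getline(ss, item, '/')) parts.push_back(item);
    if (parts.size() != 4) return false;
    for (const std::string& p : parts) {
        if (p.empty()) return false;
    }
    const std::string& file = parts[3];
    const std::size_t dot = file.rfind('.');
    if (dot == std::string::npos || dot == 0) return false;
    out = {parts[0], parts[1], parts[2], file.substr(0, dot)};
    return true;
}
```

For the example path from step 2, Time_New_Roman/Bold/lower/a.jpg, this yields font name Time_New_Roman, font type Bold, category lower, and character a.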

How it works

The algorithm is based on a region detector called Extremal Region (ER), which is basically a superset of the well-known MSER region detector. We use ERs to find text candidates; they are extracted with the Linear-time MSER algorithm. The pitfall of ER is repeated detection, so we remove most of the repeated ERs with non-maximum suppression: we estimate the overlap between ERs based on the component tree and calculate the stability of every ER. Within each group of overlapping ERs, only the one with maximum stability is kept. After that, we apply a 2-stage Real-AdaBoost to filter out non-text regions. We choose Mean-LBP as the feature because it is faster to compute than other features. The surviving ERs are then grouped together to move the result from character level to word level, which is more intuitive for humans.

Our next step is to apply OCR to the detected text. The chain code of the ER is used as the feature, and the classifier is trained with an SVM. We also introduce several post-processing steps, such as optimal-path selection and spell checking, to improve the recognition result.
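To make the non-maximum suppression step more concrete, here is a simplified C++ sketch. It is not the project's implementation. It assumes the ERs along one path of the component tree are already listed from the leaf side to the root side, that overlap is approximated by the area ratio of consecutive ERs, and that a higher stability value means a more stable region. The struct ER, the function suppress_branch, and the parameter min_overlap are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified ER record; field names are assumptions for this sketch.
struct ER {
    int area;          // number of pixels in the region
    double stability;  // higher means more stable (how it is computed is left open)
};

// `branch` holds the ERs along one path of the component tree, ordered from
// the smallest (leaf side) to the largest (root side), so areas never shrink.
// Consecutive ERs whose area ratio is at least `min_overlap` are treated as
// one overlapping group. The function returns, for each group, the index of
// the ER with the highest stability.
std::vector<std::size_t> suppress_branch(const std::vector<ER>& branch,
                                         double min_overlap) {
    std::vector<std::size_t> kept;
    if (branch.empty()) return kept;
    std::size_t best = 0;  // most stable ER of the current group
    for (std::size_t i = 1; i < branch.size(); ++i) {
        assert(branch[i].area > 0 && branch[i].area >= branch[i - 1].area);
        const double ratio =
            static_cast<double>(branch[i - 1].area) / branch[i].area;
        if (ratio >= min_overlap) {
            // Same group: remember the more stable region.
            if (branch[i].stability > branch[best].stability) best = i;
        } else {
            // Overlap too small: close the current group, start a new one.
            kept.push_back(best);
            best = i;
        }
    }
    kept.push_back(best);  // close the last group
    return kept;
}
```

For example, with areas 100, 110, 120, 400, 420, stabilities 0.2, 0.9, 0.5, 0.3, 0.7, and min_overlap set to 0.7, the first three ERs form one group and the last two another, so the function returns indices 1 and 4.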

[Figure: overview]

Notes

For text classification, the training data contains 12,000 positive samples, mostly extracted from the ICDAR 2003 and ICDAR 2015 datasets. The negative samples are extracted from random images with a bootstrap process. For OCR classification, the training data consists of purely synthetic letters in 28 different fonts.

The system is able to detect text in real time (30 FPS) and recognize text in nearly real time (8~15 FPS, depending on the amount of text) for a 640x480 image on an Intel Core i7 desktop computer. The algorithm's end-to-end text detection accuracy on the ICDAR 2015 dataset is roughly 70% with fine-tuning, and end-to-end recognition accuracy is about 30%.

Result

Detection result on ICDAR 2015

[Figures: result1, result2, result3]

Recognition result on random image

[Figures: result4, result5]

Linear Time MSER Demo

The green pixels are the so-called boundary pixels, which are pushed onto stacks. Each stack stands for a gray level, and pixels are pushed according to their gray level.

[Figure: result4]
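The per-gray-level stacks described above can be sketched as a small C++ container. This is an illustration under assumptions, not the project's code. The class name BoundaryStacks and its methods are hypothetical, pixels are identified by a linear index, and the lowest non-empty level is found by a plain scan, where a tuned implementation would more likely track it directly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of a boundary-pixel container: one stack per 8-bit gray level.
// Class and member names are assumptions, not the project's own.
class BoundaryStacks {
public:
    // Push a pixel (given by its linear index) onto the stack of its gray level.
    void push(int pixel, std::uint8_t gray) { stacks_[gray].push_back(pixel); }

    // True when no stack holds any pixel.
    bool empty() const {
        for (const auto& s : stacks_) {
            if (!s.empty()) return false;
        }
        return true;
    }

    // Pop the most recently pushed pixel from the lowest non-empty gray level.
    // `gray` receives that level. Must not be called on an empty container.
    int pop_lowest(std::uint8_t& gray) {
        for (int level = 0; level < 256; ++level) {
            if (!stacks_[level].empty()) {
                const int pixel = stacks_[level].back();
                stacks_[level].pop_back();
                gray = static_cast<std::uint8_t>(level);
                return pixel;
            }
        }
        assert(false && "pop_lowest called on an empty BoundaryStacks");
        return -1;
    }

private:
    std::array<std::vector<int>, 256> stacks_;  // index = gray level
};
```

Pushing pixels 10, 11, and 12 with gray levels 200, 50, and 50 and then popping three times returns 12, 11, and 10, because the level-50 stack is drained first and each stack is last-in, first-out.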

References

  1. D. Nister and H. Stewenius, “Linear time maximally stable extremal regions,” European Conference on Computer Vision, pages 183-196, 2008.
  2. L. Neumann and J. Matas, “A method for text localization and recognition in real-world images,” Asian Conference on Computer Vision, pages 770-783, 2010.
  3. L. Neumann and J. Matas, “Real-time scene text localization and recognition,” Computer Vision and Pattern Recognition, pages 3538-3545, 2012.
  4. L. Neumann and J. Matas, “On combining multiple segmentations in scene text recognition,” International Conference on Document Analysis and Recognition, pages 523-527, 2013.
  5. H. Cho, M. Sung and B. Jun, “Canny Text Detector: Fast and robust scene text localization algorithm,” Computer Vision and Pattern Recognition, pages 3566-3573, 2016.
  6. B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” Computer Vision and Pattern Recognition, pages 2963-2970, 2010.
  7. P. Viola and M. J. Jones, “Rapid object detection using a boosted cascade of simple features,” Computer Vision and Pattern Recognition, pages 511-518, 2001.
Owner
HSIEH, YI CHIA