This is a c++ project deploying a deep scene text reading pipeline with tensorflow. It reads text from natural scene images. It uses frozen tensorflow graphs. The detector detect scene text locations. The recognizer reads word from each detected bounding box.

Overview

DeepSceneTextReader

This is a c++ project deploying a deep scene text reading pipeline. It reads text from natural scene images.

Prerequsites

The project is written in c++ using tensorflow computational framework. It is tested using tensorflow 1.4. Newer version should be ok too, but not tested. Please install:

Please check this project on how to build project using tensorflow with cmake: https://github.com/cjweeks/tensorflow-cmake It greatly helped the progress of building this project. When building tensorflow library, please be careful since we need to use opencv. Looks like there is still problem when including tensorflow and opencv together. It will make opencv unable to read image. Check out this issue: https://github.com/tensorflow/tensorflow/issues/14267 The answer by allenlavoie solved my problem, so I paste it here:

"In the meantime, as long as you're not using any custom ops you can build libtensorflow_cc.so with bazel build --config=monolithic, which will condense everything together into one shared object (no libtensorflow_framework dependence) and seal off non-TensorFlow symbols. That shared object will have protocol buffer symbols."

Status

Currently two pretrained model is provided. One for scene text detection, and one for scene text recognition. More model will be provided. Note that the current model is not so robust. U can easily change to ur trained model. The models will be continuously updated.

build process

cd build

cmake ..

make

It will create an excutable named DetectText in bin folder.

Usage:

The excutable could be excuted in three modes: (1) Detect (2) Recognize (3) Detect and Recognize

Detect

Download the pretrained detector model and put it in model/

./DetectText --detector_graph='model/Detector_model.pb'
--image_filename='test_images/test_img1.jpg' --mode='detect' --output_filename='results/output_image.jpg'

Recognize

Download the pretrained recognizer model and put it in model/ Download the dictionary file and put it in model

./DetectText --recognizer_graph='model/Recognizer_model.pb'
--image_filename='test_images/recognize_image1.jpg' --mode='recognize'
--im_height=32 --im_width=128

Detect and Recognize

Download the pretrained detector and recognizer model and put it in model/ as described previously.

./DetectText --recognizer_graph=$recognizer_graph --detector_graph='model/Detector_model.pb'
--image_filename='model/Recognizer_model.pb' --mode='detect_and_read' --output_filename='results/output_image.jpg'

Model Description

Detector

  1. Faster RCNN Detector Model The detector is trained with modified tensorflow [object detector api]: (https://github.com/tensorflow/models/tree/master/research/object_detection) I modify it by changing the proposal scheme to regress to the 4 coordinates of the oriented bounding box rather than regular rectangular bounding box. Check out this repo for the training code. Pretrained model: FasterRCNN_detector_model.pb

  2. R2CNN will be updated. See R2CNN for details. The code is also modified with tnesorflow [object detector api]: (https://github.com/tensorflow/models/tree/master/research/object_detection) The training code will be released soon.

Recognizer

  1. CTC scene text recognizer. The recognizer model follows the famous scene text recognition CRNN model

  2. Spatial Attention OCR will be updated soon. It is based on GoogleOCR

Detect and Recognize

The whole scene text reading pipeline detects the text and rotate it horizontally and read it with recognizer. The pipeline is here:

Pretrained Models

You can play with the code with provided pretrained models.
They are not fully optimized yet, but could be used for being familiar with the code.
Check them out here: models

You will find two detection models called: (1) FasterRCNN_detector_model.pb (2) R2CNN_detector_model.pb
Two recognition models with their charset: (1) Recognizer_model.pb + charset_full.txt and (2)Recognizer_model_case_insen.pb + charset_case_insen.txt.
Full charset means English letters + digit and case insen means case insensitive English letters + digit. Let me know if u have any problens using them.

Reference and Related Projects

Contact:

Owner
Dafang He
Ph.D student at the Penn State University. Focusing on machine learning and computer vision.
Dafang He
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

CUTIE TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor." Xiaohu

Zhao,Xiaohui 147 Dec 20, 2022
Visual Attention based OCR

Attention-OCR Authours: Qi Guo and Yuntian Deng Visual Attention based OCR. The model first runs a sliding CNN on the image (images are resized to hei

Yuntian Deng 1.1k Jan 02, 2023
Distilling Knowledge via Knowledge Review, CVPR 2021

ReviewKD Distilling Knowledge via Knowledge Review Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia This project provides an implementation for the

DV Lab 194 Dec 28, 2022
Image processing using OpenCv

Image processing using OpenCv Write a program that opens the webcam, and the user selects one of the following on the video: ✅ If the user presses the

M.Najafi 4 Feb 18, 2022
OCR, Object Detection, Number Plate, Real Time

README.md PrePareded anaconda env requirements.txt clova AI → deep text recognition → trained weights (ex, .pth) wpod-net weights (ex, .h5 , .json) ht

Kaven Lee 7 Dec 06, 2022
Image Recognition Model Generator

Takes a user-inputted query and generates a machine learning image recognition model that determines if an inputted image is or isn't their query

Christopher Oka 1 Jan 13, 2022
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf # it's a scriptable c

jbarlow83 7.9k Jan 03, 2023
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022
PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

365 Dec 20, 2022
A Joint Video and Image Encoder for End-to-End Retrieval

Frozen️ in Time ❄️ ️️️️ ⏳ A Joint Video and Image Encoder for End-to-End Retrieval (arXiv) Repository to contain the code, models, data for end-to-end

225 Dec 25, 2022
Random maze generator and solver

Maze Generator and Solver I wrote a maze generator that works with two commonly known algorithms: Depth First Search and Randomized Prims. Both of the

Daniel Pérez 10 Sep 23, 2022
Corner-based Region Proposal Network

Corner-based Region Proposal Network CRPN is a two-stage detection framework for multi-oriented scene text. It employs corners to estimate the possibl

xhzdeng 140 Nov 04, 2022
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Bailando Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory" [Paper] | [Project Page] | [Vi

Li Siyao 237 Dec 29, 2022
A selectional auto-encoder approach for document image binarization

The code of this repository was used for the following publication. If you find this code useful please cite our paper: @article{Gallego2019, title =

Javier Gallego 89 Nov 18, 2022
Optical character recognition for Japanese text, with the main focus being Japanese manga

Manga OCR Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Tran

Maciej Budyś 327 Jan 01, 2023
Python library to extract tabular data from images and scanned PDFs

Overview ExtractTable - API to extract tabular data from images and scanned PDFs The motivation is to make it easy for developers to extract tabular d

Org. Account 165 Dec 31, 2022
Zoom , GoogleMeets에서 Vtuber 데뷔하기

EasyVtuber Facial landmark와 GAN을 이용한 Character Face Generation Google Meets, Zoom 등에서 자신만의 웹툰, 만화 캐릭터로 대화해보세요! 악세사리는 어느정도 추가해도 잘 작동해요! 안타깝게도 RTX 2070

Gunwoo Han 140 Dec 23, 2022
An organized collection of tutorials and projects created for aspriring computer vision students.

A repository created with the purpose of teaching students in BME lab 308A- Hanoi University of Science and Technology

Givralnguyen 5 Nov 24, 2021
Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB

dio-live-textract2 Repositório de código para o live coding do dia 05/10/2021 sobre extração de dados estruturados e gravação em banco de dados a part

hugoportela 0 Jan 19, 2022
Text language identification using Wikipedia data

Text language identification using Wikipedia data The aim of this project is to provide high-quality language detection over all the web's languages.

Vsevolod Dyomkin 28 Jul 09, 2022