This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

Overview

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of predicinting oriented bounding boxes rather than regular horizontal bounding boxes.

Many tasks need to predict an oriented bounding box, e.g: Scene Text Detection. Check out the detection results: (Note that this code doesn't train model to recognize text. Only the bounding boxes are predicted)

Goals

For each predicted bounding boxes, in addition to the regular horizontal bounding box, we need to predict one oriented bounding box. Basically it means that we need to regress to an oriented bounding box. In this project, we simply regress to the encoded 4 corners of the oriented bounding boxes(8 values). See below equation for the encoding function. j is the index for each corner. g represents ground truth oriented bounding boxes. w_a and h_a is the anchor width and height, respectively.

The reason of adopting this Faster RCNN/SSD framework:

There are many object detection framework to be used. We adopt this one as the basis for the following reasons:

Highly modular designed code

It's easy to change the encoding scheme in the code. Simply changing the code in box_coders folder. The encoding using [R2CNN] (https://arxiv.org/abs/1706.09579) will be released soon. Training model with faster rcnn or ssd is easy to modify.

Natural integration with slim nets

It's easy to change feature extraction CNN backbone by using slim nets.

Easy and clear configuration setting with google protobuf

Changing the network configuration setting is easy. For example, to change the different aspect ratios of the anchors used, simply changing the grid_anchor_generator in the configuration file.

Many supporting codes have been provided.

It provides many supporting code such as exporting the trained model to a frozen graph that can be used in production(For example, in your c++ project). Check out my another project DeepSceneTextReader which used the frozen graph trained with this code.

Code Changed compared to the original object detection implementation

Import path for each python file

You do not need to use blaze build to build the code. Simply run the code from the root directory for fast experiment.

proto files

added oriented related filed to the proto files. Please build them with

protoc protos/*.proto --python_out=.

Box encoding scheme

added code for encode and decode oriented bounding boxes

Added code in meta architecture for supporting oriented bounding box prediction

Add code to predict the oriented bounding boxes for each proposal. At the same time the add code to calculate the oriented bounding boxes regression loss.

Other changes regarding data reading, data decoding and others

Usage:

Create the tfrecord data

Use the code create_text_dataset.py to create the tfexample data files used for training. You can create ICDAR 2015 and ICDAR 2013 data for training.

Download the pretrained weight

If you are training faster rcnn inception resnet v2 model, you can download the pretrained weight from tensorflow model zoo.

change the specific configuration setting.

See data/faster_rcnn_inception_resnet_v2_atrous_text.config for example configuration The parameter: second_stage_localization_loss_weight_oriented is the weight for the oriented bounding box prediction.

Train the model

Example running script is provided: train_faster_rcnn_inception_resnet_v2.sh

Evaluation

Trained with default configuration with ResNet Inception V2 or ResNet 101 backbone on ICDAR 2013 + ICDAR 2015 training set. The performance on ICDAR 2015 dataset.

Backbone Recall Precision F-1
ResNet Inception V2 0.7371 0.8057 0.7699
ResNet 101 0.6861 0.8213 0.7476

To improve the performance, try changing the configuration settings. Many scene text detectors have more aspect ratios anchors for each location than that was used for regular object detection.

TODO

  1. Provide support for R2CNN training.

Reference and Related Projects

Contact:

Owner
Dafang He
Ph.D student at the Penn State University. Focusing on machine learning and computer vision.
Dafang He
A simple component to display annotated text in Streamlit apps.

Annotated Text Component for Streamlit A simple component to display annotated text in Streamlit apps. For example: Installation First install Streaml

Thiago Teixeira 312 Dec 30, 2022
Comparison-of-OCR (KerasOCR, PyTesseract,EasyOCR)

Optical Character Recognition OCR (Optical Character Recognition) is a technology that enables the conversion of document types such as scanned paper

21 Dec 25, 2022
Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

Marek Mauder 127 Dec 03, 2022
TedEval: A Fair Evaluation Metric for Scene Text Detectors

TedEval: A Fair Evaluation Metric for Scene Text Detectors Official Python 3 implementation of TedEval | paper | slides Chae Young Lee, Youngmin Baek,

Clova AI Research 167 Nov 20, 2022
Fun program to overlay a mask to yourself using a webcam

Superhero Mask Overlay Description Simple project made for fun. It consists of placing a mask (a PNG image with transparent background) on your face.

KB Kwan 10 Dec 01, 2022
Text to QR-CODE

QR CODE GENERATO USING PYTHON Author : RAFIK BOUDALIA. Installation Use the package manager pip to install foobar. pip install pyqrcode Usage from tki

Rafik Boudalia 2 Oct 13, 2021
Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'

SSTDNet Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight' using pytorch. This code is work for general object detecti

HotaekHan 84 Jan 05, 2022
Captcha Recognition

The objective of this project is to recognize the target numbers in the captcha images correctly which would tell us how good or bad a captcha system has been built.

Mohit Kaushik 5 Feb 20, 2022
A set of workflows for corpus building through OCR, post-correction and normalisation

PICCL: Philosophical Integrator of Computational and Corpus Libraries PICCL offers a workflow for corpus building and builds on a variety of tools. Th

Language Machines 41 Dec 27, 2022
Select range and every time the screen changes, OCR is activated.

ASOCR(Auto Screen OCR) Select range and every time you press Space key, OCR is activated. 範囲を選ぶと、あなたがスペースキーを押すたびに、画面が変わる度にOCRが起動します。 usage1: simple OC

1 Feb 13, 2022
Color Picker and Color Detection tool for METR4202

METR4202 Color Detection Help This is sample code that can be used for the METR4202 project demo. There are two files provided, both running on Python

Miguel Valencia 1 Oct 23, 2021
Recognizing the text contents from a scanned visiting card

Recognizing the text contents from a scanned visiting card. The application which is used to recognize the text from scanned images,printeddocuments,r

Faizan Habib 1 Jan 28, 2022
list all open dataset about ocr.

ocr-open-dataset list all open dataset about ocr. printed dataset year Born-Digital Images (Web and Email) 2011-2015 COCO-Text 2017 Text Extraction fr

hongbomin 95 Nov 24, 2022
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless. This is the official Roboflow python package that interfaces with the Roboflow API.

Roboflow 52 Dec 23, 2022
End-to-end pipeline for real-time scene text detection and recognition.

Real-time-Scene-Text-Detection-and-Recognition-System End-to-end pipeline for real-time scene text detection and recognition. The detection model use

Fangneng Zhan 89 Aug 04, 2022
This Repository contain Opencv Projects in python

Python-Opencv OpenCV OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was

Yash Sakre 2 Nov 06, 2021
Camera Intrinsic Calibration and Hand-Eye Calibration in Pybullet

This repository is mainly for camera intrinsic calibration and hand-eye calibration. Synthetic experiments are conducted in PyBullet simulator. 1. Tes

CAI Junhao 7 Oct 03, 2022
This repository contains the code for the paper "SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks"

SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks (CVPR 2021 Oral) This repository contains the official PyTorch implementation

Shunsuke Saito 235 Dec 18, 2022
Image augmentation library in Python for machine learning.

Augmentor is an image augmentation library in Python for machine learning. It aims to be a standalone library that is platform and framework independe

Marcus D. Bloice 4.8k Jan 04, 2023
Learning Camera Localization via Dense Scene Matching, CVPR2021

This repository contains code of our CVPR 2021 paper - "Learning Camera Localization via Dense Scene Matching" by Shitao Tang, Chengzhou Tang, Rui Hua

tangshitao 65 Dec 01, 2022