AdvancedEAST is an algorithm used for Scene image text detect, which is primarily based on EAST, and the significant improvement was also made, which make long text predictions more accurate.https://github.com/huoyijie/raspberrypi-car

Overview

AdvancedEAST

AdvancedEAST is an algorithm used for Scene image text detect, which is primarily based on EAST:An Efficient and Accurate Scene Text Detector, and the significant improvement was also made, which make long text predictions more accurate. If this project is helpful to you, welcome to star. And if you have any problem, please contact me.

advantages

  • writen in keras, easy to read and run
  • base on EAST, an advanced text detect algorithm
  • easy to train the model
  • significant improvement was made, long text predictions more accurate.(please see 'demo results' part bellow, and pay attention to the activation image, which starts with yellow grids, and ends with green grids.)

In my experiments, AdvancedEast has obtained much better prediction accuracy then East, especially on long text. Since East calculates final vertexes coordinates with weighted mean values of predicted vertexes coordinates of all pixels. It is too difficult to predict the 2 vertexes from the other side of the quadrangle. See East limitations picked from original paper bellow. East limitations

project files

  • config file:cfg.py,control parameters
  • pre-process data: preprocess.py,resize image
  • label data: label.py,produce label info
  • define network network.py
  • define loss function losses.py
  • execute training advanced_east.py and data_generator.py
  • predict predict.py and nms.py

后置处理过程说明参见 后置处理(含原理图)

network arch

  • AdvancedEast

AdvancedEast network arch

网络输出说明: 输出层分别是1位score map, 是否在文本框内;2位vertex code,是否属于文本框边界像素以及是头还是尾;4位geo,是边界像素可以预测的2个顶点坐标。所有像素构成了文本框形状,然后只用边界像素去预测回归顶点坐标。边界像素定义为黄色和绿色框内部所有像素,是用所有的边界像素预测值的加权平均来预测头或尾的短边两端的两个顶点。头和尾部分边界像素分别预测2个顶点,最后得到4个顶点坐标。

原理简介(含原理图)

  • East

East network arch

setup

  • python 3.6.3+
  • tensorflow-gpu 1.5.0+(or tensorflow 1.5.0+)
  • keras 2.1.4+
  • numpy 1.14.1+
  • tqdm 4.19.7+

training

  • tianchi ICPR dataset download 链接: https://pan.baidu.com/s/1NSyc-cHKV3IwDo6qojIrKA 密码: ye9y

  • prepare training data:make data root dir(icpr), copy images to root dir, and copy txts to root dir, data format details could refer to 'ICPR MTWI 2018 挑战赛二:网络图像的文本检测', Link

  • modify config params in cfg.py, see default values.

  • python preprocess.py, resize image to 256256,384384,512512,640640,736*736, and train respectively could speed up training process.

  • python label.py

  • python advanced_east.py, train entrance

  • python predict.py -p demo/001.png, to predict

  • pretrain model download(use for test) 链接: https://pan.baidu.com/s/1KO7tR_MW767ggmbTjIJpuQ 密码: kpm2

demo results

001原图 001激活图 001预测图

004原图 004激活图 004预测图

005原图 005激活图 005预测图

  • compared with east based on vgg16

As you can see, although the text area prediction is very accurate, the vertex coordinates are not accurate enough.

001激活图 001预测图

License

The codes are released under the MIT License.

references

网络输出说明: 输出层分别是1位score map, 是否在文本框内;2位vertex code,是否属于文本框边界像素以及是头还是尾;4位geo,是边界像素可以预测的2个顶点坐标。所有像素构成了文本框形状,然后只用边界像素去预测回归顶点坐标。边界像素定义为黄色和绿色框内部所有像素,是用所有的边界像素预测值的加权平均来预测头或尾的短边两端的两个顶点。头和尾部分边界像素分别预测2个顶点,最后得到4个顶点坐标。

原理简介(含原理图)

后置处理过程说明参见 后置处理(含原理图)

A Simple RaspberryPi Car Project

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.

LAREX LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule based connected components approach which

162 Jan 05, 2023
Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'

SSTDNet Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight' using pytorch. This code is work for general object detecti

HotaekHan 84 Jan 05, 2022
A Python wrapper for Google Tesseract

Python Tesseract Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and "read" the text embedded i

Matthias A Lee 4.6k Jan 06, 2023
Detect the mathematical formula from the given picture and the same formula is extracted and converted into the latex code

Mathematical formulae extractor The goal of this project is to create a learning based system that takes an image of a math formula and returns corres

6 May 22, 2022
Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai. Contributions

758 Dec 22, 2022
Single Shot Text Detector with Regional Attention

Single Shot Text Detector with Regional Attention Introduction SSTD is initially described in our ICCV 2017 spotlight paper. A third-party implementat

Pan He 215 Dec 07, 2022
A Python script to capture images from multiple webcams at once and save them into your local machine

Capturing multiple images at once from Webcam Using OpenCV Capture multiple image by accessing the webcam of your system and save it to your machine.

Fazal ur Rehman 2 Apr 16, 2022
Using python libraries to track hands

Python-HandTracking Using python libraries to track hands on a camera Uses cv2 and mediapipe libraries custom hand tracking module PyCharm IDE Final E

Martin Matsudaira 1 Dec 17, 2021
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

An Agnostic Object Detection Framework IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-q

airctic 790 Jan 05, 2023
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.

Table of Contents Overview Requirements Demo Modules Overview This python package contains modules to help with finding and extracting tabular data fr

Eric Ihli 311 Dec 24, 2022
scantailor - Scan Tailor is an interactive post-processing tool for scanned pages.

Scan Tailor - scantailor.org This project is no longer maintained, and has not been maintained for a while. About Scan Tailor is an interactive post-p

1.5k Dec 28, 2022
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"

TableNet Unofficial implementation of ICDAR 2019 paper : TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from

Jainam Shah 243 Dec 30, 2022
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022
Web interface for browsing arXiv papers

Currently, arxivbox considers only major computer vision and machine learning conferences

Ankan Kumar Bhunia 12 Sep 11, 2022
This is a project to detect gestures to zoom in or out, using the real-time distance between the index finger and the thumb. It's based on OpenCV and Mediapipe.

Pinch-zoom This is a python project based on real-time hand-gesture detection, to zoom in or out, using the distance between the index finger and the

Harshit Bhalla 6 Jul 11, 2022
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation, CVPR 2020 (Oral)

SEAM The implementation of Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentaion. You can also download the repos

Hibercraft 459 Dec 26, 2022
Creating a virtual tv using opencv in python3.

Virtual-TV Creating a virtual tv using opencv in python3. In order to run the code follow the below given steps: Make sure the desired videos which ar

Vamsi 1 Jan 01, 2022
Text Detection from images using OpenCV

EAST Detector for Text Detection OpenCV’s EAST(Efficient and Accurate Scene Text Detection ) text detector is a deep learning model, based on a novel

Abhishek Singh 88 Oct 20, 2022
Face_mosaic - Mosaic blur processing is applied to multiple faces appearing in the video

動機 face_recognitionを使用して得られる顔座標は長方形であり、この座標をそのまま用いてぼかし処理を行った場合得られる画像は醜い。 それに対してモ

Yoshitsugu Kesamaru 6 Feb 03, 2022
A simple Security Camera created using Opencv in Python where images gets saved in realtime in your Dropbox account at every 5 seconds

Security Camera using Opencv & Dropbox This is a simple Security Camera created using Opencv in Python where images gets saved in realtime in your Dro

Arpit Rath 1 Jan 31, 2022