An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

Overview

Tips: A more recent scene text detection algorithm: PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link

Contents:

  1. Introduction
  2. Installation&requirements
  3. Datasets
  4. Problems
  5. Models
  6. Test Your own images
  7. Models
  8. Some Comments

Introduction

This is a re-implementation of the SegLink text detection algorithm described in the paper Detecting Oriented Text in Natural Images by Linking Segments, Baoguang Shi, Xiang Bai, Serge Belongie

Installation&requirements

  1. tensorflow-gpu 1.1.0

  2. cv2. I'm using 2.4.9.1, but some other versions less than 3 should be OK too. If not, try to switch to the version as mine.

  3. download the project pylib and add the src folder to your PYTHONPATH

If any other requirements unmet, just install them following the error msg.

Datasets

  1. SynthText

  2. ICDAR2015

Convert them into tfrecords format using the scripts in datasets if you wanna train your own model.

Problems

The convergence speed of my seglink is quite slow compared with that described in the paper. For example, the authors of SegLink paper said that a good result can be obtained by training on Synthtext for less than 10W iterations and on IC15-train for less than 1W iterations. However, using my implementation, I have to train on SynthText for about 20W iterations and another more than 10W iterations on IC15-train, to get a competitive result.

Several reasons may contribute to the slow convergency of my model:

  1. Batch size. I don't have 4 12G-Titans for training, as described in the paper. Instead, I trained my model on two 8G GeForce GTX 1080 or two Titans.
  2. Learning Rate. In the paper, 10^-3 and 10^-4 have been used. But I adopted a fixed learning rate of 10^-4.
  3. Different initialization model. I used the pretrained VGG model from SSD-caffe on coco , because I thought it better than VGG trained on ImageNet. However, it seems that my point of view does not hold. 4.Some other differences exists maybe, I am not sure.

Models

Two models trained on SynthText and IC15 train can be downloaded.

  1. seglink-384. Trained using image size of 384x384, the same image size as the paper. The Hmean is comparable to the result reported in the paper.

The hust_orientedText is the result of paper.

  1. seglink-512. Trainied using image size of 512x512, and one pointer better than 384x384.

They have been trained:

  • on Synthtext for about 20W iterations, and on IC15-train for 10w~20W iterations.

  • learning_rate = 10e-4

  • two gpus

  • 384: GTX 1080, batch_size = 24; 512: Titan, batch_size = 20

Both models perform best at seg_conf_threshold=0.8 and link_conf_threshold=0.5, well, another difference from paper, which takes 0.9 and 0.7 respectively.

Test Your own images

Use the script test_seglink.py, and a shortcut has been created in script test.sh:

Go to the seglink root directory and execute the command:


./scripts/test.sh 0 GPU_ID CKPT_PATH DATASET_DIR

For example:


./scripts/test.sh 0 ~/models/seglink/model.ckpt-217867  ~/dataset/ICDAR2015/Challenge4/ch4_training_images

I have only tested my models on IC15-test, but any other images can be used for test: just put your images into a directory, and config the path in the command as DATASET_DIR.

A bunch of txt files and a zip file is created after test. If you are using IC15-test for testing, you can upload this zip file to the icdar evaluation server directly.

The text files and placed in a subdir of the checkpoint directory, and contain the bounding boxes as the detection results, and can visualized using the script visualize_detection_result.py.

The command looks like:


python visualize_detection_result.py \

    --image=where your images are put

    --det=the directory of the text files output by test_seglink.py

    --output=the output directory of detection results drawn on images.

For example:


python visualize_detection_result.py \

    --image=~/dataset/ICDAR2015/Challenge4/ch4_training_images/ \

    --det=~/models/seglink/seglink_icdar2015_without_ignored/eval/icdar2015_train/model.ckpt-72885/seg_link_conf_th_0.900000_0.700000/txt \
    --output=~/temp/no-use/seglink_result_512_train

Training and evaluation

The training processing requires data processing, i.e. converting data into tfrecords. The converting scripts are put in the datasets directory. The scrips:train_seglink.py and eval_seglink.py are the training and evaluation scripts respectively. Especially, I have implemented an offline evaluation function, which calculates the Recall/Precision/Hmean as the ICDAR test server, and can be used for cross validation and grid search. However, the resulting scores may have slight differences from those of test sever, but it does not matter that much. Sorry for the imcomplete documentation here. Read and modify them if you want to train your own model.

Some Comments

Thanks should be given to the authors of the Seglink paper, i.e., Baoguang Shi1 Xiang Bai1, Serge Belongie.

EAST is another paper on text detection accepted by CVPR 2017, and its reported result is better than that of SegLink. But if they both use same VGG16, their performances are quite similar.

Contact me if you have any problems, through github issues.

Some Notes On Implementation Detail

How the groundtruth is calculated, in Chinese: http://fromwiz.com/share/s/34GeEW1RFx7x2iIM0z1ZXVvc2yLl5t2fTkEg2ZVhJR2n50xg

Owner
dengdan
Master on CS, from Zhejiang University; Now, perception algorithm R&D in FABU.ai, on automous driving
dengdan
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization This is an official implementation in PyTorch of AFSD. Our paper

Tencent YouTu Research 146 Dec 24, 2022
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr-fileformat Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader) Installation Docker System-wide Usage CLI GUI API Transf

Universitätsbibliothek Mannheim 152 Dec 20, 2022
TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes 基于SSD改进的文本检测算法,textBoxes_note记录了之前整理的笔记。

TextBoxes: A Fast Text Detector with a Single Deep Neural Network Introduction This paper presents an end-to-end trainable fast scene text detector, n

zhangjing1 24 Apr 28, 2022
TableBank: A Benchmark Dataset for Table Detection and Recognition

TableBank TableBank is a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on th

844 Jan 04, 2023
This pyhton script converts a pdf to Image then using tesseract as OCR engine converts Image to Text

Script_Convertir_PDF_IMG_TXT Este script de pyhton convierte un pdf en Imagen luego utilizando tesseract como motor OCR convierte la Imagen a Texto. p

alebogado 1 Jan 27, 2022
Fatigue Driving Detection Based on Dlib

Fatigue Driving Detection Based on Dlib

5 Dec 14, 2022
Perspective recovery of text using transformed ellipses

unproject_text Perspective recovery of text using transformed ellipses. See full writeup at https://mzucker.github.io/2016/10/11/unprojecting-text-wit

Matt Zucker 111 Nov 13, 2022
Localization of thoracic abnormalities model based on VinBigData (top 1%)

Repository contains the code for 2nd place solution of VinBigData Chest X-ray Abnormalities Detection competition. The goal of competition was to auto

33 May 24, 2022
Provides OCR (Optical Character Recognition) services through web applications

OCR4all As suggested by the name one of the main goals of OCR4all is to allow basically any given user to independently perform OCR on a wide variety

174 Dec 31, 2022
Ackermann Line Follower Robot Simulation.

Ackermann Line Follower Robot This is a simulation of a line follower robot that works with steering control based on Stanley: The Robot That Won the

Lucas Mazzetto 2 Apr 16, 2022
Polaris is a Face recognition attendance system .

Support Me 🚀 About Polaris 📄 Polaris is a system based on facial recognition with a futuristic GUI design, Can easily find people informations store

XN3UR0N 215 Dec 26, 2022
Automatically remove the mosaics in images and videos, or add mosaics to them.

Automatically remove the mosaics in images and videos, or add mosaics to them.

Hypo 1.4k Dec 30, 2022
This is a tensorflow re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network.My blog:

PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network Introduction This is a tensorflow re-implementation of PSENet: Shape Robu

Michael liu 498 Dec 30, 2022
🔎 Like Chardet. 🚀 Package for encoding & language detection. Charset detection.

Charset Detection, for Everyone 👋 The Real First Universal Charset Detector A library that helps you read text from an unknown charset encoding. Moti

TAHRI Ahmed R. 332 Dec 31, 2022
Generate a list of papers with publicly available source code in the daily arxiv

2021-06-08 paper code optimal network slicing for service-oriented networks with flexible routing and guaranteed e2e latency networkslicing multi-moda

79 Jan 03, 2023
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

MTLFace This repository contains the PyTorch implementation and the dataset of the paper: When Age-Invariant Face Recognition Meets Face Age Synthesis

Hzzone 120 Jan 05, 2023
Driver Drowsiness Detection with OpenCV & Dlib

In this project, we have built a driver drowsiness detection system that will detect if the eyes of the driver are close for too long and infer if the driver is sleepy or inactive.

Mansi Mishra 4 Oct 26, 2022
Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

121 Oct 15, 2021
Color Picker and Color Detection tool for METR4202

METR4202 Color Detection Help This is sample code that can be used for the METR4202 project demo. There are two files provided, both running on Python

Miguel Valencia 1 Oct 23, 2021