PSENet - Shape Robust Text Detection with Progressive Scale Expansion Network.

Last update: Dec 24, 2022

News

Python 3 implementations of PSENet [1], PAN [2], and PAN++ [3] have been released at https://github.com/whai362/pan_pp.pytorch.

[1] W. Wang, E. Xie, X. Li, W. Hou, T. Lu, G. Yu, and S. Shao. Shape robust text detection with progressive scale expansion network. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 9336–9345, 2019.
[2] W. Wang, E. Xie, X. Song, Y. Zang, W. Wang, T. Lu, G. Yu, and C. Shen. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In Proc. IEEE Int. Conf. Comp. Vis., pages 8440–8449, 2019.
[3] Paper is in preparation.

Shape Robust Text Detection with Progressive Scale Expansion Network

Requirements

Python 2.7
PyTorch v0.4.1+
pyclipper
Polygon2
OpenCV 3.4 (for the C++ version of PSE)
opencv-python 3.4

Introduction

Progressive Scale Expansion Network (PSENet) is a text detector that can accurately detect arbitrarily shaped text in natural scene images.
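
The name refers to the post-processing step described in the paper [1]: the network predicts several "kernels" (shrunken versions of each text region), and the final instances are obtained by growing labels from the smallest kernel out to the largest with a breadth-first search. The following is a minimal pure-Python sketch of that idea, not the code in this repository. The function names `label_components` and `progressive_scale_expansion`, the 4-connectivity, and the toy masks are assumptions made for illustration.

```python
from collections import deque

import numpy as np

# 4-connected neighbourhood (an assumption for this sketch).
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def label_components(mask):
    """Label the 4-connected components of a binary mask with 1, 2, 3, ..."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    height, width = mask.shape
    next_label = 0
    for y in range(height):
        for x in range(width):
            if mask[y, x] and labels[y, x] == 0:
                next_label += 1
                labels[y, x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow labels from the smallest kernel to the largest.

    `kernels` is a list of binary masks ordered from smallest to largest.
    """
    labels = label_components(kernels[0])
    height, width = labels.shape
    for kernel in kernels[1:]:
        # Seed the search with every pixel that already carries a label.
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < height and 0 <= nx < width
                        and kernel[ny, nx] and labels[ny, nx] == 0):
                    # A contested pixel keeps the first label that reaches it.
                    labels[ny, nx] = labels[cy, cx]
                    queue.append((ny, nx))
    return labels


# Toy example: two seeds that merge into one blob in the largest kernel.
small = np.array([[0, 1, 0, 0, 0, 1, 0]], dtype=bool)
large = np.array([[1, 1, 1, 1, 1, 1, 1]], dtype=bool)
result = progressive_scale_expansion([small, large])
print(result.tolist())  # [[1, 1, 1, 1, 2, 2, 2]]
```

In the toy example the largest kernel is a single connected blob, yet the two instances stay separate because each pixel is claimed by whichever seed reaches it first. Separating adjacent text instances in this way is what the expansion step is for.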

Training

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_ic15.py

Testing

CUDA_VISIBLE_DEVICES=0 python test_ic15.py --scale 1 --resume [path of model]

Evaluation scripts for ICDAR 2015 and SCUT-CTW1500

cd eval
sh eval_ic15.sh
sh eval_ctw1500.sh

Performance (new version paper)

ICDAR 2015

Method	Extra Data	Precision (%)	Recall (%)	F-measure (%)	FPS (1080Ti)	Model
PSENet-1s (ResNet50)	-	81.49	79.68	80.57	1.6	baiduyun(extract code: rxti); OneDrive
PSENet-1s (ResNet50)	pretrain on IC17 MLT	86.92	84.5	85.69	1.6	baiduyun(extract code: aieo); OneDrive
PSENet-4s (ResNet50)	pretrain on IC17 MLT	86.1	83.77	84.92	3.8	baiduyun(extract code: aieo); OneDrive
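
The F-measure column is the harmonic mean of precision and recall. As a quick sanity check, the short sketch below recomputes it for the three ICDAR 2015 rows above. The helper name `f_measure` is an assumption for illustration and is not part of the evaluation scripts.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the ICDAR 2015 table above.
rows = [(81.49, 79.68), (86.92, 84.5), (86.1, 83.77)]
scores = [round(f_measure(p, r), 2) for p, r in rows]
print(scores)  # [80.57, 85.69, 84.92]
```

The recomputed values match the reported F-measure column to two decimal places.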

SCUT-CTW1500

Method	Extra Data	Precision (%)	Recall (%)	F-measure (%)	FPS (1080Ti)	Model
PSENet-1s (ResNet50)	-	80.57	75.55	78.0	3.9	baiduyun(extract code: ksv7); OneDrive
PSENet-1s (ResNet50)	pretrain on IC17 MLT	84.84	79.73	82.2	3.9	baiduyun(extract code: z7ac); OneDrive
PSENet-4s (ResNet50)	pretrain on IC17 MLT	82.09	77.84	79.9	8.4	baiduyun(extract code: z7ac); OneDrive

Performance (old version paper)

ICDAR 2015 (training with ICDAR 2017 MLT)

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	87.98	83.87	85.88
PSENet-2s (ResNet152)	89.30	85.22	87.21
PSENet-1s (ResNet152)	88.71	85.51	87.08

ICDAR 2017 MLT

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	75.98	67.56	71.52
PSENet-2s (ResNet152)	76.97	68.35	72.40
PSENet-1s (ResNet152)	77.01	68.40	72.45

SCUT-CTW1500

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	80.49	78.13	79.29
PSENet-2s (ResNet152)	81.95	79.30	80.60
PSENet-1s (ResNet152)	82.50	79.89	81.17

ICPR MTWI 2018 Challenge 2

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-1s (ResNet152)	78.5	72.1	75.2

Results

Figure 3: The results on ICDAR 2015, ICDAR 2017 MLT and SCUT-CTW1500

Paper Link

[new version paper] https://arxiv.org/abs/1903.12473

[old version paper] https://arxiv.org/abs/1806.02559

Other Implementations

[TensorFlow version (thanks @liuheng92)] https://github.com/liuheng92/tensorflow_PSENet

Citation

@inproceedings{wang2019shape,
  title={Shape Robust Text Detection With Progressive Scale Expansion Network},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9336--9345},
  year={2019}
}
