PSENet - Shape Robust Text Detection with Progressive Scale Expansion Network.

Last update: Dec 24, 2022

News

Python3 implementations of PSENet [1], PAN [2] and PAN++ [3] are released at https://github.com/whai362/pan_pp.pytorch.

[1] W. Wang, E. Xie, X. Li, W. Hou, T. Lu, G. Yu, and S. Shao. Shape robust text detection with progressive scale expansion network. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 9336–9345, 2019.
[2] W. Wang, E. Xie, X. Song, Y. Zang, W. Wang, T. Lu, G. Yu, and C. Shen. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In Proc. IEEE Int. Conf. Comp. Vis., pages 8440–8449, 2019.
[3] Paper is in preparation.

Shape Robust Text Detection with Progressive Scale Expansion Network

Requirements

Python 2.7
PyTorch v0.4.1+
pyclipper
Polygon2
OpenCV 3.4 (for the C++ version of PSE)
opencv-python 3.4

Introduction

Progressive Scale Expansion Network (PSENet) is a text detector that can accurately detect arbitrarily shaped text in natural scene images.
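
The "progressive scale expansion" in the name refers to the post-processing step described in the paper [1]: the network predicts several kernels of different scales for each text instance, and the instances are grown from the smallest kernel outward by breadth-first search. The following is a minimal pure-Python sketch of that idea, not the code shipped in this repository. The function names, the list-of-lists mask format, and the 4-neighbour connectivity are assumptions made for illustration.

```python
from collections import deque

# 4-connected neighbourhood (an assumption of this sketch).
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def label_components(mask):
    """Label the 4-connected components of a binary mask; 0 is background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow text instances from the smallest kernel to the largest.

    kernels: list of binary masks (lists of lists), ordered from the
    smallest kernel to the largest one.
    """
    # Each connected component of the smallest kernel seeds one instance.
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Start the search from every pixel that already has a label.
        queue = deque((y, x) for y in range(h) for x in range(w)
                      if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = cy + dy, cx + dx
                # A contested pixel goes to whichever instance reaches it first.
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels


# Toy example: two seeds on a single row, expanded into one large kernel.
small = [[0, 1, 0, 0, 0, 1, 0]]
large = [[1, 1, 1, 1, 1, 1, 1]]
print(progressive_scale_expansion([small, large]))  # [[1, 1, 1, 1, 2, 2, 2]]
```

Because the smallest kernels of neighbouring text lines are well separated, each line keeps its own label even when the largest kernels touch, as in the toy example where the two instances meet in the middle of the row.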

Training

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_ic15.py

Testing

CUDA_VISIBLE_DEVICES=0 python test_ic15.py --scale 1 --resume [path of model]

Eval script for ICDAR 2015 and SCUT-CTW1500

cd eval
sh eval_ic15.sh
sh eval_ctw1500.sh

Performance (new version paper)

ICDAR 2015

Method	Extra Data	Precision (%)	Recall (%)	F-measure (%)	FPS (1080Ti)	Model
PSENet-1s (ResNet50)	-	81.49	79.68	80.57	1.6	baiduyun(extract code: rxti); OneDrive
PSENet-1s (ResNet50)	pretrain on IC17 MLT	86.92	84.5	85.69	1.6	baiduyun(extract code: aieo); OneDrive
PSENet-4s (ResNet50)	pretrain on IC17 MLT	86.1	83.77	84.92	3.8	baiduyun(extract code: aieo); OneDrive

SCUT-CTW1500

Method	Extra Data	Precision (%)	Recall (%)	F-measure (%)	FPS (1080Ti)	Model
PSENet-1s (ResNet50)	-	80.57	75.55	78.0	3.9	baiduyun(extract code: ksv7); OneDrive
PSENet-1s (ResNet50)	pretrain on IC17 MLT	84.84	79.73	82.2	3.9	baiduyun(extract code: z7ac); OneDrive
PSENet-4s (ResNet50)	pretrain on IC17 MLT	82.09	77.84	79.9	8.4	baiduyun(extract code: z7ac); OneDrive
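
The F-measure columns in the tables above are consistent with the usual harmonic mean of precision and recall. The short check below assumes that definition; the helper name `f_measure` is hypothetical and is not part of the eval scripts.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# First ICDAR 2015 row: precision 81.49, recall 79.68.
print(round(f_measure(81.49, 79.68), 2))  # 80.57
```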

Performance (old version paper)

ICDAR 2015 (training with ICDAR 2017 MLT)

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	87.98	83.87	85.88
PSENet-2s (ResNet152)	89.30	85.22	87.21
PSENet-1s (ResNet152)	88.71	85.51	87.08

ICDAR 2017 MLT

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	75.98	67.56	71.52
PSENet-2s (ResNet152)	76.97	68.35	72.40
PSENet-1s (ResNet152)	77.01	68.40	72.45

SCUT-CTW1500

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	80.49	78.13	79.29
PSENet-2s (ResNet152)	81.95	79.30	80.60
PSENet-1s (ResNet152)	82.50	79.89	81.17

ICPR MTWI 2018 Challenge 2

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-1s (ResNet152)	78.5	72.1	75.2

Results

Figure 3: The results on ICDAR 2015, ICDAR 2017 MLT and SCUT-CTW1500

Paper Link

[new version paper] https://arxiv.org/abs/1903.12473

[old version paper] https://arxiv.org/abs/1806.02559

Other Implementations

[tensorflow version (thanks @liuheng92)] https://github.com/liuheng92/tensorflow_PSENet

Citation

@inproceedings{wang2019shape,
  title={Shape Robust Text Detection With Progressive Scale Expansion Network},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9336--9345},
  year={2019}
}
