Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Last update: Dec 22, 2022

This code implements CTPN for scene text detection, as described in:

Z. Tian, W. Huang, T. He, P. He and Y. Qiao: Detecting Text in Natural Image with
Connectionist Text Proposal Network, ECCV, 2016.

An online demo is available at textdet.com.

This demo code (with our trained model) performs text-line detection, without the side-refinement part.

Required hardware

You need a GPU. With CUDNN, about 1.5GB of free memory is required. Without CUDNN, you will need about 5GB of free memory, and testing time increases slightly. We therefore strongly recommend using CUDNN.

It's also possible to run the program on CPU only, but it is extremely slow because the CPU implementation is not optimized.
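
The choice between GPU and CPU mode can be sketched as a small wrapper around the demo command. This is only an illustration: the helper names find_on_path and demo_command are hypothetical, and treating a visible nvidia-smi binary as evidence of a usable GPU is an assumption, not something the CTPN code does itself. The --no-gpu flag is the one described under "How to run this code".

```python
import os


def find_on_path(program):
    """Return the full path of `program` if it is on PATH, else None."""
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def demo_command(use_gpu):
    """Build the demo command from this README; --no-gpu selects CPU mode."""
    command = ["python", "tools/demo.py"]
    if not use_gpu:
        command.append("--no-gpu")
    return command


# Heuristic (assumption): a visible nvidia-smi binary suggests a GPU is present.
gpu_likely = find_on_path("nvidia-smi") is not None
print(" ".join(demo_command(gpu_likely)))
```

The printed command can then be run from the root directory of the code. The heuristic says nothing about free memory, so the 1.5GB and 5GB figures above still need to be checked by hand.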

Required software

Python 2.7, Cython, and everything Caffe depends on.

How to run this code

1. Clone this repository with git clone https://github.com/tianzhi0549/CTPN.git. This checks out the CTPN code and the Caffe we ship.
2. Install the Caffe we ship, following the steps below.
   - Install Caffe's dependencies. You can follow this tutorial. Note: Python support is required, and the CUDA version we need is 7.0.
   - Enter the directory caffe.
   - Run cp Makefile.config.example Makefile.config.
   - Open Makefile.config and set WITH_PYTHON_LAYER := 1. If you want to use CUDNN, also set CUDNN := 1. Uncomment CPU_ONLY := 1 if you want to compile without GPU support.

     Note: To use CUDNN, download it from NVIDIA's official website and install it in advance. The CUDNN version we use is 3.0.
   - Run make -j && make pycaffe.
3. After Caffe is set up, download the trained model (about 78 MB) from Google Drive or our website and place it in the directory models. The model file must be named ctpn_trained_model.caffemodel.
4. Make sure you are in the root directory of the code, then run make to compile the Cython files.
5. Run python tools/demo.py for a demo, or python tools/demo.py --no-gpu to run it in CPU mode.
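
Two of the steps above are easy to get wrong by hand: editing Makefile.config and putting the model file in the right place. The following is a minimal sketch of both checks, not part of the CTPN code. The helper names (enable_flag, model_path, ready_to_run) are hypothetical, and the sketch assumes that Makefile.config lists optional flags as commented lines such as "# WITH_PYTHON_LAYER := 1"; if a flag is not found, it is appended instead.

```python
import os
import re


def enable_flag(config_text, name):
    """Uncomment `NAME := ...` in Makefile-style text, or append `NAME := 1`."""
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*%s[ \t]*:=.*$" % re.escape(name), re.MULTILINE
    )
    line = "%s := 1" % name
    if pattern.search(config_text):
        # Rewrite only the first matching line; leave the rest untouched.
        return pattern.sub(line, config_text, count=1)
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + line + "\n"


def model_path(root):
    """Location where this README says the trained model must be placed."""
    return os.path.join(root, "models", "ctpn_trained_model.caffemodel")


def ready_to_run(root):
    """True when the trained model file exists under `root`."""
    return os.path.isfile(model_path(root))


# Example on an in-memory config fragment (sample text, not the real file).
sample = "# CPU_ONLY := 1\n# WITH_PYTHON_LAYER := 1\n"
print(enable_flag(sample, "WITH_PYTHON_LAYER"))
print("model in place:", ready_to_run("."))
```

To apply this to a real checkout, read caffe/Makefile.config, pass its contents through enable_flag, and write the result back before running make -j && make pycaffe.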

How to use another Caffe

If you want to use a different Caffe instead of the one we ship, you need to port the following layers into it:

Reverse
Transpose
Lstm
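
After porting, one way to confirm that the layers are available is to compare the list above against the layer types your Caffe build has registered. The sketch below is an illustration under assumptions: missing_layers is a hypothetical helper, the registered type names are assumed to match the spelling used in this README exactly, and caffe.layer_type_list() is a pycaffe function that older builds may not provide, in which case the check is skipped.

```python
REQUIRED_LAYERS = ("Reverse", "Transpose", "Lstm")


def missing_layers(available_types, required=REQUIRED_LAYERS):
    """Return the required layer types that are absent from `available_types`."""
    available = set(available_types)
    return [name for name in required if name not in available]


if __name__ == "__main__":
    try:
        import caffe  # importable only once pycaffe is built and on PYTHONPATH
        registered = list(caffe.layer_type_list())
    except (ImportError, AttributeError):
        # No pycaffe, or a build without layer_type_list (assumption noted above).
        registered = None

    if registered is None:
        print("Could not query pycaffe; check the layers manually.")
    else:
        print("Missing layers:", missing_layers(registered))
```

An empty "Missing layers" list only shows that the types are registered; it does not verify that the ported implementations behave like the ones we ship.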

License

The code is released under the MIT License.

Owner

Tian Zhi
