Corner-based Region Proposal Network

CRPN is a two-stage detection framework for multi-oriented scene text. It uses corners to estimate the possible locations of text instances, followed by a region-wise subnetwork for further classification and regression. In our experiments, it achieves F-measures of 0.876 and 0.845 on ICDAR 2013 and ICDAR 2015, respectively. The paper is available on arXiv.

Installation

This code is based on Caffe and py-faster-rcnn. It has been tested on Ubuntu 16.04 with CUDA 8.0.

  1. Clone this repository

    git clone https://github.com/xhzdeng/crpn.git
    
  2. Build Caffe and pycaffe

    cd $CRPN_ROOT/caffe-fast-rcnn
    make -j8 && make pycaffe
    
  3. Build the Cython modules

    cd $CRPN_ROOT/lib
    make
    
  4. Prepare your own training data directory. For convenience, it should have this basic structure.

    $VOCdevkit/
    $VOCdevkit/VOC2007                    # image sets, annotations, etc. 
    

    Then create a symlink to your dataset:

    cd $CRPN_ROOT/data
    ln -s [path] VOCdevkit
    
  5. Download the ImageNet-pretrained VGG-16 model. You can find it in the Caffe Model Zoo.

  6. Train with YOUR dataset

    cd $CRPN_ROOT
    ./experiments/scripts/train.sh [NET] [MODEL] [DATASET] [ITER_NUM]
    # NET is the network architecture to use; only {vgg16} is supported in this implementation
    # MODEL is the pre-trained model used to initialize the weights
    # DATASET points to your dataset; refer to the contents of train.sh
    # ITER_NUM is the number of training iterations
    
  7. Test with YOUR models

    cd $CRPN_ROOT
    ./experiments/scripts/test.sh [NET] [MODEL] [DATASET]
    # NET is the network architecture to use; only {vgg16} is supported in this implementation
    # MODEL is the model to test
    # DATASET points to your dataset; refer to the contents of test.sh
    

    Test outputs are saved under:

    output/<experiment directory>/<dataset name>/<network snapshot name>/
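
Before launching a long training run, it can help to confirm that the symlinked dataset directory from step 4 looks as expected. The following is a minimal sketch, not part of this repository: the helper name `missing_dataset_dirs` is hypothetical, and the sub-directory names (`Annotations`, `ImageSets`, `JPEGImages`) are an assumption borrowed from the standard Pascal VOC layout, so adjust them to whatever your dataset loader actually reads.

```python
import os
import tempfile

# Assumed sub-directories, borrowed from the standard Pascal VOC layout;
# adjust to whatever your dataset loader actually reads.
EXPECTED_SUBDIRS = ("Annotations", "ImageSets", "JPEGImages")


def missing_dataset_dirs(devkit_root, year="2007", subdirs=EXPECTED_SUBDIRS):
    """Return the expected directories that do not exist under devkit_root."""
    voc_root = os.path.join(devkit_root, "VOC" + year)
    candidates = [voc_root] + [os.path.join(voc_root, d) for d in subdirs]
    return [path for path in candidates if not os.path.isdir(path)]


if __name__ == "__main__":
    # Build a throwaway, deliberately incomplete layout to show the check.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "VOC2007", "Annotations"))
        for path in missing_dataset_dirs(root):
            print("missing:", path)
```

In practice you would call `missing_dataset_dirs` on the `data/VOCdevkit` symlink and stop if the returned list is non-empty.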
    

Demo

```
cd $CRPN_ROOT
./tools/demo.py --net [NET] --model [MODEL]
# NET is the network architecture to use; only {vgg16} is supported in this implementation
# MODEL is the path of caffemodel you want to use
```

Models

You can download the pretrained model, trained for 100k iterations on SynthText, from OneDrive or BaiduYun. I have also uploaded a more recently trained testing model. It achieves an F-measure of 0.8456 at 840p resolution on ICDAR 2015, which is similar in accuracy to, but slightly faster than, the model described in the paper.
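
For readers comparing their own results with the numbers above, here is a small sketch of how an F-measure is computed, assuming the usual definition as the harmonic mean of precision and recall. The function name `f_measure` is illustrative and not part of this repository, and the input values are arbitrary, not results from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Avoid dividing by zero when nothing was detected or matched.
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    print(round(f_measure(0.5, 1.0), 4))  # prints 0.6667
```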

Citation

If you find the paper and code useful in your research, please consider citing:

@article{deng2018crpn,
    Title = {Detecting Multi-Oriented Text with Corner-based Region Proposals},
    Author = {Linjie Deng and Yanxiang Gong and Yi Lin and Jingwen Shuai and Xiaoguang Tu and Yufei Zhang and Zheng Ma and Mei Xie},
    Journal = {arXiv preprint arXiv:1804.02690},
    Year = {2018}
}