caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Last update: Dec 28, 2021

Overview

R²CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Abstract

This is a Caffe re-implementation of R²CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

This project is modified from py-R-FCN; the inclined NMS and rotated-box generation components are imported from the EAST project. Thanks to the authors (@zxytim, @argman) for their help. Please cite this paper if you find this useful.
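Inclined NMS suppresses overlapping detections using the IoU of the rotated boxes themselves rather than of their axis-aligned bounding rectangles. The following is a minimal pure-Python sketch of that idea, assuming each box is a convex quadrilateral given as four (x, y) points. The function names and the 0.3 threshold are illustrative assumptions; this is not the component imported from EAST.

```python
def polygon_area(pts):
    """Signed shoelace area; positive when the vertices wind counter-clockwise."""
    area = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def ensure_ccw(pts):
    """Return the polygon with counter-clockwise winding."""
    return list(pts) if polygon_area(pts) >= 0 else list(reversed(pts))


def clip_polygon(subject, clip):
    """Sutherland-Hodgman clipping of `subject` by a convex CCW polygon `clip`."""
    output = list(subject)
    for i in range(len(clip)):
        if not output:
            break
        ax, ay = clip[i]
        bx, by = clip[(i + 1) % len(clip)]

        def side(p):
            # >= 0 when p lies on or to the left of the directed edge a -> b
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        input_pts, output = output, []
        for j in range(len(input_pts)):
            cur = input_pts[j]
            nxt = input_pts[(j + 1) % len(input_pts)]
            s_cur, s_nxt = side(cur), side(nxt)
            if s_cur >= 0:
                output.append(cur)
            if (s_cur > 0 and s_nxt < 0) or (s_cur < 0 and s_nxt > 0):
                # The edge cur -> nxt crosses the clip edge: add the crossing point
                t = s_cur / float(s_cur - s_nxt)
                output.append((cur[0] + t * (nxt[0] - cur[0]),
                               cur[1] + t * (nxt[1] - cur[1])))
    return output


def quad_iou(a, b):
    """IoU of two convex quadrilaterals."""
    a, b = ensure_ccw(a), ensure_ccw(b)
    inter_pts = clip_polygon(a, b)
    inter = abs(polygon_area(inter_pts)) if len(inter_pts) >= 3 else 0.0
    union = polygon_area(a) + polygon_area(b) - inter
    return inter / union if union > 0 else 0.0


def inclined_nms(quads, scores, iou_threshold=0.3):
    """Greedy NMS: keep the highest-scoring quad, drop quads overlapping a kept one."""
    order = sorted(range(len(quads)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(quad_iou(quads[i], quads[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

`inclined_nms` returns the indices of the kept boxes in descending score order. The quadratic pairwise loop is fine for illustration; a production version would normally run in compiled code.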

Abstract
Structure
Installation
Demo
Test
Train
Experiments
Furthermore

Structure

Code structure

.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│   ├── caffemodel // trained caffe model
│   ├── icdar15_gt // ICDAR2015 groundtruth
│   ├── prototxt // caffe prototxt file
│   └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│   ├── submit // original submit file
│   ├── submit_zip // zip submit file
│   ├── snapshots
│   └── train
│       ├── VGG16.txt.*
│       └── snapshots
├── README.md
├── requirements.txt // python package
├── src
│   ├── cfgs // train config yml
│   ├── data // cache file
│   ├── lib
│   ├── _init_path.py
│   ├── demo.py
│   ├── eval_icdar15.py // evaluate F-measure on the ICDAR2015 dataset
│   ├── test_net.py
│   └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
└── test.sh // test script

Data structure

The dataset directory should have this basic structure:

ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt  // image list covering the ICDAR2013 + ICDAR2015 training sets, with raw data augmentation as in the paper
├── ICDAR2015
│   ├── augmentation // contains all augmented images
│   └── ImageSets/Main/test.txt // ICDAR2015 test images list
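
Before training or testing, it can help to confirm that the devkit root matches the layout above. A minimal sketch, assuming the paths are exactly those shown in the tree; the function name `missing_entries` is hypothetical and not part of this project.

```python
import os

# Entries expected under ICDARdevkit_Root, taken from the tree above
EXPECTED_ENTRIES = [
    "ICDAR2013",
    "merge_train.txt",
    "ICDAR2015",
    os.path.join("ICDAR2015", "augmentation"),
    os.path.join("ICDAR2015", "ImageSets", "Main", "test.txt"),
]


def missing_entries(devkit_root):
    """Return the expected files or directories that do not exist under devkit_root."""
    return [entry for entry in EXPECTED_ENTRIES
            if not os.path.exists(os.path.join(devkit_root, entry))]
```

An empty list means every expected entry was found; otherwise the list names what is missing.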

Installation

Install caffe

It is highly recommended to use Docker to build the environment. For more about how to configure Docker, see Running with Docker. If you are familiar with Docker, run:

    1. nvidia-docker-compose run --rm --service-ports rrcnn bash
    2. bash ./demo.sh

If you are not familiar with Docker, follow py-R-FCN to install Caffe.

Build

    cd src/lib && make

Download Model

Download the VGG16 model pre-trained on ImageNet and place it at model/imagenet_models/VGG16.v2.caffemodel.
Download the VGG16 model trained by this project and place it at model/caffemodel/TextBoxes-v2_iter_12w.caffemodel.

Demo

It is recommended to use a UNIX socket to support GUI output from Docker. Open another terminal and type:

    xhost + # may be needed when you open a new terminal
    # docker-compose.yml mounts the host volume /tmp/.X11-unix to the container volume /tmp/.X11-unix
    # pass the DISPLAY variable to the container so the host X server can display images from it
    docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
    bash ./demo.sh

Test

Single Test

    bash ./test.sh

Multi-scale Test

    # please uncomment two lines in src/cfgs/faster_rcnn_end2end.yml
    SCALES: [720, 1200]
    MULTI_SCALES_NOC: True
    # modify src/lib/datasets/icdar.py to find ICDAR2015 test data, please refer to commit @bbac1cf
    # then run
    bash ./test.sh
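
The SCALES values are target lengths for the shorter image side. The sketch below shows how a resize factor follows from such a target, in the style of Faster R-CNN test code. The function name and the optional `max_long_side` cap are assumptions for illustration, not settings read from this project's config.

```python
def short_side_scale(height, width, target_short_side, max_long_side=None):
    """Resize factor that brings the shorter image side to target_short_side.

    If max_long_side is given (an assumed, optional cap), the factor is
    reduced so the longer side does not exceed it.
    """
    scale = float(target_short_side) / min(height, width)
    if max_long_side is not None and scale * max(height, width) > max_long_side:
        scale = float(max_long_side) / max(height, width)
    return scale


if __name__ == "__main__":
    # A 720 x 1280 image evaluated at the two scales used for multi-scale testing
    for target in (720, 1200):
        s = short_side_scale(720, 1280, target)
        print(target, s)
```

In multi-scale testing the network runs once per scale, and the detections from all scales are merged before NMS.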

Train

Train data

Mine: the ICDAR2013 + ICDAR2015 training sets with raw data augmentation, 15977 images in total.

Paper: ICDAR2015 + 2000 focused scene text images they collected.

Train commands

Open ./src/lib/datasets/icdar.py and modify the image paths so that the training code can find the merge_train.txt image list.
Remove the cache files in src/data/*.pkl, or load the cached roidb data of this project and place it in src/data/.
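
Clearing the cache can also be scripted. A minimal sketch, assuming it runs from the repository root so that `src/data` resolves correctly; the helper name `clear_roidb_cache` is hypothetical.

```python
import glob
import os


def clear_roidb_cache(cache_dir="src/data"):
    """Delete every *.pkl file directly inside cache_dir and return the removed paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(cache_dir, "*.pkl"))):
        os.remove(path)
        removed.append(path)
    return removed
```

Only files ending in `.pkl` at the top level of the directory are touched; anything else in `src/data` is left alone.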

    # Train for RRCNN4-TextBoxes-v2-OHEM
    bash ./train.sh

Note: with USE_FLIPPED=True and USE_FLIPPED_QUAD=True, you will get almost 31200 roidb entries.
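
Flipping roughly doubles the roidb because each image gains a mirrored copy whose annotations are mirrored too. The sketch below illustrates a horizontal flip of one quadrilateral annotation, assuming 0-based pixel coordinates and a point order of top-left, top-right, bottom-right, bottom-left. The function name and the point order are assumptions, not this project's actual code.

```python
def flip_quad_horizontally(quad, image_width):
    """Mirror a quadrilateral about the vertical centre line of the image.

    quad is [top_left, top_right, bottom_right, bottom_left], each an (x, y) pair.
    The result uses the same point order.
    """
    mirrored = [(image_width - 1 - x, y) for (x, y) in quad]
    # Mirroring swaps left and right, so reorder to restore the TL, TR, BR, BL order
    return [mirrored[1], mirrored[0], mirrored[3], mirrored[2]]
```

Applying the function twice returns the original quadrilateral, which is a quick sanity check for the reordering.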

Experiments

Mine VS Paper

Approaches	Anchor scales	Pooled sizes	Inclined NMS	Test scales (short side)	F-measure (mine vs. paper)
R²CNN-2	(4, 8, 16)	(7, 7)	Y	(720)	71.12% VS 68.49%
R²CNN-3	(4, 8, 16)	(7, 7)	Y	(720)	73.10% VS 74.29%
R²CNN-4	(4, 8, 16, 32)	(7, 7)	Y	(720)	74.14% VS 74.36%
R²CNN-4	(4, 8, 16, 32)	(7, 7)	Y	(720, 1200)	79.05% VS 81.80%
R²CNN-5	(4, 8, 16, 32)	(7, 7) (11, 3) (3, 11)	Y	(720)	74.34% VS 75.34%
R²CNN-5	(4, 8, 16, 32)	(7, 7) (11, 3) (3, 11)	Y	(720, 1200)	78.70% VS 82.54%

Appendixes

Approaches	Anchor scales	Aspect ratios	Pooled sizes	Inclined NMS	Test scales (short side)	F-measure
R²CNN-4	(4, 8, 16, 32)	(0.5, 1, 2)	(7, 7)	Y	(720)	74.36%
R²CNN-4	(4, 8, 16, 32)	(0.5, 1, 2)	(7, 7)	Y	(720, 1200)	81.80%
R²CNN-4-TextBoxes-OHEM	(4, 8, 16, 32)	(0.5, 1, 2, 3, 5, 7, 10)	(7, 7)	Y	(720)	76.53%
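
The F-measure reported in these tables is the harmonic mean of precision and recall. A minimal sketch of the formula, assuming the counts of matched detections, total detections and ground-truth boxes are already known; the function name and the example counts in the usage note are illustrative and are not results from this project.

```python
def f_measure(num_matched, num_detections, num_ground_truth):
    """Return (precision, recall, F-measure) from detection counts."""
    precision = float(num_matched) / num_detections if num_detections else 0.0
    recall = float(num_matched) / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)
```

For example, 80 matches out of 100 detections and 160 ground-truth boxes give a precision of 0.8 and a recall of 0.5.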

Furthermore

You can also try other backbones such as ResNet-50 or ResNet-101.


Owner

candler
