Rotational region detection based on Faster-RCNN.

Last update: Nov 22, 2022

Overview

R2CNN_Faster_RCNN_Tensorflow

Abstract

This is a TensorFlow re-implementation of R²CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.
Note that we did not reproduce the paper exactly; we only adopted its idea.

This project is based on Faster-RCNN and was completed by YangXue and YangJirui.

DOTA test results

Comparison

Some of the results are taken from the DOTA paper.

Task1 - Oriented Leaderboard

Approaches	mAP	PL	BD	BR	GTF	SV	LV	SH	TC	BC	ST	SBF	RA	HA	SP	HC
SSD	10.59	39.83	9.09	0.64	13.18	0.26	0.39	1.11	16.24	27.57	9.23	27.16	9.09	3.03	1.05	1.01
YOLOv2	21.39	39.57	20.29	36.58	23.42	8.85	2.09	4.82	44.34	38.35	34.65	16.02	37.62	47.23	25.5	7.45
R-FCN	26.79	37.8	38.21	3.64	37.26	6.74	2.6	5.59	22.85	46.93	66.04	33.37	47.15	10.6	25.19	17.96
FR-H	36.29	47.16	61	9.8	51.74	14.87	12.8	6.88	56.26	59.97	57.32	47.83	48.7	8.23	37.25	23.05
FR-O	52.93	79.09	69.12	17.17	63.49	34.2	37.16	36.2	89.19	69.6	58.96	49.4	52.52	46.69	44.8	46.3
R²CNN	60.67	80.94	65.75	35.34	67.44	59.92	50.91	55.81	90.67	66.92	72.39	55.06	52.23	55.14	53.35	48.22
RRPN	61.01	88.52	71.20	31.66	59.30	51.85	56.19	57.25	90.81	72.84	67.38	56.69	52.84	53.08	51.94	53.58
ICN	68.20	81.40	74.30	47.70	70.30	64.90	67.80	70.00	90.80	79.10	78.20	53.60	62.90	67.00	64.20	50.20
R²CNN++	71.16	89.66	81.22	45.50	75.10	68.27	60.17	66.83	90.90	80.69	86.15	64.05	63.48	65.34	68.01	62.05
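
As a quick sanity check on the table above, the mAP column is the unweighted mean of the 15 per-class APs. The snippet below is a minimal sketch that recomputes it for the R²CNN row; the variable names are illustrative and not part of this repository.

```python
# Per-class APs for the R2CNN row of the Task1 table, in column order:
# PL, BD, BR, GTF, SV, LV, SH, TC, BC, ST, SBF, RA, HA, SP, HC.
r2cnn_ap = [80.94, 65.75, 35.34, 67.44, 59.92, 50.91, 55.81, 90.67,
            66.92, 72.39, 55.06, 52.23, 55.14, 53.35, 48.22]

# mAP is the unweighted mean over the 15 DOTA categories.
r2cnn_map = sum(r2cnn_ap) / len(r2cnn_ap)
print("%.2f" % r2cnn_map)
```

The printed value can be compared against the mAP column of the same row.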

Task2 - Horizontal Leaderboard

Approaches	mAP	PL	BD	BR	GTF	SV	LV	SH	TC	BC	ST	SBF	RA	HA	SP	HC
SSD	10.94	44.74	11.21	6.22	6.91	2	10.24	11.34	15.59	12.56	17.94	14.73	4.55	4.55	0.53	1.01
YOLOv2	39.2	76.9	33.87	22.73	34.88	38.73	32.02	52.37	61.65	48.54	33.91	29.27	36.83	36.44	38.26	11.61
R-FCN	47.24	79.33	44.26	36.58	53.53	39.38	34.15	47.29	45.66	47.74	65.84	37.92	44.23	47.23	50.64	34.9
FR-H	60.46	80.32	77.55	32.86	68.13	53.66	52.49	50.04	90.41	75.05	59.59	57	49.81	61.69	56.46	41.85
R²CNN	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
FPN	72.00	88.70	75.10	52.60	59.20	69.40	78.80	84.50	90.60	81.30	82.60	52.50	62.10	76.60	66.30	60.10
ICN	72.50	90.00	77.70	53.40	73.30	73.50	65.00	78.20	90.80	79.10	84.80	57.20	62.10	73.50	70.20	58.10
R²CNN++	75.35	90.18	81.88	55.30	73.29	72.09	77.65	78.06	90.91	82.44	86.39	64.53	63.45	75.77	78.21	60.11

Face Detection

Environment: NVIDIA GeForce GTX 1060

ICDAR2015

Requirements

1. tensorflow >= 1.2
2. cuda 8.0
3. python 2.7 (anaconda2 recommended)
4. opencv (cv2)

Download Model

1. Download the resnet50_v1 and resnet101_v1 models pre-trained on ImageNet and put them in data/pretrained_weights.
2. Download the mobilenet_v2 model pre-trained on ImageNet and put it in data/pretrained_weights/mobilenet.
3. Download the models trained by this project and put them in output/trained_weights.

Data Prepare

1. Download DOTA.
2. Crop the data, for example:

cd $PATH_ROOT/data/io/DOTA
python train_crop.py 
python val_crop.py

3. Data format:

├── VOCdevkit
│   ├── VOCdevkit_train
│   │   ├── Annotation
│   │   ├── JPEGImages
│   ├── VOCdevkit_test
│   │   ├── Annotation
│   │   ├── JPEGImages
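
Before generating tfrecords, it can help to confirm that the cropped data matches the layout above. The following is a minimal sketch and not part of this repository: `check_voc_layout` is a hypothetical helper, and it assumes that annotations are `.xml` files whose base names match the image files (`.png` by default).

```python
import os


def check_voc_layout(root, img_ext='.png'):
    """Return (images without an annotation, annotations without an image).

    Hypothetical helper: assumes root contains Annotation/ (.xml files)
    and JPEGImages/ (image files), as in the tree above.
    """
    ann_dir = os.path.join(root, 'Annotation')
    img_dir = os.path.join(root, 'JPEGImages')
    # Compare files by base name, ignoring the extension.
    anns = set(os.path.splitext(f)[0] for f in os.listdir(ann_dir)
               if f.endswith('.xml'))
    imgs = set(os.path.splitext(f)[0] for f in os.listdir(img_dir)
               if f.endswith(img_ext))
    return sorted(imgs - anns), sorted(anns - imgs)
```

For example, calling `check_voc_layout('/PATH/TO/VOCdevkit/VOCdevkit_train')` should return two empty lists when every image has an annotation and vice versa.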

Compile

cd $PATH_ROOT/libs/box_utils/
python setup.py build_ext --inplace

cd $PATH_ROOT/libs/box_utils/cython_utils
python setup.py build_ext --inplace

Demo

Select a configuration file in the folder (libs/configs/) and copy its contents into cfgs.py, then download the corresponding weights.

DOTA

python demo_rh.py --src_folder='/PATH/TO/DOTA/IMAGES_ORIGINAL/' \
                  --image_ext='.png' \
                  --des_folder='/PATH/TO/SAVE/RESULTS/' \
                  --save_res=False \
                  --gpu='0'

FDDB

python camera_demo.py --gpu='0'

Eval

python eval.py --img_dir='/PATH/TO/DOTA/IMAGES/' \
               --image_ext='.png' \
               --test_annotation_path='/PATH/TO/TEST/ANNOTATION/' \
               --gpu='0'

Inference

python inference.py --data_dir='/PATH/TO/DOTA/IMAGES_CROP/' \
                    --gpu='0'

Train

1. To train on your own data, note the following:

(1) Modify parameters (such as CLASS_NUM, DATASET_NAME, VERSION, etc.) in $PATH_ROOT/libs/configs/cfgs.py
(2) Add category information in $PATH_ROOT/libs/label_name_dict/lable_dict.py     
(3) Add data_name to line 75 of $PATH_ROOT/data/io/read_tfrecord.py
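
To illustrate step (2), the sketch below shows one way category information could be expressed as a name-to-label mapping. The actual structure of lable_dict.py is not shown in this README, so the names `NAME_LABEL_MAP` and `LABEL_NAME_MAP`, the example categories, and the convention that index 0 is background are all assumptions made for illustration.

```python
# Hypothetical category table for a two-class custom dataset.
# Index 0 is reserved for background; this is a common Faster-RCNN
# convention and is assumed here, not taken from the repository.
NAME_LABEL_MAP = {
    'back_ground': 0,
    'ship': 1,
    'plane': 2,
}

# Inverse mapping, useful for turning predicted class ids back into names.
LABEL_NAME_MAP = dict((label, name) for name, label in NAME_LABEL_MAP.items())

# Assuming CLASS_NUM counts foreground classes only, it is the map size
# minus the background entry.
CLASS_NUM = len(NAME_LABEL_MAP) - 1
```

Whatever form the mapping takes, the number of categories it defines should agree with the CLASS_NUM value set in step (1).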

2. Make the tfrecord:

cd $PATH_ROOT/data/io/  
python convert_data_to_tfrecord.py --VOC_dir='/PATH/TO/VOCdevkit/VOCdevkit_train/' \
                                   --xml_dir='Annotation' \
                                   --image_dir='JPEGImages' \
                                   --save_name='train' \
                                   --img_format='.png' \
                                   --dataset='DOTA'

3. Train:

cd $PATH_ROOT/tools
python train.py

Tensorboard

cd $PATH_ROOT/output/summary
tensorboard --logdir=.

Citation

Some related publications based on this code:

@article{yang2018position,
	title={Position Detection and Direction Prediction for Arbitrary-Oriented Ships via Multitask Rotation Region Convolutional Neural Network},
	author={Yang, Xue and Sun, Hao and Sun, Xian and Yan, Menglong and Guo, Zhi and Fu, Kun},
	url={https://ieeexplore.ieee.org/document/8464244},
	journal={IEEE Access},
	volume={6},
	pages={50839-50849},
	year={2018},
	publisher={IEEE}
}

@article{yang2018r-dfpn,
	url={http://www.mdpi.com/2072-4292/10/1/132},
	title={Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks},
	author={Yang, Xue and Sun, Hao and Fu, Kun and Yang, Jirui and Sun, Xian and Yan, Menglong and Guo, Zhi},
	journal={Remote Sensing},
	volume={10},
	number={1},
	pages={132},
	year={2018},
	publisher={Multidisciplinary Digital Publishing Institute}
}

Owner

UCAS-Det
