TextBoxes++: A Single-Shot Oriented Scene Text Detector

Overview

TextBoxes++: A Single-Shot Oriented Scene Text Detector

Introduction

This is an application for scene text detection (TextBoxes++) and recognition (CRNN).

TextBoxes++ is a unified framework for oriented scene text detection with a single network. It is an extended work of TextBoxes. CRNN is an open-source text recognizer. The code of TextBoxes++ is based on SSD and TextBoxes. The code of CRNN is modified from CRNN.

For more details, please refer to our arXiv paper.

Citing the related works

Please cite the related works in your publications if it helps your research:

@article{Liao2018Text,
  title = {{TextBoxes++}: A Single-Shot Oriented Scene Text Detector},
  author = {Minghui Liao, Baoguang Shi and Xiang Bai},
  journal = {{IEEE} Transactions on Image Processing},
  doi  = {10.1109/TIP.2018.2825107},
  url = {https://doi.org/10.1109/TIP.2018.2825107},
  volume = {27},
  number = {8},
  pages = {3676--3690},
  year = {2018}
}

@inproceedings{LiaoSBWL17,
  author    = {Minghui Liao and
               Baoguang Shi and
               Xiang Bai and
               Xinggang Wang and
               Wenyu Liu},
  title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
  booktitle = {AAAI},
  year      = {2017}
}

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} TPAMI},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Contents

  1. Requirements
  2. Installation
  3. Docker
  4. Models
  5. Demo
  6. Train

Requirements

NOTE There is partial support for a docker image. See docker/README.md. (Thank you for the PR from @mdbenito)

Torch7 for CRNN; 
g++-5; cuda8.0; cudnn V5.1 (cudnn 6 and cudnn 7 may fail); opencv3.0

Please refer to Caffe Installation to ensure other dependencies;

Installation

  1. compile TextBoxes++ (This is a modified version of caffe so you do not need to install the official caffe)
# Modify Makefile.config according to your Caffe installation.
cp Makefile.config.example Makefile.config
make -j8
# Make sure to include $CAFFE_ROOT/python to your PYTHONPATH.
make py
  1. compile CRNN (Please refer to CRNN if you have trouble with the compilation.)
cd crnn/src/
sh build_cpp.sh

Docker

(Thanks for the PR from @idotobi)

Build Docke Image

docker build -t tbpp_crnn:gpu .

This can take +1h, so go get a coffee ;)

Once this is done you can start a container via nvidia-docker.

nvidia-docker run -it --rm tbpp_crnn:gpu bash

To check if the GPU is available inside the docker container you can run nvidia-smi.

It's recommendable to mount the ./models and ./crnn/model/ directories to include the downloaded models.

nvidia-docker run -it \
                  --rm \
                  -v ${PWD}/models:/opt/caffe/models \ 
                  -v ${PWD}/crrn/model:/opt/caffe/crrn/model \
                  tbpp_crnn:gpu bash

For convenince this command is executed when running ./run.bash.

Models

  1. pre-trained model on SynthText (used for training): Dropbox; BaiduYun

  2. model trained on ICDAR 2015 Incidental Text (used for testing): Dropbox; BaiduYun

    Please place the above models in "./models/"

    If your data is hugely different from ICDAR 2015 Incidental Text,you'd better train it on your own data based on the pre-trained model on SynthText.

  3. CRNN model: Dropbox; BaiduYun

    Please place the crnn model in "./crnn/model/"

Demo

Download the ICDAR 2015 model and place it in "./models/"

python examples/text/demo.py

The detection results and recognition results are in "./demo_images"

Train

Create lmdb data

  1. convert ground truth into "xml" form: example.xml

  2. create train/test lists (train.txt / test.txt) in "./data/text/" with the following form:

     path_to_example1.jpg path_to_example1.xml
     path_to_example2.jpg path_to_example2.xml
    
  3. Run "./data/text/creat_data.sh"

Start training

1. modify the lmdb path in modelConfig.py
2. Run "python examples/text/train.py"
Owner
Minghui Liao
Minghui Liao, a Ph.D. student of Huazhong University of Science and Technology.
Minghui Liao
Toolbox for OCR post-correction

Ochre Ochre is a toolbox for OCR post-correction. Please note that this software is experimental and very much a work in progress! Overview of OCR pos

National Library of the Netherlands / Research 117 Nov 10, 2022
【Auto】原神⭐钓鱼辅助工具 | 自动收竿、校准游标 | ✨您只需要抛出鱼竿,我们会帮你完成一切✨

原神钓鱼辅助工具 ✨ 作者正在努力重构代码中……会尽快带给大家一个更完美的脚本 ✨ 「您只需抛出鱼竿,然后我们会帮您搞定一切」 如果你觉得这个脚本好用,请点一个 Star ⭐ ,你的 Star 就是作者更新最大的动力 点击这里 查看演示视频 ✨ 欢迎大家在 Issues 中分享自己的配置文件 ✨ ✨

261 Jan 02, 2023
Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera.

Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera. Fingertip location is mapped to RGB images to control the mouse cursor.

Ravi Sharma 71 Dec 20, 2022
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo José Guijarro Villalba 75 Oct 21, 2022
Automatically resolve RidderMaster based on TensorFlow & OpenCV

AutoRiddleMaster Automatically resolve RidderMaster based on TensorFlow & OpenCV 基于 TensorFlow 和 OpenCV 实现的全自动化解御迷士小马谜题 Demo How to use Deploy the ser

神龙章轩 5 Nov 19, 2021
Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

CSCBLI Code for our ACL Findings 2021 paper, "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction". Require

Jinpeng Zhang 12 Oct 08, 2022
Smart computer vision application

Smart-computer-vision-application Backend : opencv and python Library required:

2 Jan 31, 2022
Repository collecting all the submodules for the new PyTorch-based OCR System.

OCRopus3 is being replaced by OCRopus4, which is a rewrite using PyTorch 1.7; release should be soonish. Please check github.com/tmbdev/ocropus for up

NVIDIA Research Projects 138 Dec 09, 2022
Text modding tools for FF7R (Final Fantasy VII Remake)

FF7R_text_mod_tools Subtitle modding tools for FF7R (Final Fantasy VII Remake) There are 3 tools I made. make_dualsub_mod.exe: Merges (or swaps) subti

10 Dec 19, 2022
Opencv-image-filters - A camera to capture videos in real time by placing filters using Python with the help of the Tkinter and OpenCV libraries

Opencv-image-filters - A camera to capture videos in real time by placing filters using Python with the help of the Tkinter and OpenCV libraries

Sergio Díaz Fernández 1 Jan 13, 2022
A simple Security Camera created using Opencv in Python where images gets saved in realtime in your Dropbox account at every 5 seconds

Security Camera using Opencv & Dropbox This is a simple Security Camera created using Opencv in Python where images gets saved in realtime in your Dro

Arpit Rath 1 Jan 31, 2022
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

Jia Research Lab 182 Dec 29, 2022
OpenGait is a flexible and extensible gait recognition project

A flexible and extensible framework for gait recognition. You can focus on designing your own models and comparing with state-of-the-arts easily with the help of OpenGait.

Shiqi Yu 335 Dec 22, 2022
PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

365 Dec 20, 2022
Code for AAAI 2021 paper: Sequential End-to-end Network for Efficient Person Search

This repository hosts the source code of our paper: [AAAI 2021]Sequential End-to-end Network for Efficient Person Search. SeqNet achieves the state-of

Zj Li 218 Dec 31, 2022
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022
An interactive document scanner built in Python using OpenCV

The scanner takes a poorly scanned image, finds the corners of the document, applies the perspective transformation to get a top-down view of the document, sharpens the image, and applies an adaptive

Kushal Shingote 1 Feb 12, 2022
Assignment work with webcam

work with webcam : Press key 1 to use emojy on your face Press key 2 to use lip and eye on your face Press key 3 to checkered your face Press key 4 to

Hanane Kheirandish 2 May 31, 2022
轻量级公式 OCR 小工具:一键识别各类公式图片,并转换为 LaTeX 格式

QC-Formula | 青尘公式 OCR 介绍 轻量级开源公式 OCR 小工具:一键识别公式图片,并转换为 LaTeX 格式。 支持从 电脑本地 导入公式图片;(后续版本将支持直接从网页导入图片) 公式图片支持 .png / .jpg / .bmp,大小为 4M 以内均可; 支持印刷体及手写体,前

青尘工作室 26 Jan 07, 2023