Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.

Overview

Box_Discretization_Network

This repository is built on the pytorch [maskrcnn_benchmark]. The method is the foundation of our ReCTs-competition method [link], which won the championship.

PPT link [Google Drive][Baidu Cloud]

Generate your own JSON: [Google Drive][Baidu Cloud]

Brief introduction (in Chinese): [Google Drive][Baidu Cloud]

Competition related

Competition model and config files (it needs a lot of video memory):

  • Paper [Link] (Exploring the Capacity of Sequential-free Box Discretization Networkfor Omnidirectional Scene Text Detection)

  • Config file [BaiduYun Link]. Models below all use this config file except directory. Results below are the multi-scale ensemble results. The very details are described in our updated paper.

  • MLT 2017 Model [BaiduYun Link].

MLT 2017 Recall Precision Hmean
new 76.44 82.75 79.47
ReCTS Detection Recall Precision Hmean
new 93.97 92.76 93.36
HRSC_2016 Recall Precision Hmean TIoU-Hmean AP
IJCAI version 94.8 46.0 61.96 51.1 93.7
new 94.1 83.8 88.65 73.3 89.22
  • Online demo is updating (the old demo version used a wrong configuration). This demo uses the MLT model provided above. It can detect multi-lingual text but can only recognize English, Chinese, and most of the symbols.

Description

Please see our paper at [link].

The advantages:

  • BDN can directly produce compact quadrilateral detection box. (segmentation-based methods need additional steps to group pixels & such steps usually sensitive to outliers)
  • BDN can avoid label confusion (non-segmentation-based methods are mostly sensitive to label sequence, which can significantly undermine the detection result). Comparison on ICDAR 2015 dataset showing different methods’ ability of resistant to the label confusion issue (by adding rotated pseudo samples). Textboxes++, East, and CTD are all Sesitive-to-Label-Sequence methods.
Textboxes++ [code] East [code] CTD [code] Ours
Variances (Hmean) ↓ 9.7% ↓ 13.7% ↓ 24.6% ↑ 0.3%

Getting Started

A basic example for training and testing. This mini example offers a pure baseline that takes less than 4 hours (with 4 1080 ti) to finalize training with only official training data.

Install anaconda

Link:https://pan.baidu.com/s/1TGy6O3LBHGQFzC20yJo8tg psw:vggx

Step-by-step install

conda create --name mb
conda activate mb
conda install ipython
pip install ninja yacs cython matplotlib tqdm scipy shapely
conda install pytorch=1.0 torchvision=0.2 cudatoolkit=9.0 -c pytorch
conda install -c menpo opencv
export INSTALL_DIR=$PWD
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
cd $INSTALL_DIR
git clone https://github.com/Yuliang-Liu/Box_Discretization_Network.git
cd Box_Discretization_Network
python setup.py build develop
  • MUST USE torchvision=0.2

Pretrained model:

[Link] unzip under project_root

(This is ONLY an ImageNet Model With a few iterations on ic15 training data for a stable initialization)

ic15 data

Prepare data follow COCO format. [Link] unzip under datasets/

Train

After downloading data and pretrained model, run

bash quick_train_guide.sh

Test with [TIoU]

Run

bash my_test.sh

Put kes.json to ic15_TIoU_metric/ inside ic15_TIoU_metric/

Run (conda deactivate; pip install Polygon2)

python2 to_eval.py

Example results:

  • mask branch 79.4 (test segm.json by changing to_eval.py (line 10: mode=0) );
  • kes branch 80.4;
  • in .yaml, set RESCORING=True -> 80.8;
  • Set RESCORING=True and RESCORING_GAMA=0.8 -> 81.0;
  • One can try many other tricks such as CROP_PROB_TRAIN, ROTATE_PROB_TRAIN, USE_DEFORMABLE, DEFORMABLE_PSROIPOOLING, PNMS, MSR, PAN in the project, whcih were all tested effective to improve the results. To achieve state-of-the-art performance, extra data (syntext, MLT, etc.) and proper training strategies are necessary.

Visualization

Run

bash single_image_demo.sh

Citation

If you find our method useful for your reserach, please cite

@article{liu2019omnidirectional,
  title={Omnidirectional Scene Text Detection with Sequential-free Box Discretization},
  author={Liu, Yuliang and Zhang, Sheng and Jin, Lianwen and Xie, Lele and Wu, Yaqiang and Wang, Zhepeng},
  journal={IJCAI},
  year={2019}
}
@article{liu2019exploring,
  title={Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection},
  author={Liu, Yuliang and He, Tong and Chen, Hao and Wang, Xinyu and Luo, Canjie and Zhang, Shuaitao and Shen, Chunhua and Jin, Lianwen},
  journal={arXiv preprint arXiv:1912.09629},
  year={2019}
}

Feedback

Suggestions and discussions are greatly welcome. Please contact the authors by sending email to [email protected] or [email protected]. For commercial usage, please contact Prof. Lianwen Jin via [email protected].

Owner
Yuliang Liu
MMLab; South China University of Technology; University of Adelaide
Yuliang Liu
Deep Distributed Control of Port-Hamiltonian Systems

De(e)pendable Distributed Control of Port-Hamiltonian Systems (DeepDisCoPH) This repository is associated to the paper [1] and it contains: The full p

Dependable Control and Decision group - EPFL 3 Aug 17, 2022
Repository of continual learning papers

Continual learning paper repository This repository contains an incomplete (but dynamically updated) list of papers exploring continual learning in ma

29 Jan 05, 2023
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR2021)

NExT-QA We reproduce some SOTA VideoQA methods to provide benchmark results for our NExT-QA dataset accepted to CVPR2021 (with 1 'Strong Accept' and 2

Junbin Xiao 50 Nov 24, 2022
[BMVC'21] Official PyTorch Implementation of Grounded Situation Recognition with Transformers

Grounded Situation Recognition with Transformers Paper | Model Checkpoint This is the official PyTorch implementation of Grounded Situation Recognitio

Junhyeong Cho 18 Jul 19, 2022
Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

Auto-ViML Automatically Build Variant Interpretable ML models fast! Auto_ViML is pronounced "auto vimal" (autovimal logo created by Sanket Ghanmare) N

AutoViz and Auto_ViML 397 Dec 30, 2022
[ICML 2021] Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data

Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data This repo provides the source code & data of our paper: Break-It-Fix-It: Unsupervised

Michihiro Yasunaga 86 Nov 30, 2022
Citation Intent Classification in scientific papers using the Scicite dataset an Pytorch

Citation Intent Classification Table of Contents About the Project Built With Installation Usage Acknowledgments About The Project Citation Intent Cla

Federico Nocentini 4 Mar 04, 2022
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator

involution Official implementation of a neural operator as described in Involution: Inverting the Inherence of Convolution for Visual Recognition (CVP

Duo Li 1.3k Dec 28, 2022
Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch.

openpifpaf Continuously tested on Linux, MacOS and Windows: New 2021 paper: OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Te

VITA lab at EPFL 50 Dec 29, 2022
A new benchmark for Icon Question Answering (IconQA) and a large-scale icon dataset Icon645.

IconQA About IconQA is a new diverse abstract visual question answering dataset that highlights the importance of abstract diagram understanding and c

Pan Lu 24 Dec 30, 2022
This repository contains the code for TABS, a 3D CNN-Transformer hybrid automated brain tissue segmentation algorithm using T1w structural MRI scans

This repository contains the code for TABS, a 3D CNN-Transformer hybrid automated brain tissue segmentation algorithm using T1w structural MRI scans. TABS relies on a Res-Unet backbone, with a Vision

6 Nov 07, 2022
Exploring Classification Equilibrium in Long-Tailed Object Detection, ICCV2021

Exploring Classification Equilibrium in Long-Tailed Object Detection (LOCE, ICCV 2021) Paper Introduction The conventional detectors tend to make imba

52 Nov 21, 2022
A medical imaging framework for Pytorch

Welcome to MedicalTorch MedicalTorch is an open-source framework for PyTorch, implementing an extensive set of loaders, pre-processors and datasets fo

Christian S. Perone 799 Jan 03, 2023
Config files for my GitHub profile.

Canalyst Candas Data Science Library Name Canalyst Candas Description Built by a former PM / analyst to give anyone with a little bit of Python knowle

Canalyst Candas 13 Jun 24, 2022
Prototypical Networks for Few shot Learning in PyTorch

Prototypical Networks for Few shot Learning in PyTorch Simple alternative Implementation of Prototypical Networks for Few Shot Learning (paper, code)

Orobix 835 Jan 08, 2023
Framework for abstracting Amiga debuggers and access to AmigaOS libraries and devices.

Framework for abstracting Amiga debuggers. This project provides abstration to control an Amiga remotely using a debugger. The APIs are not yet stable

Roc Vallès 39 Nov 22, 2022
AI Summer's complete catalog of articles

Learn Deep Learning with AI Summer A collection of all articles (almost 100) written for the AI Summer blog organized by topic. Deep Learning Theory M

AI Summer 95 Dec 29, 2022
Time Dependent DFT in Tamm-Dancoff Approximation

Density Function Theory Program - kspy-tddft(tda) This is an implementation of Time-Dependent Density Functional Theory(TDDFT) using the Tamm-Dancoff

Peter Borthwick 2 Nov 17, 2022
MODNet: Trimap-Free Portrait Matting in Real Time

MODNet is a model for real-time portrait matting with only RGB image input.

Zhanghan Ke 2.8k Dec 30, 2022
A PyTorch implementation of EventProp [https://arxiv.org/abs/2009.08378], a method to train Spiking Neural Networks

Spiking Neural Network training with EventProp This is an unofficial PyTorch implemenation of EventProp, a method to compute exact gradients for Spiki

Pedro Savarese 35 Jul 29, 2022