This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition by MMOCR

Last update: Nov 17, 2022

MASTER-mmocr

About The Project
- Dependency
Getting Started
- Prerequisites
- Installation
Usage
Result
Coming Soon
License
Citations
Acknowledgements

About The Project

This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition using MMOCR, an open-source toolbox based on PyTorch. The overall architecture is shown below.

Dependency

Getting Started

Prerequisites

Training uses the synthetic image datasets SynthText (Synth800k) and MJSynth (Synth90k).
Testing uses the real image datasets IIIT5K, SVT, IC03, IC13, IC15, SVTP, and CUTE80.

Dataset download link.
Change the dataset path in the MASTER config.

Installation

Install mmdetection. Click here for details.

# The mmdetection-2.11.0 source code is embedded in this project.
# You can cd into it and install from there (recommended).
cd ./mmdetection-2.11.0
pip install -v -e .

Install mmocr. Click here for details.

# install mmocr
cd ./MASTER_mmocr
pip install -v -e .

Install mmcv-full-1.3.4. Click here for details.

pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html

# Example: install mmcv-full 1.3.4 for torch 1.8.0 and CUDA 10.2
pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
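
The placeholders in the template above follow a simple pattern: in the example, CUDA 10.2 becomes cu102 and torch 1.8.0 becomes torch1.8.0. A minimal Python sketch of that mapping follows. The function names are hypothetical helpers, not part of mmcv or MMOCR, and the sketch only formats the command string; it does not check that a wheel index actually exists for a given combination.

```python
def mmcv_index_url(torch_version: str, cuda_version: str) -> str:
    """Fill in the wheel index URL template from the install instructions.

    Assumption: {cu_version} is "cu" plus the CUDA version without the dot,
    and {torch_version} is "torch" plus the full torch version, as in the
    README example (CUDA 10.2 -> cu102, torch 1.8.0 -> torch1.8.0).
    """
    cu_tag = "cu" + cuda_version.replace(".", "")
    torch_tag = "torch" + torch_version
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/{torch_tag}/index.html"


def mmcv_install_command(mmcv_version: str, torch_version: str, cuda_version: str) -> str:
    """Build the pip command string; this helper does not run pip itself."""
    url = mmcv_index_url(torch_version, cuda_version)
    return f"pip install mmcv-full=={mmcv_version} -f {url}"


if __name__ == "__main__":
    # Reproduces the example command for torch 1.8.0 and CUDA 10.2.
    print(mmcv_install_command("1.3.4", "1.8.0", "10.2"))
```

Printing the command rather than executing it keeps the sketch free of side effects; the printed line can be pasted into a shell once the versions match the local environment.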

Usage

The usage of this project is consistent with MMOCR-0.2.0. Click here for mmocr usage details.

For training, run the following command:

CUDA_VISIBLE_DEVICES={device_id} PORT={port_number} ./tools/dist_train.sh {config_path} {work_dir} {gpu_number}

# example
CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_train.sh ./configs/textrecog/master/master_ResnetExtra_academic_dataset_dynamic_mmfp16.py /expr/mmocr_text_line_recognition/ 1
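
To make the placeholder mapping in the training command explicit, here is a small Python sketch that assembles the same command line from its parts. The helper name `dist_train_command` is hypothetical, and the sketch assumes that {gpu_number} should equal the number of device ids listed in CUDA_VISIBLE_DEVICES; adjust it if your setup differs.

```python
import shlex
from typing import Sequence


def dist_train_command(device_ids: Sequence[int], port: int,
                       config_path: str, work_dir: str) -> str:
    """Assemble the dist_train.sh command line from the README template.

    Assumption: one training process per visible GPU, so gpu_number is
    taken to be len(device_ids).
    """
    devices = ",".join(str(d) for d in device_ids)
    args = ["./tools/dist_train.sh", config_path, work_dir, str(len(device_ids))]
    # shlex.quote protects paths containing spaces or shell metacharacters.
    quoted = " ".join(shlex.quote(a) for a in args)
    return f"CUDA_VISIBLE_DEVICES={devices} PORT={port} {quoted}"


if __name__ == "__main__":
    # Rebuilds the single-GPU example command.
    print(dist_train_command(
        [0], 29500,
        "./configs/textrecog/master/master_ResnetExtra_academic_dataset_dynamic_mmfp16.py",
        "/expr/mmocr_text_line_recognition/",
    ))
```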

Note: As mentioned in the Prerequisites section, we use synthetic image datasets for training and real image datasets for evaluation. The 7 real image datasets listed above are evaluated at each evaluation interval.

Result

Dataset	Paper-reported accuracy (%)	Our accuracy (%)
IIIT5K	95.0	95.07
SVT	90.6	90.42
IC03	96.4	95.58
IC13	95.3	96.03
IC15	79.4	80.95
SVTP	84.5	84.34
CUTE80	87.5	90.62
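
For a quick comparison, the per-dataset differences in the table can be computed with a few lines of Python. This is only a convenience sketch over the numbers listed above. The mean is an unweighted average across the seven datasets, which is an assumption made here for illustration and ignores the differing dataset sizes.

```python
# (paper-reported accuracy, our accuracy), copied from the table above.
RESULTS = {
    "IIIT5K": (95.0, 95.07),
    "SVT": (90.6, 90.42),
    "IC03": (96.4, 95.58),
    "IC13": (95.3, 96.03),
    "IC15": (79.4, 80.95),
    "SVTP": (84.5, 84.34),
    "CUTE80": (87.5, 90.62),
}

# Positive delta: the re-implementation scores higher than the paper.
deltas = {name: round(ours - paper, 2) for name, (paper, ours) in RESULTS.items()}

# Unweighted means over the seven datasets (an illustrative summary only).
mean_paper = sum(paper for paper, _ in RESULTS.values()) / len(RESULTS)
mean_ours = sum(ours for _, ours in RESULTS.values()) / len(RESULTS)

if __name__ == "__main__":
    for name, delta in deltas.items():
        print(f"{name:8s} {delta:+.2f}")
    print(f"mean     paper={mean_paper:.2f} ours={mean_ours:.2f}")
```

On these numbers the re-implementation is higher on four of the seven datasets (IIIT5K, IC13, IC15, CUTE80) and lower on three (SVT, IC03, SVTP).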

Coming Soon

1st-place solution for the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX.

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

If you find MASTER useful, please cite the paper:

@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}


Owner

Jianquan Ye
