text_recognition_toolbox: A reimplementation of a series of classical scene text recognition papers in PyTorch, written in a uniform way.

Overview

text recognition toolbox

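1. Introduction

This project uses the PyTorch deep learning framework to reimplement the following six classical text recognition papers in a uniform way; details of the papers are listed below. The project will be updated continuously, and questions and code contributions are welcome.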

| Model | Paper title | Year | Method |
|---|---|---|---|
| CRNN | An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition | 2017 | CNN + BiLSTM + CTC |
| GRCNN | Gated recurrent convolution neural network for OCR | 2017 | Gated Recurrent Convolution Layer + BiLSTM + CTC |
| FAN | Focusing attention: Towards accurate text recognition in natural images | 2017 | focusing network + 1D attention |
| SAR | Show, attend and read: A simple and strong baseline for irregular text recognition | 2019 | ResNet + 2D attention |
| DAN | Decoupled attention network for text recognition | 2020 | FCN + convolutional alignment module |
| SATRN | On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention | 2020 | Transformer |

2. Usage

2.1 Requirements

torch==1.3.0
numpy==1.17.3
lmdb==0.98
opencv-python==3.4.5.20

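2.2 Training

  • Data preparation

First prepare the training data. Only lmdb-format datasets are currently supported. To convert your data:

  1. Prepare the image dataset; each image should be a text region cropped according to its detection box.
  2. Prepare label.txt; the annotation file must follow this format:
1.jpg 文字检测
2.jpg 文字识别
  3. Convert the dataset to lmdb format:
python3 tools/create_lmdb_dataset.py --inputPath {path to the image dataset} --gtFile {path to the label file} --outputPath {output path for the lmdb dataset}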
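Before running the conversion, it can help to check that every line of label.txt is well formed and points at an existing image. The sketch below is not part of the toolbox; `read_labels` is a hypothetical helper, and it assumes the image name and the label are separated by the first run of whitespace, as in the two sample lines above.

```python
import os


def read_labels(gt_file, image_dir):
    """Parse '<image name> <text>' lines and report lines that cannot be used.

    Returns (samples, problems), where samples is a list of (name, text)
    pairs and problems is a list of (line number, reason) pairs.
    """
    samples, problems = [], []
    with open(gt_file, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            # Assumption: the first whitespace run separates name from label.
            parts = line.rstrip("\n").split(None, 1)
            if len(parts) != 2:
                problems.append((line_no, "missing label"))
                continue
            name, text = parts
            if not os.path.isfile(os.path.join(image_dir, name)):
                problems.append((line_no, "image not found: " + name))
                continue
            samples.append((name, text))
    return samples, problems
```

Calling `read_labels("label.txt", "images/")` returns the usable samples together with a list of problem lines, so a broken annotation can be fixed before the lmdb dataset is built.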
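  • Configuration file

Each model currently has its own configuration file. Taking CRNN as an example, the main parameters are:

| Section | Parameter | Meaning | Remarks |
|---|---|---|---|
| TrainReader | dataloader | Custom DataLoader class | |
| | select_data | Which lmdb datasets to use | Defaults to '/', which uses all lmdb datasets under {lmdb_sets_dir}. To control the proportion of each dataset within a batch, use it together with {batch_ratio} and separate the dataset names with '-', e.g. 'dataset1-dataset2-dataset3' |
| | batch_ratio | Proportion of each lmdb dataset within a batch | Used together with {select_data}; separate the ratios with '-', e.g. '0.3-0.3-0.4'. Dataset 1 then contributes batch_size * 0.3 samples per batch, and likewise for the remaining datasets |
| | total_data_usage_ratio | Fraction of the overall dataset to use | Defaults to 1.0, i.e. use all of the data |
| | padding | Whether to pad the images | Defaults to True; set to False to resize instead |
| Global | highest_acc_save_type | Whether to save only the model with the highest accuracy | Defaults to False |
| | resumed_optimizer | Whether to load the previously saved optimizer | Defaults to False |
| | batch_max_length | Maximum string length | Training samples whose labels exceed this length are filtered out |
| | eval_batch_step | Interval, in steps, at which the model is saved | |
| Architecture | function | Model to use | 'CRNN' in this case |
| SeqRNN | input_size | Input size of the LSTM | i.e. the number of output channels of the backbone |
| | hidden_size | Hidden size of the LSTM | |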
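To make the interaction between select_data and batch_ratio concrete, here is a minimal sketch of how the two '-'-separated strings could be turned into per-dataset sample counts. `split_batch` is a hypothetical helper, and the rounding rule (round to nearest, at least one sample per dataset) is an assumption; the toolbox's own DataLoader may handle this differently.

```python
def split_batch(select_data, batch_ratio, batch_size):
    """Map each dataset name to the number of samples it contributes per batch."""
    names = select_data.split("-")
    ratios = [float(r) for r in batch_ratio.split("-")]
    if len(names) != len(ratios):
        raise ValueError("select_data and batch_ratio must have the same number of entries")
    # Assumption: round to the nearest integer and keep at least one sample.
    return {name: max(round(batch_size * ratio), 1) for name, ratio in zip(names, ratios)}


sizes = split_batch("ArT_train-LSVT_train-ReCTS_train", "0.3-0.3-0.4", 128)
print(sizes)
```

With a batch size of 128 this gives 38, 38, and 51 samples. Because of rounding, the parts do not necessarily sum to exactly batch_size.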
  • Model training

Once the configuration is complete, start training with the following command:

python train.py -c configs/CRNN.yml

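2.3 Prediction

  • Configuration file

Likewise, each model has its own configuration file for prediction. Taking CRNN as an example, the parameters that need to be changed are:

| Section | Parameter | Meaning | Remarks |
|---|---|---|---|
| Global | pretrain_weights | Path to the model file | Keep the remaining parameters the same as for training |
| | infer_img | Images to predict; either a folder or the path to a single image | |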
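Since infer_img accepts either a folder or a single image path, the value has to be expanded into a list of files at some point. The following sketch shows one way to do that; `list_images` and the set of accepted extensions are assumptions for illustration, not the logic used by predict.py.

```python
import os

# Assumption: these extensions count as images; adjust as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def list_images(infer_img):
    """Return a sorted list of image paths for a file or a folder."""
    if os.path.isfile(infer_img):
        return [infer_img]
    if os.path.isdir(infer_img):
        return sorted(
            os.path.join(infer_img, name)
            for name in os.listdir(infer_img)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    raise FileNotFoundError("infer_img does not exist: " + infer_img)
```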
  • Model prediction

Once the configuration is complete, start prediction with the following command:

python predict.py -c configs/CRNN.yml

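3. Pretrained models

The following five open-source Chinese natural-scene datasets can be used directly for training with the model configurations described above:

| Datasets | Download (Baidu Netdisk) | Remarks |
|---|---|---|
| Five natural-scene training sets (ArT_train, LSVT_train, MTWI_train, RCTW17_train, ReCTS_train) and one natural-scene validation set (ReCTS_val) | Link: https://pan.baidu.com/s/1fvExHzeojA_Yhj3_wDflwA, extraction code: kzrd | "train" marks a training set, "val" marks a validation set |

Pretrained models for five of the algorithms are listed below; see the experimental settings in Section 4 for training details:

| Models | Download (Baidu Netdisk) | Remarks |
|---|---|---|
| Five pretrained models (CRNN.pth, GRCNN.pth, FAN.pth, DAN.pth, SAR.pth) and one dictionary file (keys.txt) | Link: https://pan.baidu.com/s/1IG-1lxytrOqry9c5Nc1GzQ, extraction code: k3ij | |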

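4. Experimental results

I compared the five algorithms reproduced so far using the same datasets and parameter settings. The experimental settings and results are as follows:

  • Experimental settings

| Setting | Details | Remarks |
|---|---|---|
| Training sets | ArT_train: 44663, LSVT_train: 218552, MTWI_train: 79964, RCTW17_train: 33342, ReCTS_train: 83119 | All five are open-source natural-scene datasets, cleaned by removing blurry samples and similar processing |
| Validation set | ReCTS_val: 9231 | The test set is the validation set split off from ReCTS at a 9:1 ratio; note that ReCTS consists mostly of horizontal text |
| batch_size | 128 | |
| img_shape | [1, 32, 256] | Images are scaled proportionally; those narrower than 256 are padded, and those wider than 256 are resized to 256 |
| optimizer | function: adam, base_lr: 0.001, momentum: 0.9, weight_decay: 1.0e-4 | |
| iter | 60000 | Training ran for 60000 steps in total, with validation every 2000 steps |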
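The img_shape rule above (scale proportionally, pad if narrower than 256, otherwise resize to 256) can be sketched as a small geometry calculation. `resize_geometry` is a hypothetical helper; rounding the scaled width up and padding on the right are assumptions, since the settings table does not specify them.

```python
import math


def resize_geometry(width, height, target_h=32, target_w=256):
    """Return (resized_width, right_padding) for an input of width x height.

    The image is first scaled so that its height equals target_h while
    keeping the aspect ratio. If the scaled width reaches target_w, the
    image is resized to target_w with no padding; otherwise the remaining
    columns are filled by padding.
    """
    # Assumption: round the scaled width up, with a minimum of one pixel.
    scaled_w = max(1, math.ceil(target_h * width / height))
    if scaled_w >= target_w:
        return target_w, 0
    return scaled_w, target_w - scaled_w


print(resize_geometry(100, 50))   # short text line: scaled, then padded
print(resize_geometry(1000, 50))  # long text line: squeezed to the target width
```

A 100x50 crop becomes 64 pixels wide with 192 columns of padding, while a 1000x50 crop would scale to 640 pixels and is therefore resized to 256 with no padding.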
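  • Results

| Algorithm | Highest accuracy (%) | Highest normalized edit distance | Model size |
|---|---|---|---|
| CRNN | 59.89 | 0.7959 | 120M |
| GRCNN | 70.51 | 0.8597 | 78M |
| FAN | 75.78 | 0.8924 | 764M |
| SAR | 78.13 | 0.9037 | 722M |
| DAN | 78.99 | 0.9064 | 639M |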
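The README does not define the two metrics. Since the normalized edit distance rises together with accuracy, a plausible reading is that it is a similarity score, 1 - edit_distance / max(len(prediction), len(label)), averaged over the validation set. The sketch below computes both metrics under that assumption; `edit_distance` and `evaluate` are illustrative names, not the toolbox's functions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def evaluate(preds, labels):
    """Return (accuracy in percent, mean normalized edit-distance score)."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and of equal length")
    correct, score = 0, 0.0
    for pred, gt in zip(preds, labels):
        correct += pred == gt  # exact match of the whole string
        longest = max(len(pred), len(gt))
        # Assumption: score is 1 - distance / longer length (1.0 if both empty).
        score += 1.0 if longest == 0 else 1.0 - edit_distance(pred, gt) / longest
    return 100.0 * correct / len(labels), score / len(labels)
```

Under this definition, a prediction that gets one of four characters wrong counts as incorrect for accuracy but still contributes 0.75 to the edit-distance score, which would explain why that score is consistently higher than the accuracy in the table.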

The figure below shows each algorithm's accuracy on the validation set, evaluated every 2000 steps:

fig1

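  • Prediction examples

| Algorithm | Prediction | Remarks |
|---|---|---|
| CRNN | image-20210121152011971 | All predictions come from the model with the highest validation accuracy; the left column is the prediction and the right column is the ground-truth label |
| GRCNN | image-20210121152134249 | |
| FAN | image-20210121152239497 | |
| SAR | image-20210121152325124 | |
| DAN | image-20210121152407344 | |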