Line-level Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Related tags

Deep LearningLineHTR
Overview

Line-level Handwritten Text Recognition with TensorFlow

poster

This model is an extended version of the Simple HTR system implemented by @Harald Scheidl and can handle a full line of text image. Huge thanks to @Harald Scheidl for his great works.

How to run

Go to the src/ directory and run python main.py with these following arguments

Command line arguments

  • --train: train the NN, details see below.
  • --validate: validate the NN, details see below.
  • --beamsearch: use vanilla beam search decoding (better, but slower) instead of best path decoding.
  • --wordbeamsearch: use word beam search decoding (only outputs words contained in a dictionary) instead of best path decoding. This is a custom TF operation and must be compiled from source, more information see corresponding section below. It should not be used when training the NN.

I don't include any pretrained model in this branch so you will need to train the model on your data first

Train model

I created this model for the Cinnamon AI Marathon 2018 competition, they released a small dataset but it's in Vietnamese, so you guys may want to try some other dataset like [4]IAM for English.

As long as your dataset contain a labels.json file like this:

{
    "img1.jpg": "abc xyz",
    ...
    "imgn.jpg": "def ghi"
}

With eachkey is the path to the images file and each value is the ground truth label for that image, this code will works fine.

Learning is visualized by Tensorboard, I tracked the character error rate, word error rate and sentences accuracy for this model. All logs will be saved in ./logs/ folder. You can start a Tensorboard session to see the logs with this command tensorboard --logdir='./logs/'

It's took me about 48 hours with about 13k images on a single GTX 1060 6GB to get down to 0.16 CER on the private testset of the competition.

Information about model

Overview

The model is a extended version of the Simple HTR system @Harald Scheidl implemented It consists of 7 CNN layers, 2 RNN (Bi-LSTM) layers and the CTC loss and decoding layer and can handle a full line of text image

  • The input image is a gray-value image and has a size of 800x64
  • 7 CNN layers map the input image to a feature sequence of size 100x512
  • 2 LSTM layers with 512 units propagate information through the sequence and map the sequence to a matrix of size 100x205. Each matrix-element represents a score for one of the 205 characters at one of the 100 time-steps
  • The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)
  • Batch size is set to 50

Highest accuracy achieved is 0.84 on the private testset of the Cinnamon AI Marathon 2018 competition (measure by Charater Error Rate - CER).

Improve accuracy

If you need a better accuracy, here are some ideas how to improve it [2]:

  • Data augmentation: increase dataset-size by applying further (random) transformations to the input images. At the moment, only random distortions are performed.
  • Remove cursive writing style in the input images (see DeslantImg).
  • Increase input size.
  • Add more CNN layers or use transfer learning on CNN.
  • Replace Bi-LSTM by 2D-LSTM.
  • Replace optimizer: Adam improves the accuracy, however, the number of training epochs increases (see discussion).
  • Decoder: use token passing or word beam search decoding [3] (see CTCWordBeamSearch) to constrain the output to dictionary words.
  • Text correction: if the recognized word is not contained in a dictionary, search for the most similar one.

Btw, don't hesitate to ask me anything via a Github Issue (See the issue template file for more details)

BTW, big shout out to Sushant Gautam for extended this code for IAM dataset, he even provide pretrained model and web UI for inferences the model. Don't forget to check his repo out.

References

[1] Build a Handwritten Text Recognition System using TensorFlow

[2] Scheidl - Handwritten Text Recognition in Historical Documents

[3] Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm

[4] Marti - The IAM-database: an English sentence database for offline handwriting recognition

Owner
Hoàng Tùng Lâm (Linus)
AI Researcher/Engineer at Techainer
Hoàng Tùng Lâm (Linus)
ICML 21 - Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Voice2Series-Reprogramming Voice2Series: Reprogramming Acoustic Models for Time Series Classification International Conference on Machine Learning (IC

49 Jan 03, 2023
[ICCV 2021] Group-aware Contrastive Regression for Action Quality Assessment

CoRe Created by Xumin Yu*, Yongming Rao*, Wenliang Zhao, Jiwen Lu, Jie Zhou This is the PyTorch implementation for ICCV paper Group-aware Contrastive

Xumin Yu 31 Dec 24, 2022
ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS.

ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS. It currently supports four examples for you to quickly experience the power of ONNX Runti

Microsoft 58 Dec 18, 2022
PyContinual (An Easy and Extendible Framework for Continual Learning)

PyContinual (An Easy and Extendible Framework for Continual Learning) Easy to Use You can sumply change the baseline, backbone and task, and then read

Zixuan Ke 176 Jan 05, 2023
Code that accompanies the paper Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

Semi-supervised Deep Kernel Learning This is the code that accompanies the paper Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data

58 Oct 26, 2022
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data arXiv This is the code base for weakly supervised NER. We provide a

Amazon 92 Jan 04, 2023
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

Awesome production machine learning This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, versi

The Institute for Ethical Machine Learning 12.9k Jan 04, 2023
Title: Heart-Failure-Classification

This Notebook is based off an open source dataset available on where I have created models to classify patients who can potentially witness heart failure on the basis of various parameters. The best

Akarsh Singh 2 Sep 13, 2022
利用yolov5和TensorRT从0到1实现目标检测的模型训练到模型部署全过程

写在前面 利用TensorRT加速推理速度是以时间换取精度的做法,意味着在推理速度上升的同时将会有精度的下降,不过不用太担心,精度下降微乎其微。此外,要有NVIDIA显卡,经测试,CUDA10.2可以支持20系列显卡及以下,30系列显卡需要CUDA11.x的支持,并且目前有bug。 默认你已经完成了

Helium 6 Jul 28, 2022
MDMM - Learning multi-domain multi-modality I2I translation

Multi-Domain Multi-Modality I2I translation Pytorch implementation of multi-modality I2I translation for multi-domains. The project is an extension to

Hsin-Ying Lee 107 Nov 04, 2022
Springer Link Download Module for Python

♞ pupalink A simple Python module to search and download books from SpringerLink. 🧪 This project is still in an early stage of development. Expect br

Pupa Corp. 18 Nov 21, 2022
Official Repo of my work for SREC Nandyal Machine Learning Bootcamp

About the Bootcamp A 3-day Machine Learning Bootcamp organised by Department of Electronics and Communication Engineering, Santhiram Engineering Colle

MS 1 Nov 29, 2021
joint detection and semantic segmentation, based on ultralytics/yolov5,

Multi YOLO V5——Detection and Semantic Segmentation Overeview This is my undergraduate graduation project which based on ultralytics YOLO V5 tag v5.0.

477 Jan 06, 2023
git《Self-Attention Attribution: Interpreting Information Interactions Inside Transformer》(AAAI 2021) GitHub:

Self-Attention Attribution This repository contains the implementation for AAAI-2021 paper Self-Attention Attribution: Interpreting Information Intera

60 Dec 29, 2022
coldcuts is an R package to automatically generate and plot segmentation drawings in R

coldcuts coldcuts is an R package that allows you to draw and plot automatically segmentations from 3D voxel arrays. The name is inspired by one of It

2 Sep 03, 2022
Single object tracking and segmentation.

Single/Multiple Object Tracking and Segmentation Codes and comparison of recent single/multiple object tracking and segmentation. News 💥 AutoMatch is

ZP ZHANG 385 Jan 02, 2023
Run object detection model on the Raspberry Pi

Using TensorFlow Lite with Python is great for embedded devices based on Linux, such as Raspberry Pi.

Dimitri Yanovsky 6 Oct 08, 2022
All the code and files related to the MI-Lab of UE19CS305 course in sem 5

Machine-Intelligence-Lab-CS305 The compilation of all the code an drelated files from MI-Lab UE19CS305 (of batch 2019-2023) offered by PES University

Arvind Krishna 3 Nov 10, 2022
code for our ECCV 2020 paper "A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation"

Code for our ECCV (2020) paper A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation. Prerequisites: python == 3.6.8 pytorch ==1.1.0

32 Nov 27, 2022
CUDA Python Low-level Bindings

CUDA Python Low-level Bindings

NVIDIA Corporation 529 Jan 03, 2023