Line-level Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Related tags

Deep LearningLineHTR
Overview

Line-level Handwritten Text Recognition with TensorFlow

poster

This model is an extended version of the Simple HTR system implemented by @Harald Scheidl and can handle a full line of text image. Huge thanks to @Harald Scheidl for his great works.

How to run

Go to the src/ directory and run python main.py with these following arguments

Command line arguments

  • --train: train the NN, details see below.
  • --validate: validate the NN, details see below.
  • --beamsearch: use vanilla beam search decoding (better, but slower) instead of best path decoding.
  • --wordbeamsearch: use word beam search decoding (only outputs words contained in a dictionary) instead of best path decoding. This is a custom TF operation and must be compiled from source, more information see corresponding section below. It should not be used when training the NN.

I don't include any pretrained model in this branch so you will need to train the model on your data first

Train model

I created this model for the Cinnamon AI Marathon 2018 competition, they released a small dataset but it's in Vietnamese, so you guys may want to try some other dataset like [4]IAM for English.

As long as your dataset contain a labels.json file like this:

{
    "img1.jpg": "abc xyz",
    ...
    "imgn.jpg": "def ghi"
}

With eachkey is the path to the images file and each value is the ground truth label for that image, this code will works fine.

Learning is visualized by Tensorboard, I tracked the character error rate, word error rate and sentences accuracy for this model. All logs will be saved in ./logs/ folder. You can start a Tensorboard session to see the logs with this command tensorboard --logdir='./logs/'

It's took me about 48 hours with about 13k images on a single GTX 1060 6GB to get down to 0.16 CER on the private testset of the competition.

Information about model

Overview

The model is a extended version of the Simple HTR system @Harald Scheidl implemented It consists of 7 CNN layers, 2 RNN (Bi-LSTM) layers and the CTC loss and decoding layer and can handle a full line of text image

  • The input image is a gray-value image and has a size of 800x64
  • 7 CNN layers map the input image to a feature sequence of size 100x512
  • 2 LSTM layers with 512 units propagate information through the sequence and map the sequence to a matrix of size 100x205. Each matrix-element represents a score for one of the 205 characters at one of the 100 time-steps
  • The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)
  • Batch size is set to 50

Highest accuracy achieved is 0.84 on the private testset of the Cinnamon AI Marathon 2018 competition (measure by Charater Error Rate - CER).

Improve accuracy

If you need a better accuracy, here are some ideas how to improve it [2]:

  • Data augmentation: increase dataset-size by applying further (random) transformations to the input images. At the moment, only random distortions are performed.
  • Remove cursive writing style in the input images (see DeslantImg).
  • Increase input size.
  • Add more CNN layers or use transfer learning on CNN.
  • Replace Bi-LSTM by 2D-LSTM.
  • Replace optimizer: Adam improves the accuracy, however, the number of training epochs increases (see discussion).
  • Decoder: use token passing or word beam search decoding [3] (see CTCWordBeamSearch) to constrain the output to dictionary words.
  • Text correction: if the recognized word is not contained in a dictionary, search for the most similar one.

Btw, don't hesitate to ask me anything via a Github Issue (See the issue template file for more details)

BTW, big shout out to Sushant Gautam for extended this code for IAM dataset, he even provide pretrained model and web UI for inferences the model. Don't forget to check his repo out.

References

[1] Build a Handwritten Text Recognition System using TensorFlow

[2] Scheidl - Handwritten Text Recognition in Historical Documents

[3] Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm

[4] Marti - The IAM-database: an English sentence database for offline handwriting recognition

Owner
Hoàng Tùng Lâm (Linus)
AI Researcher/Engineer at Techainer
Hoàng Tùng Lâm (Linus)
Live Hand Tracking Using Python

Live-Hand-Tracking-Using-Python Project Description: In this project, we will be

Hassan Shahzad 2 Jan 06, 2022
Using VapourSynth with super resolution models and speeding them up with TensorRT.

VSGAN-tensorrt-docker Using image super resolution models with vapoursynth and speeding them up with TensorRT. Using NVIDIA/Torch-TensorRT combined wi

111 Jan 05, 2023
B-cos Networks: Attention is All we Need for Interpretability

Convolutional Dynamic Alignment Networks for Interpretable Classifications M. Böhle, M. Fritz, B. Schiele. B-cos Networks: Alignment is All we Need fo

58 Dec 23, 2022
Code for "Searching for Efficient Multi-Stage Vision Transformers"

Searching for Efficient Multi-Stage Vision Transformers This repository contains the official Pytorch implementation of "Searching for Efficient Multi

Yi-Lun Liao 62 Oct 25, 2022
MoCap-Solver: A Neural Solver for Optical Motion Capture Data

MoCap-Solver is a data-driven-based robust marker denoising method, which takes raw mocap markers as input and outputs corresponding clean markers and skeleton motions.

55 Dec 28, 2022
Learning Saliency Propagation for Semi-supervised Instance Segmentation

Learning Saliency Propagation for Semi-supervised Instance Segmentation PyTorch Implementation This repository contains: the PyTorch implementation of

Berkeley DeepDrive 68 Oct 18, 2022
Continual Learning of Electronic Health Records (EHR).

Continual Learning of Longitudinal Health Records Repo for reproducing the experiments in Continual Learning of Longitudinal Health Records (2021). Re

Jacob 7 Oct 21, 2022
Semi-supervised Domain Adaptation via Minimax Entropy

Semi-supervised Domain Adaptation via Minimax Entropy (ICCV 2019) Install pip install -r requirements.txt The code is written for Pytorch 0.4.0, but s

Vision and Learning Group 243 Jan 09, 2023
Aalto-cs-msc-theses - Listing of M.Sc. Theses of the Department of Computer Science at Aalto University

Aalto-CS-MSc-Theses Listing of M.Sc. Theses of the Department of Computer Scienc

Jorma Laaksonen 3 Jan 27, 2022
Conditional Generative Adversarial Networks (CGAN) for Mobility Data Fusion

This code implements the paper, Kim et al. (2021). Imputing Qualitative Attributes for Trip Chains Extracted from Smart Card Data Using a Conditional Generative Adversarial Network. Transportation Re

Eui-Jin Kim 2 Feb 03, 2022
Learning Continuous Signed Distance Functions for Shape Representation

DeepSDF This is an implementation of the CVPR '19 paper "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation" by Park et a

Meta Research 1.1k Jan 01, 2023
Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

Qing-Long Zhang 199 Jan 08, 2023
A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Duplicate Image Detection Getting Started Install dependencies pip install -r requirements.txt Run service python main.py Testing Test with pytest How

Matthew Podolak 21 Nov 11, 2022
RuleBERT: Teaching Soft Rules to Pre-Trained Language Models

RuleBERT: Teaching Soft Rules to Pre-Trained Language Models (Paper) (Slides) (Video) RuleBERT is a pre-trained language model that has been fine-tune

16 Aug 24, 2022
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator

involution Official implementation of a neural operator as described in Involution: Inverting the Inherence of Convolution for Visual Recognition (CVP

Duo Li 1.3k Dec 28, 2022
Relative Positional Encoding for Transformers with Linear Complexity

Stochastic Positional Encoding (SPE) This is the source code repository for the ICML 2021 paper Relative Positional Encoding for Transformers with Lin

Antoine Liutkus 48 Nov 16, 2022
Official PyTorch implementation of the paper "Graph-based Generative Face Anonymisation with Pose Preservation" in ICIAP 2021

Contents AnonyGAN Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Evaluation Acknowledgments Citat

Nicola Dall'Asen 10 May 24, 2022
The missing CMake project initializer

cmake-init - The missing CMake project initializer Opinionated CMake project initializer to generate CMake projects that are FetchContent ready, separ

1k Jan 01, 2023
Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch

Neural Distance Embeddings for Biological Sequences Official implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTo

Gabriele Corso 56 Dec 23, 2022
Spearmint Bayesian optimization codebase

Spearmint Spearmint is a software package to perform Bayesian optimization. The Software is designed to automatically run experiments (thus the code n

Formerly: Harvard Intelligent Probabilistic Systems Group -- Now at Princeton 1.5k Dec 29, 2022