PyTorch implementation of an end-to-end Handwritten Text Recognition (HTR) system based on attention encoder-decoder networks

Overview

AttentionHTR

PyTorch implementation of an end-to-end Handwritten Text Recognition (HTR) system based on attention encoder-decoder networks. Scene Text Recognition (STR) benchmark model [1], trained on synthetic scene text images, is used to perform transfer learning from the STR domain to HTR. Different fine-tuning approaches are investigated using the multi-writer datasets: Imgur5K [2] and IAM [3].

For more details, refer to our paper at arXiv: https://arxiv.org/abs/2201.09390

Dependencies

This work was tested with Python 3.6.8, PyTorch 1.9.0, CUDA 11.5 and CentOS Linux release 7.9.2009 (Core). Create a new virtual environment and install all the necessary Python packages:

python3 -m venv attentionhtr-env
source attentionhtr-env/bin/activate
pip install --upgrade pip
python3 -m pip install -r AttentionHTR/requirements.txt

Content

Our pre-trained models

Download our pre-trained models from here. The names of the .pth files are explained in the table below. There are 6 models in total, 3 for each character set, corresponding to the dataset they perform best on.

Character set Imgur5K IAM Both datasets
Case-insensitive AttentionHTR-Imgur5K.pth AttentionHTR-IAM.pth AttentionHTR-General.pth
Case-sensitive AttentionHTR-Imgur5K-sensitive.pth AttentionHTR-IAM-sensitive.pth AttentionHTR-General-sensitive.pth

Print the character sets using the Python string module: string.printable[:36] for the case-insensitive and string.printable[:-6] for the case-sensitive character set.

Pre-trained STR benchmark models can be downloaded from here.

Demo

  • Download the AttentionHTR-General-sensitive.pth model and place it into /model/saved_models.

  • Directory /dataset-demo contains demo images. Go to /model and create an LMDB dataset from them with python3 create_lmdb_dataset.py --inputPath ../dataset-demo/ --gtFile ../dataset-demo/gt.txt --outputPath result/dataset-demo/. Note that under Windows you may need to tune the map_size parameter manually for the lmdb.open() function.

  • Obtain predictions with python3 test.py --eval_data result/dataset-demo --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --saved_model saved_models/AttentionHTR-General-sensitive.pth --sensitive. The last two rows in the terminal should be

    Accuracy: 90.00000000
    Norm ED: 0.04000000
    
  • Inspect predictions in /model/result/AttentionHTR-General-sensitive.pth/log_predictions_dataset-demo.txt. Columns: batch number, ground truth string, predicted string, match (0/1), running accuracy.

Use the models for fine-tuning or predictions

Partitions

Prepare the train, validation (for fine-tuning) and test (for testing and for predicting on unseen data) partitions with word-level images. For the Imgur5K and the IAM datasets you may use our scripts in /process-datasets.

LMDB datasets

When using the PyTorch implementation of the STR benchmark model [1], images need to be converted into an LMDB dataset. See this section for details. An LMDB dataset offers extremely cheap read transactions [4]. Alternatively, see this demo that uses raw images.

Predictions and fine-tuning

The pre-trained models can be used for predictions or fine-tuning on additional datasets using an implementation in /model, which is a modified version of the official PyTorch implementation of the STR benchmark [1]. Use test.py for predictions and train.py for fine-tuning. In both cases use the following arguments:

  • --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn to define architecture.
  • --saved_model to provide a path to a pre-trained model. In case of train.py it will be used as a starting point in fine-tuning and in the case of test.py it will be used for predictions.
  • --sensitive for the case-sensitive character set. No such argument for the case-insensitive character set.

Specifically for fine-tuning use:

  • --FT to signal that model parameters must be initialized from a pre-trained model in --saved_model and not randomly.
  • --train_data and --valid_data to provide paths to training and validation data, respectively.
  • --select_data "/" and --batch_ratio 1 to use all data. Can be used to define stratified batches.
  • --manualSeed to assign an integer identifyer for the resulting model. The original purpose of this argument is to set a random seed.
  • --patience to set the number of epochs to wait for the validation loss to decrease below the last minimum.

Specifically for predicting use:

  • --eval_data to provide a path to evaluation data.

Note that test.py outputs its logs and a copy of the evaluated model into /result.

All other arguments are described inside the scripts. Original instructions for using the scripts in /model are available here.

For example, to fine-tune one of our case-sensitive models on an additional dataset:

CUDA_VISIBLE_DEVICES=3 python3 train.py \
--train_data my_train_data \
--valid_data my_val_data \
--select_data "/" \
--batch_ratio 1 \
--FT \
--manualSeed 1
--Transformation TPS \
--FeatureExtraction ResNet \
--SequenceModeling BiLSTM \
--Prediction Attn \
--saved_model saved_models/AttentionHTR-General-sensitive.pth \
--sensitive

To use the same model for predictions:

CUDA_VISIBLE_DEVICES=0 python3 test.py \
--eval_data my_unseen_data \
--Transformation TPS \
--FeatureExtraction ResNet \
--SequenceModeling BiLSTM \
--Prediction Attn \
--saved_model saved_models/AttentionHTR-General.pth \
--sensitive

Acknowledgements

  • Our implementation is based on Clova AI's deep text recognition benchmark.
  • The authors would like to thank Facebook Research for the Imgur5K dataset.
  • The computations were performed through resources provided by the Swedish National Infrastructure for Computing (SNIC) at Chalmers Centre for Computational Science and Engineering (C3SE).

References

[1]: Baek, J. et al. (2019). What is wrong with scene text recognition model comparisons? dataset and model analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4715-4723). https://arxiv.org/abs/1904.01906

[2]: Krishnan, P. et al. (2021). TextStyleBrush: Transfer of Text Aesthetics from a Single Example. arXiv preprint arXiv:2106.08385. https://arxiv.org/abs/2106.08385

[3]: Marti, U. V., & Bunke, H. (2002). The IAM-database: an English sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition, 5(1), 39-46. https://doi.org/10.1007/s100320200071

[4]: Lightning Memory-Mapped Database. Homepage: https://www.symas.com/lmdb

Citation

@article{kass2022attentionhtr,
  title={AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks},
  author={Kass, D. and Vats, E.},
  journal={arXiv preprint arXiv:2201.09390},
  year={2022}
}

Contact

Dmitrijs Kass ([email protected])

Ekta Vats ([email protected])

Owner
Dmitrijs Kass
Data Science student at Uppsala University
Dmitrijs Kass
A benchmark framework for Tensorflow

TensorFlow benchmarks This repository contains various TensorFlow benchmarks. Currently, it consists of two projects: PerfZero: A benchmark framework

1.1k Dec 30, 2022
EMNLP 2021 Findings' paper, SCICAP: Generating Captions for Scientific Figures

SCICAP: Scientific Figures Dataset This is the Github repo of the EMNLP 2021 Findings' paper, SCICAP: Generating Captions for Scientific Figures (Hsu

Edward 26 Nov 21, 2022
Data visualization app for H&M competition in kaggle

handm_data_visualize_app Data visualization app by streamlit for H&M competition in kaggle. competition page: https://www.kaggle.com/competitions/h-an

Kyohei Uto 12 Apr 30, 2022
Luminaire is a python package that provides ML driven solutions for monitoring time series data.

A hands-off Anomaly Detection Library Table of contents What is Luminaire Quick Start Time Series Outlier Detection Workflow Anomaly Detection for Hig

Zillow 670 Jan 02, 2023
Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomaly Detection

Why, hello there! This is the supporting notebook for the research paper — Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomal

2 Dec 14, 2021
Auxiliary data to the CHIIR paper Searching to Learn with Instructional Scaffolding

Searching to Learn with Instructional Scaffolding This is the data and analysis code for the paper "Searching to Learn with Instructional Scaffolding"

Arthur Câmara 2 Mar 02, 2022
This repository contains code for the paper "Decoupling Representation and Classifier for Long-Tailed Recognition", published at ICLR 2020

Classifier-Balancing This repository contains code for the paper: Decoupling Representation and Classifier for Long-Tailed Recognition Bingyi Kang, Sa

Facebook Research 820 Dec 26, 2022
Python Environment for Bayesian Learning

Pebl is a python library and command line application for learning the structure of a Bayesian network given prior knowledge and observations. Pebl in

Abhik Shah 103 Jul 14, 2022
UI2I via StyleGAN2 - Unsupervised image-to-image translation method via pre-trained StyleGAN2 network

We proposed an unsupervised image-to-image translation method via pre-trained StyleGAN2 network. paper: Unsupervised Image-to-Image Translation via Pr

208 Dec 30, 2022
Grow Function: Generate 3D Stacked Bifurcating Double Deep Cellular Automata based organisms which differentiate using a Genetic Algorithm...

Grow Function: A 3D Stacked Bifurcating Double Deep Cellular Automata which differentiates using a Genetic Algorithm... TLDR;High Def Trees that you can mint as NFTs on Solana

Nathaniel Gibson 4 Oct 08, 2022
A FAIR dataset of TCV experimental results for validating edge/divertor turbulence models.

TCV-X21 validation for divertor turbulence simulations Quick links Intro Welcome to TCV-X21. We're glad you've found us! This repository is designed t

0 Dec 18, 2021
Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

CLIP-GLaSS Repository for the paper Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search An in-browser demo is

Federico Galatolo 172 Dec 22, 2022
Finding an Unsupervised Image Segmenter in each of your Deep Generative Models

Finding an Unsupervised Image Segmenter in each of your Deep Generative Models Description Recent research has shown that numerous human-interpretable

Luke Melas-Kyriazi 61 Oct 17, 2022
City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones Code

City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones Requirements Python 3.8 or later with all requirements.txt dependencies installed,

88 Dec 12, 2022
A simple approach to emable dense segmentation with ViT.

Vision Transformer Segmentation Network This implementation of ViT in pytorch uses a super simple and straight-forward way of generating an output of

HReynaud 5 Jan 03, 2023
Official Pytorch implementation for AAAI2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)

RSPNet Official Pytorch implementation for AAAI2021 paper "RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning" [Suppleme

35 Jun 24, 2022
A configurable, tunable, and reproducible library for CTR prediction

FuxiCTR This repo is the community dev version of the official release at huawei-noah/benchmark/FuxiCTR. Click-through rate (CTR) prediction is an cri

XUEPAI 397 Dec 30, 2022
ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022
Implementation of paper "DeepTag: A General Framework for Fiducial Marker Design and Detection"

Implementation of paper DeepTag: A General Framework for Fiducial Marker Design and Detection. Project page: https://herohuyongtao.github.io/research/

Yongtao Hu 46 Dec 12, 2022