Handwritten_Text_Recognition

Overview

Deep Learning framework for Line-level Handwritten Text Recognition

Short presentation of our project

  1. Introduction

  2. Installation
    2.a Download and activate conda environment
    2.b Download databases
      • IAM dataset
      • ICFHR 2014 dataset

  3. How to use
    3.a Make predictions on your own unlabelled dataset
    3.b Train a network from scratch
    3.c Test a model without retraining it

  4. References

  5. Contact

1. Introduction

This work was carried out as an internship project under Mathieu Aubry's supervision at the LIGM lab in Paris.

In handwritten text recognition (HTR), the task is to predict a transcript from an image of handwritten text. A commonly used architecture for this task is the Convolutional Recurrent Neural Network (CRNN). A CRNN consists of a feature extractor (usually made of convolutional layers) followed by a recurrent network (an LSTM).
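CRNNs for HTR are commonly trained with the Connectionist Temporal Classification (CTC) loss of Graves et al. (see References): for each horizontal position of the line, the network outputs scores over the characters plus a special "blank" symbol. As a minimal, dependency-free sketch of how such an output becomes text, here is best-path (greedy) CTC decoding. The blank-at-index-0 convention and the function name are our assumptions; this repository's decoder may differ (for example, it may use beam search).

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding of one text line.

    frame_scores[t][k] is the score (e.g. log-probability) of class k at time
    step t; class 0 is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # 1. Keep the highest-scoring class at every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    # 2. Collapse consecutive repeats, then 3. drop blanks.
    chars: List[str] = []
    previous = None
    for k in best_path:
        if k != previous and k != BLANK:
            chars.append(alphabet[k - 1])
        previous = k
    return "".join(chars)
```

The blank is what lets CTC emit a doubled letter: the path l, l, blank, o, o, blank, o decodes to "loo", whereas without the second blank the two runs of o would merge into a single character.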

This repository provides a framework to train and test CRNNs on line-level datasets of grayscale handwritten images. It also provides code to generate predictions on an unlabelled, line-level, grayscale dataset. Several options are available for the CRNN architecture, image preprocessing, the dataset used, and data augmentation.


2. Installation

Prerequisites

Make sure you have Anaconda installed (version 4.7.10 or later; with older versions the correct dependencies may fail to install). If not, follow the installation instructions at https://docs.anaconda.com/anaconda/install/.

Also clone this git repository.

2.a Download and activate conda environment

From the repository folder on your machine, run:

conda env create -f HTR_environment.yml
conda activate HTR 

2.b Download databases

You only need to download these databases if you want to train your own network from scratch or test a model on one of them (section 3.c). The framework is built to train a network on one of these two datasets: IAM and the ICFHR 2014 HTR competition dataset.

  • Before downloading the IAM dataset, you need to register on this website. Once that's done, download:

    • The 'lines' folder at this link.
    • The 'split' folder at this link.
    • The 'lines.txt' file at this link.
  • For the ICFHR2014 dataset, download the 'BenthamDatasetR0-GT' folder at this link.
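To see what the downloaded IAM files contain, here is a minimal sketch of a parser for lines.txt and the split files. It assumes the usual IAM layout: lines starting with # are comments, each data line reads `<line id> <segmentation status> <graylevel> <#components> <x> <y> <w> <h> <transcription>` with words separated by |, each split file lists one line id per line, and line images live at lines/<writer prefix>/<form id>/<line id>.png. The function names are hypothetical; this is not the repository's own loader.

```python
from pathlib import Path
from typing import Dict, Optional, Set


def read_iam_lines(lines_txt: Path, keep_ids: Optional[Set[str]] = None) -> Dict[str, str]:
    """Map IAM line ids (e.g. 'a01-000u-00') to their transcriptions."""
    transcripts: Dict[str, str] = {}
    with open(lines_txt, encoding="utf-8") as f:
        for raw in f:
            if raw.startswith("#") or not raw.strip():
                continue  # header comments and blank lines
            fields = raw.split()
            line_id = fields[0]
            if keep_ids is not None and line_id not in keep_ids:
                continue
            # Fields 1-7: segmentation status, graylevel, #components, x, y, w, h.
            # The remainder is the transcription, with '|' between words.
            transcripts[line_id] = " ".join(fields[8:]).replace("|", " ")
    return transcripts


def read_split(split_txt: Path) -> Set[str]:
    """Read a split file such as split/trainset.txt (assumed: one line id per line)."""
    with open(split_txt, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def iam_image_path(lines_dir: Path, line_id: str) -> Path:
    """Assumed layout: lines/a01/a01-000u/a01-000u-00.png."""
    form_id = line_id.rsplit("-", 1)[0]  # 'a01-000u'
    prefix = form_id.split("-", 1)[0]    # 'a01'
    return lines_dir / prefix / form_id / f"{line_id}.png"
```

For example, `read_iam_lines(root / "IAM" / "lines.txt", read_split(root / "IAM" / "split" / "trainset.txt"))` returns the training transcriptions only.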

Download both databases into the same folder, with the following structure:

Your data folder / 
    IAM/
        lines.txt
        lines/
        split/
            trainset.txt
            testset.txt
            validationset1.txt
            validationset2.txt
            
    ICFHR2014/
        BenthamDatasetR0-GT/ 

    Your own dataset/
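Before training, it can help to check that the data folder matches the tree above. The sketch below is a hypothetical helper, not part of the repository: the expected entries are copied from the tree, and the optional `dataset` filter lets you check only the dataset you intend to use.

```python
from pathlib import Path
from typing import List, Optional

# Expected entries relative to the data folder, copied from the tree above.
EXPECTED = [
    "IAM/lines.txt",
    "IAM/lines",
    "IAM/split/trainset.txt",
    "IAM/split/testset.txt",
    "IAM/split/validationset1.txt",
    "IAM/split/validationset2.txt",
    "ICFHR2014/BenthamDatasetR0-GT",
]


def missing_entries(data_root: Path, dataset: Optional[str] = None) -> List[str]:
    """Return the expected relative paths that do not exist under data_root.

    If `dataset` is given ("IAM" or "ICFHR2014"), only its entries are checked.
    """
    wanted = [rel for rel in EXPECTED if dataset is None or rel.startswith(dataset + "/")]
    return [rel for rel in wanted if not (data_root / rel).exists()]
```

An empty list from `missing_entries(Path("/path/to/data"), "IAM")` means the IAM part of the folder is laid out as expected.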

3. How to use

3.a Make predictions on your own unlabelled dataset

Running this command uses the model stored at model_path to make predictions on the images stored in data_path. The predictions are written to predictions.txt in the data_path folder.

python lines_predictor.py --data_path datapath  --model_path ./trained_networks/IAM_model_imgH64.pth --imgH 64

/!\ Make sure that every image in the data folder has a unique file name and is in .jpg format. When you use our trained model with imgH 64 (i.e. IAM_model_imgH64.pth), you must set the argument --imgH to 64.
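The two requirements above can be checked before running lines_predictor.py. This is a hypothetical helper, not part of the repository. It scans the folder recursively, compares file names without their extension (so a.jpg and a.png also count as a clash), treats the extension strictly (.JPG is reported too, since we have not verified whether the script accepts it), and skips predictions.txt, which the script writes into the same folder.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def check_image_folder(data_path: Path) -> Tuple[List[Path], Dict[str, List[Path]]]:
    """Return (files that are not .jpg, file stems that occur more than once)."""
    not_jpg: List[Path] = []
    by_stem: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(data_path.rglob("*")):
        if not path.is_file() or path.name == "predictions.txt":
            continue  # skip folders and the script's own output file
        if path.suffix != ".jpg":
            not_jpg.append(path)
        by_stem[path.stem].append(path)
    duplicates = {stem: paths for stem, paths in by_stem.items() if len(paths) > 1}
    return not_jpg, duplicates
```

If both returned collections are empty, the folder satisfies the naming and format requirements.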

3.b Train a network from scratch

python train.py --dataset dataset  --tr_data_path data_dir --save_model_path path

Before running the code, set the ROOT_PATH variable at the beginning of params.py to the path of the folder where you want to save your models. Main arguments:

  • --dataset: name of the dataset to train and test on. Supported values are ICFHR2014 and IAM.
  • --tr_data_path: location of the training dataset folder on your local machine. See section 2.b for downloading the datasets.
  • --save_model_path: path of the folder where the model will be saved if params.save is set to True.
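As an illustration of how these main arguments fit together, here is a minimal argparse sketch. Only the flag names and the supported dataset values come from this README; the defaults, the `--save` flag's parsing, and the `str_to_bool` helper are hypothetical, and train.py may define its arguments differently. One point worth knowing: if boolean flags are parsed with argparse's `type=bool` (we have not checked whether train.py does this), any non-empty string, including "False", becomes True and only an empty string becomes False, which would explain the `--train ''` form used in section 3.c.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: parse 'True'/'False'-style strings explicitly."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no", ""}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a CRNN for line-level HTR (sketch).")
    parser.add_argument("--dataset", choices=["IAM", "ICFHR2014"], required=True)
    parser.add_argument("--tr_data_path", required=True, help="folder containing the dataset")
    parser.add_argument("--save_model_path", default=".", help="where the model is saved")
    parser.add_argument("--save", type=str_to_bool, default=True, help="save the model or not")
    return parser
```

With `choices`, a misspelled dataset name such as `--dataset iam` is rejected with a clear error instead of failing later during data loading.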

Main learning arguments:

  • --data_aug: If set to True, applies random affine transformations to the training images.

  • --optimizer: Which optimizer to use. Supported values are rmsprop, adam, adadelta, and sgd. We recommend rmsprop, which gave the best results in our experiments. See params.py for optimizer-specific parameters.

  • --epochs: Number of training epochs.

  • --lr: Learning rate at the beginning of training.

  • --milestones: List of the epochs at which the learning rate is divided by 10.

  • --feat_extractor: Architecture of the feature extractor. Supported values are resnet18, custom_resnet, and conv.

    • resnet18: the standard ResNet-18 architecture.
    • custom_resnet: a variant of resnet18 that we tuned for our experiments.
    • conv: a purely convolutional (non-residual) feature extractor. Its structure is set by the conv parameters in params.py.
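The interaction of --lr and --milestones can be made concrete with a small function that returns the learning rate in effect at a given epoch. This is a sketch under the assumption that epochs are counted from 0 and that the division happens when an epoch reaches a milestone; the function name is hypothetical. PyTorch's `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1)` follows the same rule when `step()` is called once per epoch, but we have not checked whether train.py uses it.

```python
from bisect import bisect_right
from typing import Sequence


def learning_rate(epoch: int, base_lr: float, milestones: Sequence[int], factor: float = 0.1) -> float:
    """Learning rate at `epoch` (0-based) when it is divided by 10 at each milestone."""
    # Number of milestones already reached at this epoch.
    passed = bisect_right(sorted(milestones), epoch)
    return base_lr * factor ** passed
```

For example, with `--lr 1e-3 --milestones 10 20`, epochs 0 to 9 use 1e-3, epochs 10 to 19 use 1e-4, and later epochs use 1e-5.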

3.c Test a model without retraining it

Running this command computes the average character error rate (CER) and word error rate (WER) of the model stored at the pretrained_model path on the test set of the chosen dataset.

python train.py --train '' --save '' --pretrained_model model_path --dataset dataset --tr_data_path data_path 

Main arguments :

  • --pretrained_model: path to state_dict of pretrained model.
  • --dataset: Which dataset to test on. Supported values are ICFHR2014 and IAM.
  • --tr_data_path: path to the dataset folder (see section 2.b).
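For reference, CER is usually defined as the edit (Levenshtein) distance between the predicted and ground-truth character sequences divided by the length of the ground truth, and WER is the same quantity computed over words. The sketch below implements these standard definitions; the function names are our own, words are assumed to be whitespace-separated, and averaging per line is an assumption, since we have not checked whether train.py averages per line or over the whole test set.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(previous[j] + 1,               # delete r
                               current[j - 1] + 1,            # insert h
                               previous[j - 1] + (r != h)))   # substitute or match
        previous = current
    return previous[-1]


def cer(ref: str, hyp: str) -> float:
    # max(..., 1) avoids dividing by zero on an empty ground truth.
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def average_rates(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Mean per-line (CER, WER) over non-empty (ground truth, prediction) pairs."""
    n = len(pairs)
    return (sum(cer(r, h) for r, h in pairs) / n,
            sum(wer(r, h) for r, h in pairs) / n)
```

A single wrong letter costs little in CER but a whole word in WER, which is why WER is typically much higher than CER for the same model.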

4. References

Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks.
Sánchez et al. A set of benchmarks for Handwritten Text Recognition on historical documents.
Dutta et al. Improving CNN-RNN Hybrid Networks for Handwriting Recognition.
Marti and Bunke. The IAM-database: an English sentence database for offline handwriting recognition.

https://github.com/Holmeyoung/crnn-pytorch
https://github.com/georgeretsi/HTR-ctc
Synthetic line generator: https://github.com/monniert/docExtractor (see the associated paper for more information)

5. Contact

If you have questions or remarks about this project, please email us at [email protected] and [email protected].
