CNN+LSTM+CTC based OCR implemented using tensorflow.

Last update: Dec 08, 2022

Overview

CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC based OCR (Optical Character Recognition) implemented using TensorFlow.

Note: there is no restriction on the number of characters in an image (variable-length labels are supported). Have a look at the image below.
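Variable-length output comes from CTC: the network predicts one class per time step (including a special "blank"), and the per-step predictions are collapsed into a label of any length. The sketch below shows greedy (best-path) CTC decoding in plain Python. The character set is hypothetical. The blank is placed at the last index, which matches the convention of TensorFlow 1.x CTC ops (`num_classes - 1`), but this is not taken from this repository's code.

```python
CHARSET = "0123456789"   # hypothetical character set, for illustration only
BLANK = len(CHARSET)     # TF 1.x CTC ops reserve the last class index for blank


def ctc_greedy_collapse(frame_ids, blank=BLANK):
    """Best-path CTC collapse: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out


def greedy_decode(frame_ids):
    """Map the collapsed class ids back to characters."""
    return "".join(CHARSET[i] for i in ctc_greedy_collapse(frame_ids))


# Nine frames decode to a four-character string; the blank (10) between
# the two 1s is what allows the repeated digit "11" to survive collapsing.
print(greedy_decode([1, 1, 10, 1, 4, 4, 10, 10, 2]))  # prints 1142
```

The same fixed number of frames can therefore yield labels of different lengths, which is why the image does not need a fixed character count.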

I trained a model on 100k images using this code and achieved 99.75% accuracy on the competition's test dataset (200k images). Sample images from both datasets:

Update 2017.11.6:

The competition page is no longer available. If you want to reproduce this result, please see this issue about the dataset. The label file (a .txt file) is in the same folder as the images after extracting the .tar.gz file.

Update 2018.4.24:

Updated to TensorFlow 1.7 and fixed some bugs reported in issue #8.

Structure

The images are first processed by a CNN to extract features, and these features are then fed into an LSTM for character recognition.

For simplicity, the CNN consists only of Convolution + Batch Normalization + Leaky ReLU + Max Pooling, and the LSTM is a two-layer stacked LSTM. You can also try a bidirectional LSTM.
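To make the CNN-to-LSTM hand-off concrete, here is a shape-calculation sketch. It assumes (hypothetically) four 2x2 max-pooling layers with stride 2 and SAME padding, 64 output channels, and that each column of the final feature map becomes one LSTM time step; check cnn_lstm_otc_ocr.py for the actual layer count and reshaping. It also computes the minimum number of time steps CTC needs for a label, since CTC must place a blank between two identical neighbouring characters.

```python
import math


def cnn_output_shape(height, width, channels, num_pools):
    """Spatial size after `num_pools` 2x2, stride-2 max pools with SAME padding."""
    for _ in range(num_pools):
        height = math.ceil(height / 2)
        width = math.ceil(width / 2)
    return height, width, channels


def lstm_input_shape(height, width, channels, num_pools):
    """Treat each feature-map column as one time step (an assumption)."""
    h, w, c = cnn_output_shape(height, width, channels, num_pools)
    return w, h * c  # (time_steps, features_per_step)


def min_ctc_frames(label):
    """One frame per character, plus one blank between equal neighbours."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


steps, feats = lstm_input_shape(60, 180, 64, num_pools=4)
print(steps, feats)                 # prints 12 256
print(min_ctc_frames("(1+4)*2"))    # prints 7
```

Under these assumptions, and with the training flags used later (--image_height=60, --image_width=180), the LSTM would see 12 time steps of 256 features each, which is more than the 7 frames a label such as (1+4)*2 requires. Adding more pooling layers shrinks the time axis and can make long labels impossible for CTC to emit.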

You can experiment with the network architecture (add dropout to the CNN, stack more LSTM layers, etc.) and see what happens. Have a look at the CNN part and the LSTM part.

Prerequisite

Python 3.6.4
TensorFlow 1.2
OpenCV 3 (optional; used to read images)

How to run

There are many other parameters you can play with; have a look at utils.py.

Note that num_classes is not among the parameters discussed above, to keep the explanation clear.
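num_classes is not a training flag because it follows from the character set: with CTC, the output layer needs one class per character plus one extra class for the blank. The sketch below assumes a hypothetical character set inferred from the example file name 004_(1+4)*2.jpg; the real utils.py may define the characters and num_classes differently.

```python
# Hypothetical character set: digits plus the operators/brackets seen in
# the example label "(1+4)*2". Not copied from this repository's utils.py.
CHARSET = "0123456789+-*()"

encode_map = {ch: i for i, ch in enumerate(CHARSET)}
decode_map = {i: ch for ch, i in encode_map.items()}

# +1 for the CTC blank, which TF 1.x CTC ops place at index num_classes - 1.
NUM_CLASSES = len(CHARSET) + 1


def encode(label):
    """Convert a label string into a list of class ids."""
    return [encode_map[ch] for ch in label]


def decode(ids):
    """Convert class ids back into a label string."""
    return "".join(decode_map[i] for i in ids)


print(NUM_CLASSES)                 # prints 16
print(decode(encode("(1+4)*2")))   # prints (1+4)*2
```

If your own data uses a different alphabet (letters, for example), num_classes changes accordingly, which is why it is derived rather than passed on the command line.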

```bash
# cd to your workspace.
# The code evaluates accuracy every validation_steps steps, as specified in the parameters.

# Expected layout (output of `ls -R`):
#   .:
#   imgs  utils.py  helper.py  main.py  cnn_lstm_otc_ocr.py
#
#   ./imgs:
#   train  infer  val  labels.txt
#
#   ./imgs/train:
#   1.png  2.png  ...  50000.png
#
#   ./imgs/val:
#   1.png  2.png  ...  50000.png
#
#   ./imgs/infer:
#   1.png  2.png  ...  300000.png

# Train the model (paths are relative to the workspace shown above).
CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=./imgs/train/ \
  --val_dir=./imgs/val/ \
  --image_height=60 \
  --image_width=180 \
  --image_channel=1 \
  --out_channels=64 \
  --num_hidden=128 \
  --batch_size=128 \
  --log_dir=./log/train \
  --num_gpus=1 \
  --mode=train

# Inference
CUDA_VISIBLE_DEVICES=0 python ./main.py --infer_dir=./imgs/infer/ \
  --checkpoint_dir=./checkpoint/ \
  --num_gpus=0 \
  --mode=infer
```
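During training the code periodically reports accuracy on the validation set. For OCR of this kind a common metric is exact-match sequence accuracy: an image counts as correct only if the entire decoded string equals its label. The function below is a minimal sketch of that metric; it is an assumption about how accuracy is measured here, not a copy of the repository's evaluation code.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of samples whose full decoded string equals the label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / len(targets)


preds = ["12", "3+4", "(5)"]
labels = ["12", "3-4", "(5)"]
print(sequence_accuracy(preds, labels))  # one wrong character fails the whole string
```

Exact match is stricter than character-level accuracy: a single wrong character makes the whole image count as an error.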

Run with your own data.

Prepare your data and make sure that all images are named in the format id_label.jpg, e.g. 004_(1+4)*2.jpg.
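A minimal sketch of how an id_label.jpg file name can be split into its id and label. The function name parse_filename is hypothetical and not part of helper.py. It splits on the first underscore only, so a label may itself contain underscores, and it strips only the final extension.

```python
import os


def parse_filename(path):
    """Split 'id_label.<ext>' into (id, label); only the first '_' separates them."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    image_id, sep, label = stem.partition("_")
    if not sep or not label:
        raise ValueError(f"expected id_label.<ext>, got {path!r}")
    return image_id, label


print(parse_filename("data/004_(1+4)*2.jpg"))  # prints ('004', '(1+4)*2')
```

Note that characters such as `*` are not allowed in Windows file names, so labels like the example above assume a Unix-like filesystem.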

```bash
# Make sure the data path is correct; have a look at helper.py.
python helper.py
```

Then train and run inference as described in How to run above.

Owner

Watson Yang
