Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract

Last update: Jul 13, 2022

Overview

A responsive document scanner built with U^2-Net, Textcleaner and Tesseract.

Toolset

U^2-Net is used for background removal.
Textcleaner is used for image cleaning and for deskewing lines tilted by at most 5 degrees.
Tesseract is used to correct the rotation of the text.
Deskew is used for deskewing lines tilted between 5 and 45 degrees.
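The division of labour between the tools can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The helper names `parse_osd_rotation` and `choose_deskew_tool` are hypothetical. It assumes that the page rotation comes from Tesseract's orientation and script detection (OSD) output, for example the string returned by `pytesseract.image_to_osd`, which contains a line such as `Rotate: 90`. The 5 and 45 degree limits come from the list above, and treating them as inclusive is an assumption.

```python
import re

# Skew limits in degrees, taken from the toolset list (inclusive bounds assumed).
TEXTCLEANER_MAX_SKEW = 5.0   # Textcleaner straightens lines tilted up to 5 degrees
DESKEW_MAX_SKEW = 45.0       # Deskew handles lines tilted between 5 and 45 degrees


def parse_osd_rotation(osd_text: str) -> int:
    """Return the 'Rotate: N' angle from Tesseract OSD output (--psm 0)."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Rotate:' line found in OSD output")
    return int(match.group(1))


def choose_deskew_tool(skew_degrees: float) -> str:
    """Name the tool (hypothetical routing) that should straighten the lines."""
    magnitude = abs(skew_degrees)
    if magnitude <= TEXTCLEANER_MAX_SKEW:
        return "textcleaner"
    if magnitude <= DESKEW_MAX_SKEW:
        return "deskew"
    # Larger angles are assumed to be fixed by the Tesseract rotation step first.
    raise ValueError(f"skew of {skew_degrees} degrees is outside the 0-45 range")


if __name__ == "__main__":
    # Illustrative OSD text in the format Tesseract prints (values are made up).
    sample_osd = (
        "Page number: 0\n"
        "Orientation in degrees: 270\n"
        "Rotate: 90\n"
        "Orientation confidence: 2.00\n"
        "Script: Latin\n"
        "Script confidence: 1.67\n"
    )
    print(parse_osd_rotation(sample_osd))  # 90
    print(choose_deskew_tool(3.2))         # textcleaner
    print(choose_deskew_tool(-12.0))       # deskew
```

In this sketch, rotation by multiples of 90 degrees is handled separately from small-angle skew, so each tool only sees the range it is listed for.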

Examples

One document was photographed with a smartphone camera from different angles and each photo was tested.

To build & deploy

Clone the repo
Download the model: see app/saved_models/README.md
Build the Docker image: docker build -t / : .
Test locally: run the Docker image and check that the API is working by opening http://localhost:10000
- CPU : docker run -it -v $PWD:/LOCAL/ -p 10000:80 / :
- GPU : docker run -it --gpus all -v $PWD:/LOCAL/ -p 10000:80 / :
Push the Docker image to Docker Hub (optional):
- See https://docs.docker.com/docker-hub/repos/ for account setup
- Create a Docker Hub repository with the same name as your image ID:
- Run docker push / :
Deploy to Cloud Run (optional):
- Create a Google Cloud account
- Push the Docker image to Google Container Registry
  - Create a new project called [PROJECT-ID]
  - Open Cloud Shell in your Google Cloud account and run:
    docker pull / :
    docker tag [IMAGE] gcr.io/[PROJECT-ID]/[IMAGE]
    docker push gcr.io/[PROJECT-ID]/[IMAGE]
    (more detail in this link)
- Create a Cloud Run service and select the container image that was pushed
  - [Screenshot: Cloud Run service configuration] For demo purposes, this configuration will be cost-free
- Click Deploy, then test the API URL that is displayed
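The "Test locally" step can also be scripted. Below is a minimal sketch that uses only the Python standard library. It assumes the service answers a plain GET on its root URL with a status below 400, since the API routes are not documented here. The helper name `api_is_up` is hypothetical. The default URL relies on the `-p 10000:80` mapping used in the docker run commands above, which forwards host port 10000 to port 80 inside the container. The same check works against a Cloud Run URL.

```python
import urllib.request


def api_is_up(url: str = "http://localhost:10000", timeout: float = 5.0) -> bool:
    """Return True if a GET on url succeeds with an HTTP status below 400."""
    try:
        # urlopen raises HTTPError (a subclass of URLError, itself an OSError)
        # for 4xx/5xx responses, and URLError/OSError for refused connections
        # or timeouts, so a single except clause covers every failure mode.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        return False


if __name__ == "__main__":
    # Run after starting the container with one of the docker run commands.
    print("API reachable:", api_is_up())
```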

Limits and Areas for improvements

Speed: processing one image takes 7 to 10 seconds on serverless Cloud Run. With a GPU, this can be reduced by 2 to 3 seconds, since U^2-Net runs 3 times faster.
Textcleaner is slow and needs some manual fine-tuning, but it works better for image cleaning.
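The speed figures above allow a rough estimate of how much of the runtime U^2-Net accounts for. This is a back-of-the-envelope derivation, not a measurement. It assumes that the whole GPU saving comes from U^2-Net being 3 times faster and that all other stages take the same time. Here $t_U$ denotes U^2-Net's CPU time and $\Delta t$ the time saved:

```latex
t_U^{\text{GPU}} = \frac{t_U}{3}, \qquad
\Delta t = t_U - \frac{t_U}{3} = \frac{2}{3}\, t_U \in [2, 3]\ \text{s}
\;\Longrightarrow\; t_U \in [3, 4.5]\ \text{s}
```

Under this assumption, U^2-Net would take roughly 3 to 4.5 seconds of the 7 to 10 seconds per image. The rest of the time is spent in the other stages, so a GPU alone would not remove the main bottleneck.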

References

U^2-Net https://github.com/xuebinqin/U-2-Net.git
Textcleaner http://www.fmwconcepts.com/imagemagick/textcleaner/
Tesseract https://github.com/tesseract-ocr/tesseract
Deskew https://github.com/sbrunner/deskew.git
