An application of high resolution GANs to dewarp images of perturbed documents

Overview

Docuwarp

Codacy Badge Python version

This project is focused on dewarping document images through the usage of pix2pixHD, a GAN that is useful for general image to image translation. The objective is to take images of documents that are warped, folded, crumpled, etc. and convert the image to a "dewarped" state by using pix2pixHD to train and perform inference. All of the model code is borrowed directly from the pix2pixHD official repository.

Some of the intuition behind doing this is inspired by these two papers:

  1. DocUNet: Document Image Unwarping via A Stacked U-Net (Ma et.al)
  2. Document Image Dewarping using Deep Learning (Ramanna et.al)

May 8, 2020 : Important Update

  • This project does not contain a pre-trained model. I currently do not have the resources to train a model on an open source dataset, nor do I have the bandwidth at the moment to do so. If anyone would like to contribute a pretrained model and share their model checkpoints, feel free to do so, I will likely accept any PR trying to do this. Thanks!

Prerequisites

This project requires Python and the following Python libraries installed:

Getting Started

Installation

pip install dominate
  • Clone this repo:
git clone https://github.com/thomasjhuang/deep-learning-for-document-dewarping
cd deep-learning-for-document-dewarping

Training

  • Train the kaggle model with 256x256 crops:
python train.py --name kaggle --label_nc 0 --no_instance --no_flip --netG local --ngf 32 --fineSize 256
  • To view training results, please checkout intermediate results in ./checkpoints/kaggle/web/index.html. If you have tensorflow installed, you can see tensorboard logs in ./checkpoints/kaggle/logs by adding --tf_log to the training scripts.

Training with your own dataset

  • If you want to train with your own dataset, please generate label maps which are one-channel whose pixel values correspond to the object labels (i.e. 0,1,...,N-1, where N is the number of labels). This is because we need to generate one-hot vectors from the label maps. Please also specity --label_nc N during both training and testing.
  • If your input is not a label map, please just specify --label_nc 0 which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
  • If you don't have instance maps or don't want to use them, please specify --no_instance.
  • The default setting for preprocessing is scale_width, which will scale the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the --resize_or_crop option. For example, scale_width_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize). crop skips the resizing step and only performs random cropping. If you don't want any preprocessing, please specify none, which will do nothing other than making sure the image is divisible by 32.

Testing

  • Test the model:
python test.py --name kaggle --label_nc 0 --netG local --ngf 32 --resize_or_crop crop --no_instance --no_flip --fineSize 256

The test results will be saved to a directory here: ./results/kaggle/test_latest/.

Dataset

  • I use the kaggle denoising dirty documents dataset. To train a model on the full dataset, please download it from the official website. After downloading, please put it under the datasets folder with warped images under the directory name train_A and unwarped images under the directory train_B. Your test images are warped images, and should be under the name test_A. Below is an example dataset directory structure.

        .
        ├── ...
        ├── datasets                  
        │   ├── train_A               # warped images
        │   ├── train_B               # unwarped, "ground truth" images
        │   └── test_A                # warped images used for testing
        └── ...
    

Multi-GPU training

  • Train a model using multiple GPUs (bash ./scripts/train_kaggle_256_multigpu.sh):
#!./scripts/train_kaggle_256_multigpu.sh
python train.py --name kaggle_256_multigpu --label_nc 0 --netG local --ngf 32 --resize_or_crop crop --no_instance --no_flip --fineSize 256 --batchSize 32 --gpu_ids 0,1,2,3,4,5,6,7

Training with Automatic Mixed Precision (AMP) for faster speed

  • To train with mixed precision support, please first install apex from: https://github.com/NVIDIA/apex
  • You can then train the model by adding --fp16. For example,
#!./scripts/train_512p_fp16.sh
python -m torch.distributed.launch train.py --name label2city_512p --fp16

In my test case, it trains about 80% faster with AMP on a Volta machine.

More Training/Test Details

  • Flags: see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.
  • Instance map: we take in both label maps and instance maps as input. If you don't want to use instance maps, please specify the flag --no_instance.
Owner
Thomas Huang
I'm currently a Machine Learning Scientist @alectio. Purdue CS 2019
Thomas Huang
Convert Text-to Handwriting Using Python

Convert Text-to Handwriting Using Python Description In this project we'll use python library that's "pywhatkit" for converting text to handwriting. t

8 Nov 19, 2022
Captcha Recognition

The objective of this project is to recognize the target numbers in the captcha images correctly which would tell us how good or bad a captcha system has been built.

Mohit Kaushik 5 Feb 20, 2022
PAGE XML format collection for document image page content and more

PAGE-XML PAGE XML format collection for document image page content and more For an introduction, please see the following publication: http://www.pri

PRImA Research Lab 46 Nov 14, 2022
A synthetic data generator for text recognition

TextRecognitionDataGenerator A synthetic data generator for text recognition What is it for? Generating text image samples to train an OCR software. N

Edouard Belval 2.5k Jan 04, 2023
This is a tensorflow re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network.My blog:

PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network Introduction This is a tensorflow re-implementation of PSENet: Shape Robu

Michael liu 498 Dec 30, 2022
Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized

SceneTextPapers Tracking the latest progress in Scene Text Detection and Recognition: must-read papers well organized Information about this repositor

Shangbang Long 763 Jan 01, 2023
Give a solution to recognize MaoYan font.

猫眼字体识别 该 github repo 在于帮助xjtlu的同学们识别猫眼的扭曲字体。已经打包上传至 pypi ,可以使用 pip 直接安装。 猫眼字体的识别不出来的原理与解决思路在采茶上 使用方法: import MaoYanFontRecognize

Aruix 4 Jun 30, 2022
Awesome anomaly detection in medical images

A curated list of awesome anomaly detection works in medical imaging, inspired by the other awesome-* initiatives.

Kang Zhou 57 Dec 19, 2022
Deep learning based page layout analysis

Deep Learning Based Page Layout Analyze This is a Python implementaion of page layout analyze tool. The goal of page layout analyze is to segment page

186 Dec 29, 2022
A simple QR-Code Reader in Python

A simple QR-Code Reader written in Python, that copies the content of a QR-Code directly into the copy clipboard.

Eric 1 Oct 28, 2021
Recognizing the text contents from a scanned visiting card

Recognizing the text contents from a scanned visiting card. The application which is used to recognize the text from scanned images,printeddocuments,r

Faizan Habib 1 Jan 28, 2022
Image processing in Python

scikit-image: Image processing in Python Website (including documentation): https://scikit-image.org/ Mailing list: https://mail.python.org/mailman3/l

Image Processing Toolbox for SciPy 5.2k Dec 30, 2022
Learning Camera Localization via Dense Scene Matching, CVPR2021

This repository contains code of our CVPR 2021 paper - "Learning Camera Localization via Dense Scene Matching" by Shitao Tang, Chengzhou Tang, Rui Hua

tangshitao 65 Dec 01, 2022
Read-only mirror of https://gitlab.gnome.org/GNOME/ocrfeeder

================================= OCRFeeder - A Complete OCR Suite ================================= OCRFeeder is a complete Optical Character Recogn

GNOME Github Mirror 81 Dec 23, 2022
End-to-end pipeline for real-time scene text detection and recognition.

Real-time-Scene-Text-Detection-and-Recognition-System End-to-end pipeline for real-time scene text detection and recognition. The detection model use

Fangneng Zhan 89 Aug 04, 2022
a deep learning model for page layout analysis / segmentation.

OCR Segmentation a deep learning model for page layout analysis / segmentation. dependencies tensorflow1.8 python3 dataset: uw3-framed-lines-degraded-

99 Dec 12, 2022
Scene text detection and recognition based on Extremal Region(ER)

Scene text recognition A real-time scene text recognition algorithm. Our system is able to recognize text in unconstrain background. This algorithm is

HSIEH, YI CHIA 155 Dec 06, 2022
Face Anonymizer - FaceAnonApp v1.0

Face Anonymizer - FaceAnonApp v1.0 Blur faces from image and video files in /data/files folder. Contents Repo of the source files for the FaceAnonApp.

6 Apr 18, 2022
A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.

The project is based on older versions of tesseract and other tools, and is now superseded by another project which allows for more granular control o

Maxim 32 Jul 24, 2022
Using computer vision method to recognize and calcutate the features of the architecture.

building-feature-recognition In this repository, we accomplished building feature recognition using traditional/dl-assisted computer vision method. Th

4 Aug 11, 2022