Document Image Dewarping

Last update: Dec 23, 2022

Overview

Document image dewarping using text-lines and line Segments

Abstract

Conventional text-line based document dewarping methods have problems when handling complex layout and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics and/or tables take large portion of the input instead. Hence, for the robust document dewarping, we propose to use line segments in the image in addition to the aligned text-lines. Based on the assumption and observation that all the transformed line segments are still straight (line to line mapping), and many of them are horizontally or vertically aligned in the well-rectified images, we encode this properties into the cost function in addition to the text-line based cost. By minimizing the function, we can obtain transformation parameters for camera pose, page curve (extrinsic parameters) and camera focal length (intrinsic parameter), which are used for document rectification. Considering that there are many outliers in line segment directions and missed text-lines in some cases, the overall algorithm is designed in an iterative manner. At each step, we remove text components and line segments that are not well horizontal/vertical aligned, and then minimize the cost function with the updated information. Experimental results show that the proposed method is robust to the variety of page layouts. Moreover, the proposed method can extend to general curves surfaces as well as document.

Algorithm

Two line semgent properties

Straightness property

The straightness property describes the line segments extracted in curved document image, lines on the curved document surface become still straight in the well-rectified domain (Although the lines extracted in the well-rectified image can be curved in the curved document surface). It means that line-to-line mapping. Since the straightness property is always satisfied with all plane to plane mapping, it is not a significant constraint in rectification considering only camera view (such as homography). However we consider page curve as well as camera view in rectification process, then this property becomes an efficient constraint that prevents lines from being curved.

Alignment property

Based on the observation that the majority of line segments are horizontally or vertically aligned in the rectified images.

Outlier removal

The direct optimization of may yield poorly rectified results, due to outliers. We treat two outlier types that are missed text-lines and line segments having arbitrary direction (non horizontal/vertical). For the outlier removal, we design an iterative method. At each step, we refine the features (text components and line segments) by removing outlier (that are not well aligned) and minimize the cost function with updated inliers.

Experimental results

CBDAR 2007 dataset

We evaluate our method on the CBDAR 2007 dewarpint contest dataset [http://staffhome.ecm.uwa.edu.au/~00082689/downloads.html], that is consisted of binarized text images.

Input image	Kim [2]	Proposed

Our document image dataset

In order to consist of non conventional document images (i.e., not text-abundant cases), we collected 100 images having various layouts (e.g., three column documents, documents containing large tables and/or figures, presentation slides, and so on).

Input image	Kim [2]	Proposed

Our curved image dataset

In order to consist of general curved surface images (such as bottles), we collected 74 images.

Input image	Kim [2]	Proposed

Executable program

Executable program can be downloaded by below links:

http://ispl.synology.me:8480/sharing/uA2DTRA8U

Reference

[1] Taeho Kil, Wonkyo Seo, Hyung Il Koo and Nam Ik Cho, "Robust Document Image Dewarping Using Text-Line and Line Segments", ICDAR 2017.

[2] Beom Su Kim, Hyung Il Koo, and Nam Ik Cho, "Document Dewarping via Text-line based Optimization", Pattern Recognition 2015.

Document Image Dewarping

Related tags

Overview

Document image dewarping using text-lines and line Segments

Abstract

Algorithm

Two line semgent properties

Straightness property

Alignment property

Outlier removal

Experimental results

CBDAR 2007 dataset

Our document image dataset

Our curved image dataset

Executable program

Reference

Owner

Taeho Kil

Natural language detection

OCR powered screen-capture tool to capture information instead of images

Zoom , GoogleMeets에서 Vtuber 데뷔하기

A real-time dolly zoom camera effect

A selectional auto-encoder approach for document image binarization

Deskewing images with slanted content

Text-to-Image generation

Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

Semantic-based Patch Detection for Binary Programs

Educational application aimed at automating user-defined workflows for the mobile game, "Granblue Fantasy", using a variety of CV technologies in the backend such as OpenCV, PyAutoGUI and EasyOCR and a frontend coded in Typescript.

graph learning code for ogb

TextBoxes re-implement using tensorflow

A version of nrsc5-gui that merges the interface developed by cmnybo with the architecture developed by zefie in order to start a new baseline that is not heavily dependent upon Python processing.

SRA's seminar on Introduction to Computer Vision Fundamentals

This is a repository to learn and get more computer vision skills, make robotics projects integrating the computer vision as a perception tool and create a lot of awesome advanced controllers for the robots of the future.

Autonomous Driving project for Euro Truck Simulator 2

A fastai/PyTorch package for unpaired image-to-image translation.

A general list of resources to image text localization and recognition 场景文本位置感知与识别的论文资源与实现合集シーンテキストの位置認識と識別のための論文リソースの要約

This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

Document Image Dewarping

Related tags

Overview

Document image dewarping using text-lines and line Segments

Abstract

Algorithm

Two line semgent properties

Straightness property

Alignment property

Outlier removal

Experimental results

CBDAR 2007 dataset

Our document image dataset

Our curved image dataset

Executable program

Reference

Owner

Taeho Kil

Natural language detection

OCR powered screen-capture tool to capture information instead of images

Zoom , GoogleMeets에서 Vtuber 데뷔하기

A real-time dolly zoom camera effect

A selectional auto-encoder approach for document image binarization

Deskewing images with slanted content

Text-to-Image generation

Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

Semantic-based Patch Detection for Binary Programs

Educational application aimed at automating user-defined workflows for the mobile game, "Granblue Fantasy", using a variety of CV technologies in the backend such as OpenCV, PyAutoGUI and EasyOCR and a frontend coded in Typescript.

graph learning code for ogb

TextBoxes re-implement using tensorflow

A version of nrsc5-gui that merges the interface developed by cmnybo with the architecture developed by zefie in order to start a new baseline that is not heavily dependent upon Python processing.

SRA's seminar on Introduction to Computer Vision Fundamentals

This is a repository to learn and get more computer vision skills, make robotics projects integrating the computer vision as a perception tool and create a lot of awesome advanced controllers for the robots of the future.

Autonomous Driving project for Euro Truck Simulator 2

A fastai/PyTorch package for unpaired image-to-image translation.

A general list of resources to image text localization and recognition 场景文本位置感知与识别的论文资源与实现合集 シーンテキストの位置認識と識別のための論文リソースの要約

This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

A general list of resources to image text localization and recognition 场景文本位置感知与识别的论文资源与实现合集シーンテキストの位置認識と識別のための論文リソースの要約