Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Overview

Deskew

by Marek Mauder
https://galfar.vevb.net/deskew
https://github.com/galfar/deskew

v1.30 2019-06-07

Overview

Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

There are binaries built for these platforms (located in Bin folder): Win64 (deskew.exe), Win32 (deskew32.exe), Linux 64bit (deskew), macOS (deskew-mac), Linux ARMv7 (deskew-arm).

GUI frontend for this CLI tool is available as well (Windows, Linux, and macOS).

License: MIT

Downloads And Releases

https://github.com/galfar/deskew/releases
https://galfar.vevb.net/deskew#downloads

Usage

Usage:
deskew [-o output] [-a angle] [-b color] [..] input
    input:         Input image file
  Options:
    -o output:     Output image file (default: out.png)
    -a angle:      Maximal expected skew angle (both directions) in degrees (default: 10)
    -b color:      Background color in hex format RRGGBB|LL|AARRGGBB (default: black)
  Ext. options:
    -q filter:     Resampling filter used for rotations (default: linear,
                   values: nearest|linear|cubic|lanczos)
    -t a|treshold: Auto threshold or value in 0..255 (default: a)
    -r rect:       Skew detection only in content rectangle (pixels):
                   left,top,right,bottom (default: whole page)
    -f format:     Force output pixel format (values: b1|g8|rgb24|rgba32)
    -l angle:      Skip deskewing step if skew angle is smaller (default: 0.01)
    -g flags:      Operational flags (any combination of):
                   c - auto crop, d - detect only (no output to file)
    -s info:       Info dump (any combination of):
                   s - skew detection stats, p - program parameters, t - timings
    -c specs:      Output compression specs for some file formats. Several specs
                   can be defined - delimited by commas. Supported specs:
                   jXX - JPEG compression quality, XX is in range [1,100(best)]
                   tSCHEME - TIFF compression scheme: none|lzw|rle|deflate|jpeg|g4

  Supported file formats
    Input:  BMP, JPG, PNG, JNG, GIF, DDS, TGA, PBM, PGM, PPM, PAM, PFM, TIF, PSD
    Output: BMP, JPG, PNG, JNG, GIF, DDS, TGA, PGM, PPM, PAM, PFM, TIF, PSD

Notes

For TIFF support in Linux and macOS you need to have libtiff 4.x installed (package is usually called libtiff5).

For macOS you can download prebuilt libtiff binaries here: https://galfar.github.io/store/TiffLibBins-macOS.zip. Just put the files inside the archive to the same folder as deskew-mac executable.

You can find some test images in TestImages folder and scripts to run tests (RunTests.bat and runtests.sh) in Bin. By default scripts just call deskew command but you can pass a different one as a parameter (e.g. runtests.sh deskew-arm).

Bugs, Issues, Proposals

File them here:
https://github.com/galfar/deskew/issues

Version History

v1.30 2019-06-07:

  • fix #15: Better image quality after rotation - better default and also selectable nearest|linear|cubic|lanczos filtering
  • fix #5: Detect skew angle only (no rotation done) - optionally only skew detection
  • fix #17: Optional auto-crop after rotation
  • fix #3: Command line option to set output compression - now for TIFF and JPEG
  • fix #12: Bad behavior when an output is given and no deskewing is needed
  • libtiff in macOS is now picked up also when binaries are put directly in the directory with deskew
  • text output is flushed after every write (Linux/Unix): it used to be flushed only when writing to device but not file/pipe.

v1.25 2018-05-19:

  • fix #6: Preserve DPI measurement system (TIFF)
  • fix #4: Output image not saved in requested format (when deskewing is skipped)
  • dynamic loading of libtiff library - adds TIFF support in macOS when libtiff is installed

v1.21 2017-11-01:

  • fix #8: Cannot compile in Free Pascal 3.0+ (Windows) - Fails to link precompiled LibTiff library
  • fix #7: Windows FPC build fails with Access violation exception when loading certain TIFFs (especially those saved by Windows Photo Viewer etc.)

v1.20 2016-09-01:

  • much faster rotation, especially when background color is set (>2x faster, 2x less memory)
  • can skip deskewing step if detected skew angle is lower than parameter
  • new option for timing of individual steps
  • fix: crash when last row of page is classified as text
  • misc: default back color is now opaque black, new forced output format "rgb24", background color can define also alpha channel, nicer formatting of text output

v1.10 2014-03-04:

  • TIFF support for Win64 and 32/64bit Linux
  • forced output formats
  • fix: output file names were always lowercase
  • fix: preserves resolution metadata (e.g. 300dpi) of input when writing output

v1.00 2012-06-04:

  • background color
  • "area of interest" content rectangle
  • 64bit and Mac OSX support
  • PSD and TIFF (win32) support
  • show skew detection stats and program parameters

v0.95 2010-12-28:

  • Added auto thresholding

v0.90 2010-02-12:

  • Initial version

Compiling Deskew

Deskew is written in Object Pascal. You need Free Pascal or Delphi to recompile it.

Tested Compilers

There are project files for these IDEs:

  1. Lazarus 2.0.10 (deskew.lpi)
  2. Delphi XE + 10.3 (deskew.dproj)

Additionally, there are compile shell/batch scripts for standalone FPC compiler in Scripts folder.

Supported/Tested Platforms

Deskew is precompiled and was tested on these platforms: Win32, Win64, Linux 64bit, macOS 64bit, Linux ARMv7

Source Code

Latest source code can be found here:
https://github.com/galfar/deskew

Dependencies

Vampyre Imaging Library is needed for compilation and it's included in Deskew's repo in Imaging folder.

Comments
  • Detect skew angle only (no rotation done)

    Detect skew angle only (no rotation done)

    Original report by Marek Mauder (Bitbucket: galfar, GitHub: galfar).


    As requested on blog:

    Can you explain how to simply find the angle but not rotate using this tool? Since I’m dealing with archival TIFFs I need to keep the DPI and embedded metadata in place, so I’m thinking I would use ImageMagick to rotate once I have the angle. Thanks.

    Answer:

    For now you could use -l parameter: -l angle: Skip deskewing step if skew angle is smaller And use some large threshold so rotation will always be skipped.

    $deskew -l 80 Sken003.png
    ...
    Preparing input image (Sken003.png) ...
    Calculating skew angle...
    Skew angle found: 0.23
    Skipping deskewing step, skew angle lower than threshold of 80.00
    Done!
    

    For next version I plan to modify this: angle is optional and if omitted rotation is always skipped.

    major DeskewCmdLine proposal 
    opened by galfar 7
  • Please clarify licensing situation

    Please clarify licensing situation

    README says that the license is MIT, but the source files seem to all claim MPL/LGPL. What is actual license of this code? Can you please distribute a license file?

    This came up at AUR: https://aur.archlinux.org/packages/deskew-git/

    opened by ctrlcctrlv 5
  • Parameters on DeskewGui v0.90

    Parameters on DeskewGui v0.90

    Hello,

    After doing some minor testing with the default and the lanczos filters, I agree that the quality under lanczos is noticeably better, and in my old computer, it doesn't take much longer to process (I would say that a couple of seconds per page).

    In conclusion, I'd like to use lanczos from now on, but DeskewGui doesn't allow me, to the best of my knowledge, to include parameters when I call deskew.exe.

    Is there way I can tell the programme that I want to use lanczos?

    Thanks!

    DeskewGui 
    opened by vivadavid 5
  • EImagingError: Error while loading images from file.... Exception message: Access violation

    EImagingError: Error while loading images from file.... Exception message: Access violation

    Original report by Anonymous.


    When trying to deskew landscape images on Windows, the error message:

    EImagingError: Error while loading images from file "name of file" <format: tif> Exception message: Access violation

    appears.

    bug major 
    opened by galfar 4
  • Better image quality after rotation

    Better image quality after rotation

    Original report by Marek Mauder (Bitbucket: galfar, GitHub: galfar).


    Received several "complaints" about images after rotation step to be a bit blurry (compared to doing rotation in ImageMagick etc.).

    Current "speed over quality" rotation algorithm used in Deskew must be replaced/supplemented with another one.

    enhancement DeskewCmdLine critical 
    opened by galfar 3
  • Simple GUI frontend

    Simple GUI frontend

    Original report by Marek Mauder (Bitbucket: galfar, GitHub: galfar).


    I get many requests for "process all files in folder" etc. from people not comfortable with command line, shell scripts etc.

    Simple GUI frontend (to cmd. line Deskew) with batch processing capability would be nice for many people.

    Binaries for Windows, macOS, and Linux required.

    major proposal DeskewGui 
    opened by galfar 3
  • GUI: Option to select sampling filter

    GUI: Option to select sampling filter

    The GUI does not currently have an option for this, and I don't know any other way of doing batch image processing with the CLI tool, the default linear filter option blurs the images too much, so it would be nice to be able to change it in the GUI.

    opened by NebulaOnion 2
  • Support for multipage pdf files

    Support for multipage pdf files

    Hi, thanks for this program. I think it would be useful to support to deskew a whole pdf file with multiple pages. I normally scan books or documents directly into a multipage pdf file, since it is more manageable to only have one file and not one per page. Do you think that this might be within the scope of this program?

    Cheers

    opened by cristobaltapia 2
  • DeskewGui: default window is too big for some common screens

    DeskewGui: default window is too big for some common screens

    Original report by Anonymous.


    The default window for DeskewGui is a little too big. In some screens, in particular laptop screens with 1366x800 pixels, it is somewhat difficult to redimension the window because the window title bar is left off screen. The maximum height should be no bigger than about 600 pixels. Thank you for your nice work.

    bug minor DeskewGui 
    opened by galfar 2
  • Compiling the GUI on Linux?

    Compiling the GUI on Linux?

    Hello @galfar, I'm your AUR maintainer

    I noticed you have a GUI now, but in Scripts there only seems to be a script to compile it on Mac.

    I get:

    deskewgui.lpr(10,3) Fatal: Can't find unit Interfaces used by deskewgui
    

    Is this usable on Linux?

    opened by ctrlcctrlv 1
  • GUI: allow passing extra parameters to CLI

    GUI: allow passing extra parameters to CLI

    As GUI will always lag behind a bit and won't provide all the options CLI can handle it may be useful to allow passing extra parameters directly from GUI to CLI.

    A new text edit in "Advanced options" should take care of it.

    enhancement DeskewGui 
    opened by galfar 1
  • feature request: release for linux arm64

    feature request: release for linux arm64

    It will be awesome to have more releases a) linux ARM64 and b) macOS-arm64. I would like to use this CLI in linux docker running over Apple M1 computer.

    opened by amitm02 2
  • Auto detect content rectangle

    Auto detect content rectangle

    Content rectangle can be auto detected by scanning the image (after thresholding) from the sides and looking where non-white pixels start. If it's fast enough it could be a default setting. If custom content rectangle is passed as a parameter by the user let's not use the detection.

    enhancement DeskewCmdLine 
    opened by galfar 0
  • Options for Specifying Skew Angle

    Options for Specifying Skew Angle

    I am looking for a cmd option that can be used to specify the angle that is being calculated before but i was not able to find it in the given set of options. I am applying some preprocessing on same image but has different preprocessed version. But when I run deskew on both of these images I get different skew angles. I want to use same angle for both versions of same image.

    opened by hamxahbhatti 2
  • Take the background color from the input image

    Take the background color from the input image

    Request came in to extend "-b" background color parameter to take it's value from the input image (edge/corner). https://galfar.vevb.net/wp/projects/deskew/comment-page-2/#comment-184847

    One request if at all possible is can there be an option for -b that auto samples the rgb value of an edge. I currently run an auto-crop script after they are deskewed and some pages have a different background color. This sometimes trips up the auto crop into thinking the added background is an edge. I’m hoping for something a little more dynamic that allows me to use the tool without sorting the material first.

    Hi, you mean something like “look at pixel [0,0] of input image and use it for output background”?

    Yes. “look at pixel [0,0] of input image and use it for output background” would be a great additional feature.

    enhancement 
    opened by galfar 8
Releases(v1.30)
  • v1.30(Jun 18, 2019)

    Command line tool for deskewing scanned documents. Binaries for several platforms and test images included.

    README

    Recent changes:

    v1.30 2019-06-07:

    • fix #15: Better image quality after rotation - better default and also selectable nearest|linear|cubic|lanczos filtering
    • fix #5: Detect skew angle only (no rotation done) - optionally only skew detection
    • fix #17: Optional auto-crop after rotation
    • fix #3: Command line option to set output compression - now for TIFF and JPEG
    • fix #12: Bad behavior when an output is given and no deskewing is needed
    • libtiff in macOS is now picked up also when binaries are put directly in the directory with deskew
    • text output is flushed after every write (Linux/Unix): it used to be flushed only when writing to device but not file/pipe.

    v1.25 2018-05-19:

    • fix #6: Preserve DPI measurement system (TIFF)
    • fix #4: Output image not saved in requested format (when deskewing is skipped)
    • dynamic loading of libtiff library - adds TIFF support in macOS when libtiff is installed
    Source code(tar.gz)
    Source code(zip)
    Deskew-1.30.zip(4.29 MB)
  • gui-v0.90(Jan 4, 2019)

    GUI Frontend for Deskew Command Line Tool

    Now it’s easier to process many files without writing shell scripts. It needs the command line tool which is called for the each input file. You can set the basic and most of the advanced options for deskewing in the GUI.

    Prebuilt executables for Windows and Linux are available in the download – you just place them to the same folder as the command line tool. Version for macOS is a bit more convenient – it’s a self-contained app bundle with CLI tool already inside and all placed in DMG image. You can also set the explicit path to the command line tool in the program itself.

    Source code(tar.gz)
    Source code(zip)
    DeskewGui-0.90.zip(4.11 MB)
  • v1.25(Jan 4, 2019)

    Command line tool for deskewing scanned documents. Binaries for several platforms and test images included.

    Changes since the last release:

    1.25 2018-05-19:

    • fix #6: Preserve DPI measurement system (TIFF)
    • fix #4: Output image not saved in requested format (when deskewing is skipped)
    • dynamic loading of libtiff library - adds TIFF support in macOS when libtiff is installed

    1.21 2017-11-01:

    • fix #8: Cannot compile in Free Pascal 3.0+ (Windows) - Fails to link precompiled LibTiff library
    • fix #7: Windows FPC build fails with Access violation exception when loading certain TIFFs (especially those saved by Windows Photo Viewer etc.)
    Source code(tar.gz)
    Source code(zip)
    deskew-125.zip(4.31 MB)
Generic framework for historical document processing

dhSegment dhSegment is a tool for Historical Document Processing. Its generic approach allows to segment regions and extract content from different ty

Digital Humanities Laboratory 343 Dec 24, 2022
Color Picker and Color Detection tool for METR4202

METR4202 Color Detection Help This is sample code that can be used for the METR4202 project demo. There are two files provided, both running on Python

Miguel Valencia 1 Oct 23, 2021
SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

NVIDIA Research Projects 31 Nov 22, 2022
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s

Martin Lønne 1 Jan 08, 2022
(CVPR 2021) Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

BRNet Introduction This is a release of the code of our paper Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds,

86 Oct 05, 2022
This Repository contain Opencv Projects in python

Python-Opencv OpenCV OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was

Yash Sakre 2 Nov 06, 2021
make a better chinese character recognition OCR than tesseract

deep ocr See README_en.md for English installation documentation. 只在ubuntu下面测试通过,需要virtualenv安装,安装路径可自行调整: git clone https://github.com/JinpengLI/deep

Jinpeng 1.5k Dec 28, 2022
一款基于Qt与OpenCV的仿真数字示波器

一款基于Qt与OpenCV的仿真数字示波器

郭赟 4 Nov 02, 2022
Write-ups for the SwissHackingChallenge2021 CTF.

SwissHackingChallenge 2021 : Write-ups This repository contains a collection of my write-ups for challenges solved during the SwissHackingChallenge (S

Julien Béguin 3 Jun 07, 2021
A set of workflows for corpus building through OCR, post-correction and normalisation

PICCL: Philosophical Integrator of Computational and Corpus Libraries PICCL offers a workflow for corpus building and builds on a variety of tools. Th

Language Machines 41 Dec 27, 2022
Basic functions manipulating images using the OpenCV library

OpenCV Basic functions manipulating images using the OpenCV library. Reading Ima

Shatha Siala 3 Feb 17, 2022
Generates a message from the infamous Jerma Impostor image

Generate your very own jerma sus imposter message. Modes: Default Mode: Only supports the characters " ", !, a, b, c, d, e, h, i, m, n, o, p, q, r, s,

Giorno420 1 Oct 27, 2022
The open source extract transaction infomation by using OCR.

Transaction OCR Mã nguồn trích xuất thông tin transaction từ file scaned pdf, ở đây tôi lựa chọn tài liệu sao kê công khai của Thuy Tien. Mã nguồn có

Nguyen Xuan Hung 18 Jun 02, 2022
Web interface for browsing arXiv papers

Currently, arxivbox considers only major computer vision and machine learning conferences

Ankan Kumar Bhunia 12 Sep 11, 2022
FOTS Pytorch Implementation

News!!! Recognition branch now is added into model. The whole project has beed optimized and refactored. ICDAR Dataset SynthText 800K Dataset detectio

Ning Lu 599 Dec 19, 2022
A collection of resources (including the papers and datasets) of OCR (Optical Character Recognition).

OCR Resources This repository contains a collection of resources (including the papers and datasets) of OCR (Optical Character Recognition). Contents

Zuming Huang 363 Jan 03, 2023
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022
Handwritten Character Recognition using CNN

Handwritten Character Recognition using CNN Problem Definition The main objective of this project is to solve the problem of handwritten character rec

Mohit Kaushik 4 Mar 02, 2022
Document Image Dewarping

Document image dewarping using text-lines and line Segments Abstract Conventional text-line based document dewarping methods have problems when handli

Taeho Kil 268 Dec 23, 2022