ocroseg - This is a deep learning model for page layout analysis / segmentation.

Overview

It can be trained and run in many different ways; by default, it simply returns the text lines found in a page image.

Segmentation

Segmentation is carried out using the ocroseg.Segmenter class. This needs a model that you can download or train yourself.

%%bash
model=lowskew-000000259-011440.pt
test -f $model || wget --quiet -nd https://storage.googleapis.com/tmb-models/$model

%pylab inline
rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)

Populating the interactive namespace from numpy and matplotlib

The Segmenter object handles page segmentation using a deep learning model.

import ocroseg
seg = ocroseg.Segmenter("lowskew-000000259-011440.pt")
seg.model

Sequential(
  (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
  (2): ReLU()
  (3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (6): ReLU()
  (7): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (8): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (10): ReLU()
  (11): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (12): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
  (13): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (14): ReLU()
  (15): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(32, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (16): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
  (17): Sigmoid()
)

Let's segment a page with this.

image = 1.0 - imread("testdata/W1P0.png")[:2000]
print(image.shape)
imshow(image)

(2000, 2592)

The extract_textlines method returns a list of text line images, bounding boxes, etc.

lines = seg.extract_textlines(image)
imshow(lines[0]['image'])
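
A minimal sketch of how the returned list might be inspected. Only the 'image' key is confirmed by the example above; the helper name line_shapes is hypothetical, and the stand-in data is made up so that the snippet runs without a model.

```python
import numpy as np

def line_shapes(lines):
    """Return the (height, width) of each extracted text line image."""
    return [np.asarray(line["image"]).shape[:2] for line in lines]

# Stand-in for the result of seg.extract_textlines(image):
# two blank "line images" (illustrative data, not real output).
fake_lines = [{"image": np.zeros((32, 400))}, {"image": np.zeros((28, 350))}]
print(line_shapes(fake_lines))
```

With a real segmenter, the same helper could be called as line_shapes(seg.extract_textlines(image)).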

The segmenter accomplishes this by predicting seeds for each text line. With a bit of mathematical morphology, these seeds are then extended into a text line segmentation.

imshow(seg.lines)
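
To illustrate the idea of extending seeds into a segmentation, here is a rough sketch. It is not the code ocroseg uses: it swaps in a simple nearest-seed assignment based on a Euclidean distance transform from scipy.ndimage, and the function name, threshold, and max_dist are assumptions chosen for the example.

```python
import numpy as np
from scipy import ndimage

def seeds_to_segmentation(seed_prob, threshold=0.5, max_dist=10):
    """Label thresholded seeds, then grow each label by up to max_dist pixels.

    Assumes at least one pixel exceeds the threshold.
    """
    seeds = seed_prob > threshold
    # Each connected seed region gets its own integer label (0 = background).
    labels, count = ndimage.label(seeds)
    # For every pixel: distance to, and coordinates of, the nearest seed pixel.
    dist, (rows, cols) = ndimage.distance_transform_edt(~seeds, return_indices=True)
    # Give each pixel the label of its nearest seed, but only within max_dist.
    grown = labels[rows, cols]
    grown[dist > max_dist] = 0
    return grown, count

# Two horizontal seed lines on a small synthetic "page".
seed_prob = np.zeros((20, 30))
seed_prob[5, 5:25] = 1.0
seed_prob[15, 5:25] = 1.0
segmentation, nlines = seeds_to_segmentation(seed_prob, max_dist=3)
print(nlines)
```

Pixels within three rows of a seed take that seed's label, and the gap between the two lines stays at 0, so the two lines remain separate regions.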

Training

The text line segmenter is trained using pairs of page images and line images stored in tar files.

%%bash
tar -ztvf testdata/framedlines.tgz | sed 6q
-rw-rw-r-- tmb/tmb      110404 2017-03-19 16:47 A001BIN.framed.png
-rw-rw-r-- tmb/tmb       10985 2017-03-16 16:15 A001BIN.lines.png
-rw-rw-r-- tmb/tmb       74671 2017-03-19 16:47 A002BIN.framed.png
-rw-rw-r-- tmb/tmb        8528 2017-03-16 16:15 A002BIN.lines.png
-rw-rw-r-- tmb/tmb      147716 2017-03-19 16:47 A003BIN.framed.png
-rw-rw-r-- tmb/tmb       12023 2017-03-16 16:15 A003BIN.lines.png


tar: write error

from dlinputs import tarrecords
# Open the archive in binary mode; next() works in both Python 2 and 3.
sample = next(tarrecords.tariterator(open("testdata/framedlines.tgz", "rb")))
subplot(121); imshow(sample["framed.png"])
subplot(122); imshow(sample["lines.png"])
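
The listing above suggests that files sharing a basename (for example A001BIN) form one training sample, with the rest of the file name (framed.png, lines.png) acting as the field name. The sketch below shows that grouping using only the standard library. It is an illustration of the file layout, not the dlinputs implementation; the function names are hypothetical and the demo archive contains placeholder bytes rather than real PNG data.

```python
import io
import os
import tarfile
from collections import defaultdict

def group_tar_samples(fileobj):
    """Group tar members into samples: {key: {extension: bytes}}."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # "A001BIN.framed.png" -> key "A001BIN", extension "framed.png"
            name = os.path.basename(member.name)
            key, _, ext = name.partition(".")
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)

def make_demo_archive():
    """Build a small in-memory .tgz with the same naming scheme."""
    buf = io.BytesIO()
    names = ["A001BIN.framed.png", "A001BIN.lines.png",
             "A002BIN.framed.png", "A002BIN.lines.png"]
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = name.encode("ascii")  # placeholder bytes, not a real image
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

samples = group_tar_samples(make_demo_archive())
print(sorted(samples))
print(sorted(samples["A001BIN"]))
```

Each sample then holds one page image and one line image under the keys "framed.png" and "lines.png", matching the way the sample is indexed above.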

There are also some tools for data augmentation.

Generally, you can train these kinds of segmenters on any kind of image data, though they work best on properly binarized, rotation- and skew-normalized page images. Note that, by convention, pages are white on black. You need to make sure that the model you load matches the kinds of pages you are trying to segment.
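
The segmentation example earlier inverts the page with 1.0 - imread(...) to get white text on a black background. A small helper along the following lines could make that step explicit. This is a sketch under assumptions: the function name is hypothetical, values above 1.0 are assumed to come from 8-bit input, and the mean-brightness test is only a heuristic for guessing whether the background is light.

```python
import numpy as np

def to_white_on_black(page):
    """Return the page as float32 in [0, 1] with bright text on a dark background."""
    page = np.asarray(page, dtype=np.float32)
    if page.max() > 1.0:
        page = page / 255.0  # assumption: 8-bit grayscale input
    # Heuristic: the background covers most of a typical page, so a mean
    # above 0.5 suggests dark text on a light background.
    if page.mean() > 0.5:
        page = 1.0 - page
    return page

# A synthetic black-on-white page: white background with one dark bar of "text".
scan = np.full((10, 10), 255, dtype=np.uint8)
scan[4:6, 2:8] = 0
normalized = to_white_on_black(scan)
print(normalized.dtype, normalized[0, 0], normalized[4, 2])
```

A page that is already white on black passes through unchanged apart from the type conversion.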

The models used in practice are fairly complex and require LSTMs to function well, but for demonstration purposes, let's define and use a tiny layout analysis model. Look in bigmodel.py for a realistic model.

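%%writefile tinymodel.py
# Explicit import so that the file is self-contained.
from torch import nn

def make_model():
    # Tiny demo network: one conv layer, 2x downsampling, one conv layer.
    r = 3  # kernel size; padding=r//2 keeps the spatial size unchanged
    model = nn.Sequential(
        nn.Conv2d(1, 8, r, padding=r//2),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
        nn.Conv2d(8, 1, r, padding=r//2),
        nn.Sigmoid()  # squashes each output value into (0, 1)
    )
    return model

Writing tinymodel.py

%%bash
./ocroseg-train -d testdata/framedlines.tgz --maxtrain 10 -M tinymodel.py --display 0

raw sample: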
__key__ 'A001BIN'
__source__ 'testdata/framedlines.tgz'
lines.png float32 (3300, 2592)
png float32 (3300, 2592)

preprocessed sample:
__key__ <type 'list'> ['A002BIN']
__source__ <type 'list'> ['testdata/framedlines.tgz']
input float32 (1, 3300, 2592, 1)
mask float32 (1, 3300, 2592, 1)
output float32 (1, 3300, 2592, 1)

ntrain 0
model:
Sequential(
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (3): Conv2d(8, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): Sigmoid()
)

0 0 ['A006BIN'] 0.24655306 ['A006BIN'] 0.31490618 0.55315816 lr 0.03
1 1 ['A007BIN'] 0.24404158 ['A007BIN'] 0.30752876 0.54983306 lr 0.03
2 2 ['A004BIN'] 0.24024434 ['A004BIN'] 0.31007746 0.54046077 lr 0.03
3 3 ['A008BIN'] 0.23756175 ['A008BIN'] 0.30573484 0.5392694 lr 0.03
4 4 ['A00LBIN'] 0.22300518 ['A00LBIN'] 0.28594157 0.52989864 lr 0.03
5 5 ['A00MBIN'] 0.22032338 ['A00MBIN'] 0.28086954 0.52204597 lr 0.03
6 6 ['A00DBIN'] 0.22794804 ['A00DBIN'] 0.27466372 0.512208 lr 0.03
7 7 ['A009BIN'] 0.22404794 ['A009BIN'] 0.27621177 0.51116604 lr 0.03
8 8 ['A001BIN'] 0.22008553 ['A001BIN'] 0.27836022 0.5008192 lr 0.03
9 9 ['A00IBIN'] 0.21842314 ['A00IBIN'] 0.26755702 0.4992323 lr 0.03
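
The samples in the log above are 3300 x 2592 pixels. As a rough check of what the tiny model does to that size, the sketch below applies the standard output-size formula for convolution and pooling layers (no dilation, no ceil mode) to the three size-changing layers of tinymodel.py. The helper names are hypothetical, and the snippet only does the arithmetic; it does not run the network.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def tiny_model_output_shape(height, width):
    """Trace a (height, width) input through the layers of tinymodel.py."""
    shape = (height, width)
    layers = [
        dict(kernel=3, stride=1, padding=1),  # Conv2d(1, 8, 3, padding=1)
        dict(kernel=2, stride=2, padding=0),  # MaxPool2d(2, 2)
        dict(kernel=3, stride=1, padding=1),  # Conv2d(8, 1, 3, padding=1)
    ]
    for layer in layers:
        shape = tuple(conv_out(n, **layer) for n in shape)
    return shape

print(tiny_model_output_shape(3300, 2592))  # prints (1650, 1296)
```

The padded 3x3 convolutions keep the size unchanged, while the single 2x2 max-pooling layer halves it, so this layer stack produces a map at half the input resolution.
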
Owner
NVIDIA Research Projects