Page to PAGE Layout Analysis Tool

Overview

P2PaLA


Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks.

💥 Try our new DEMO for online baseline detection. ❗❗

If you find this toolkit useful in your research, please cite:

@misc{p2pala2017,
  author = {Lorenzo Quirós},
  title = {P2PaLA: Page to PAGE Layout Analysis toolkit},
  year = {2017},
  publisher = {GitHub},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/lquirosd/P2PaLA}},
}

See the paper on arXiv for more details.

Requirements

  • Linux (OSX may work, but is untested).
  • Python (2.7 or 3.6; a conda virtual environment is recommended).
  • NumPy.
  • PyTorch (1.0). PyTorch 0.3.1 is compatible on this branch.
  • OpenCV (3.4.5.20).
  • NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN work, but are not recommended for training).
  • tensorboard-pytorch (v0.9) [Optional]: pip install tensorboardX. A different conda env is recommended to keep TensorFlow separate from PyTorch.
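
The requirements above can be checked before installing. The following is a minimal sketch, not part of P2PaLA: the helper names `installed_version` and `report` are hypothetical, and it assumes the packages are importable as `numpy`, `torch`, `cv2` and `tensorboardX`. The cuDNN check uses `torch.backends.cudnn.is_available()`, which is assumed to be present in the PyTorch 1.0 release listed above.

```python
import importlib
import sys

# Import names of the packages listed above; "cv2" is OpenCV's Python module
# and "tensorboardX" is the optional logging dependency.
PACKAGES = ("numpy", "torch", "cv2", "tensorboardX")


def installed_version(module_name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # any import-time failure counts as "not available"
        return None
    return getattr(module, "__version__", "unknown")


def report():
    """Collect Python, package, and GPU information into a dict."""
    info = {"python": ".".join(str(n) for n in sys.version_info[:3])}
    for name in PACKAGES:
        info[name] = installed_version(name)
    if info["torch"] is not None:
        import torch
        info["cuda"] = torch.cuda.is_available()
        info["cudnn"] = torch.backends.cudnn.is_available()
    return info


if __name__ == "__main__":
    for key, value in report().items():
        print("{}: {}".format(key, value))
```

A value of `None` means the package could not be imported in the current environment. The script only reports what it finds; it does not enforce the version numbers listed above.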

Install

python setup.py install

To install only the Python dependencies, use the requirements file: conda env create --file conda_requirements.yml

Usage

  1. Input data must follow the folder structure data_tag/page, where images go in the data_tag folder and XML files in the page subfolder. For example:
mkdir -p data/{train,val,test,prod}/page;
tree data;
data
├── prod
│   ├── page
│   │   ├── prod_0.xml
│   │   └── prod_1.xml
│   ├── prod_0.jpg
│   └── prod_1.jpg
├── test
│   ├── page
│   │   ├── test_0.xml
│   │   └── test_1.xml
│   ├── test_0.jpg
│   └── test_1.jpg
├── train
│   ├── page
│   │   ├── train_0.xml
│   │   └── train_1.xml
│   ├── train_0.jpg
│   └── train_1.jpg
└── val
    ├── page
    │   ├── val_0.xml
    │   └── val_1.xml
    ├── val_0.jpg
    └── val_1.jpg
  2. Run the tool.
python P2PaLA.py --config config.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"

โ— Pre-trained models available here

  3. Use TensorBoard to visualize training status:
tensorboard --logdir ./work/runs
  4. The resulting PAGE-XML files should be at "./work/results/test/".

We recommend Transkribus or nw-page-editor to visualize and edit PAGE-xml files.

  5. For details about arguments and the config file, see docs or python P2PaLA.py -h.
  6. For more detailed examples, see egs:
    • Bozen dataset
    • cBAD complex competition dataset
    • OHG dataset
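
The folder layout from step 1 can be checked before a run. The sketch below is not part of P2PaLA. It assumes Python 3.6 or later and that each image shares its base name with its XML file, as in the example tree (train_0.jpg and page/train_0.xml). The function name `find_unpaired_files` and the set of image extensions are hypothetical choices for illustration.

```python
from pathlib import Path

# Image extensions to look for; this set is an assumption, not taken from P2PaLA.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_unpaired_files(data_dir):
    """Return (images_without_xml, xml_without_image) for one data_tag folder."""
    data_dir = Path(data_dir)
    page_dir = data_dir / "page"
    # Base names of the images stored directly in the data_tag folder.
    image_stems = {
        p.stem for p in data_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    }
    # Base names of the XML files stored in the page subfolder, if it exists.
    xml_stems = (
        {p.stem for p in page_dir.glob("*.xml")} if page_dir.is_dir() else set()
    )
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)


if __name__ == "__main__":
    for tag in ("train", "val", "test", "prod"):
        folder = Path("data") / tag
        if not folder.is_dir():
            print(f"{folder}: missing")
            continue
        no_xml, no_image = find_unpaired_files(folder)
        print(f"{folder}: {len(no_xml)} image(s) without XML, "
              f"{len(no_image)} XML file(s) without image")
```

Both returned lists are empty when every image has a matching XML file and vice versa.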

License

GNU General Public License v3.0. See LICENSE for the full text.

Acknowledgments

Code is inspired by pix2pix and pytorch-CycleGAN-and-pix2pix

Owner
Lorenzo Quirós Díaz