Page to PAGE Layout Analysis Tool

Overview

P2PaLA


Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks.

💥 Try our new DEMO for online baseline detection. ❗❗

If you find this toolkit useful in your research, please cite:

@misc{p2pala2017,
  author = {Lorenzo Quirós},
  title = {P2PaLA: Page to PAGE Layout Analysis toolkit},
  year = {2017},
  publisher = {GitHub},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/lquirosd/P2PaLA}},
}

See the paper on arXiv for more details.

Requirements

  • Linux (OSX may work, but is untested).
  • Python (2.7 or 3.6; a conda virtual environment is recommended).
  • NumPy.
  • PyTorch (1.0). PyTorch 0.3.1 compatible on this branch
  • OpenCV (3.4.5.20).
  • NVIDIA GPU + CUDA cuDNN (CPU mode and CUDA without cuDNN work, but are not recommended for training).
  • tensorboard-pytorch (v0.9) [Optional]. Install with pip install tensorboardX. A different conda env is recommended to keep TensorFlow separated from PyTorch.
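
Before installing, the requirements above can be checked from Python. This is a minimal sketch and not part of P2PaLA: the helper name installed_version is hypothetical, and it assumes the packages are importable as numpy, torch, cv2 and tensorboardX and expose a __version__ attribute.

```python
import importlib


def installed_version(module_name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Assumption: the package exposes __version__; otherwise report "unknown".
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Import names assumed for the packages listed in the requirements.
    for name in ("numpy", "torch", "cv2", "tensorboardX"):
        version = installed_version(name)
        print("{:<14} {}".format(name, version or "not installed"))
```

The printed versions can then be compared by hand with the ones listed above; the sketch deliberately does not enforce them.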

Install

python setup.py install

To install only the Python dependencies, use the requirements file: conda env create --file conda_requirements.yml

Usage

  1. Input data must follow the folder structure data_tag/page, with the images placed in the data_tag folder and the XML files in data_tag/page. For example:
mkdir -p data/{train,val,test,prod}/page;
tree data;
data
├── prod
│   ├── page
│   │   ├── prod_0.xml
│   │   └── prod_1.xml
│   ├── prod_0.jpg
│   └── prod_1.jpg
├── test
│   ├── page
│   │   ├── test_0.xml
│   │   └── test_1.xml
│   ├── test_0.jpg
│   └── test_1.jpg
├── train
│   ├── page
│   │   ├── train_0.xml
│   │   └── train_1.xml
│   ├── train_0.jpg
│   └── train_1.jpg
└── val
    ├── page
    │   ├── val_0.xml
    │   └── val_1.xml
    ├── val_0.jpg
    └── val_1.jpg
  2. Run the tool.
python P2PaLA.py --config config.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"

โ— Pre-trained models available here

  3. Use TensorBoard to visualize the training status:
tensorboard --logdir ./work/runs
  4. After the run, the resulting PAGE-XML files should be in "./work/results/test/".

We recommend Transkribus or nw-page-editor to visualize and edit PAGE-xml files.
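
The result files can also be inspected from a script. The following is a minimal sketch, not part of P2PaLA: the function names are hypothetical, and it assumes the output follows the PAGE schema, where each Baseline element carries a points attribute of the form "x1,y1 x2,y2 ..." with integer coordinates. Tags are matched by local name, so the exact PAGE namespace version does not matter.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def read_baselines(xml_source):
    """Return one list of (x, y) integer points per Baseline element.

    xml_source is a file path or a file object, as accepted by ET.parse.
    """
    root = ET.parse(xml_source).getroot()
    baselines = []
    for element in root.iter():
        if local_name(element.tag) != "Baseline":
            continue
        # Assumption: coordinates are stored as "x1,y1 x2,y2 ...".
        points = []
        for pair in element.get("points", "").split():
            x, y = pair.split(",")
            points.append((int(x), int(y)))
        baselines.append(points)
    return baselines
```

Calling read_baselines on one of the XML files under "./work/results/test/" returns a list with one entry per detected baseline, which is enough to count lines or draw them over the page image.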

  5. For details about the arguments and the config file, see docs or run python P2PaLA.py -h.
  6. For more detailed examples, see egs:
    • Bozen dataset
    • cBAD complex competition dataset
    • OHG dataset
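
The data_tag/page layout described in step 1 can be verified before a run. This is a minimal sketch under stated assumptions, not part of P2PaLA: the function name missing_page_files and the list of image extensions are hypothetical, and it only checks that each image has an XML file with the same base name in the page subfolder.

```python
import os

# Assumption: these extensions cover the images in the dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def missing_page_files(data_dir):
    """Return image file names in data_dir without a matching page/<name>.xml."""
    page_dir = os.path.join(data_dir, "page")
    missing = []
    for file_name in sorted(os.listdir(data_dir)):
        stem, extension = os.path.splitext(file_name)
        if extension.lower() not in IMAGE_EXTENSIONS:
            continue  # skips the "page" folder and any non-image file
        if not os.path.isfile(os.path.join(page_dir, stem + ".xml")):
            missing.append(file_name)
    return missing


if __name__ == "__main__":
    # Folder names taken from the example tree in step 1.
    for split in ("train", "val", "test", "prod"):
        data_dir = os.path.join("data", split)
        if os.path.isdir(data_dir):
            print(split, missing_page_files(data_dir))
```

An empty list for a folder means every image there has its PAGE-XML counterpart.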

License

GNU General Public License v3.0. See LICENSE for the full text.

Acknowledgments

Code is inspired by pix2pix and pytorch-CycleGAN-and-pix2pix

Owner
Lorenzo Quirós Díaz