Detectron2 for Document Layout Analysis

Overview


Detectron2 trained on PubLayNet dataset

This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Detectron2 implementation.
PubLayNet is a very large dataset for document layout analysis (document segmentation). It can be used to trained semantic segmentation/Object detection models.

NOTE

  • Models are trained on a portion of the dataset (train-0.zip, train-1.zip, train-2.zip, train-3.zip)
  • Trained on total 191,832 images
  • Models are evaluated on dev.zip (~11,000 images)
  • Backbone pretrained on COCO dataset is used but trained from scratch on PubLayNet dataset
  • Trained using Nvidia GTX 1080Ti 11GB
  • Trained on Windows 10

Steps to test pretrained models locally or jump to next section for docker deployment

from detectron2.data import MetadataCatalog
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']
  • Then run below command for prediction on single image (change the config file relevant to the model)
python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu

Docker Deployment

  • For local docker deployment for testing use Docker DLA

Benchmarking

Architecture No. images AP AP50 AP75 AP Small AP Medium AP Large Model size full Model size trimmed
MaskRCNN Resnext101_32x8d FPN 3X 191,832 90.574 97.704 95.555 39.904 76.350 95.165 816M 410M
MaskRCNN Resnet101 FPN 3X 191,832 90.335 96.900 94.609 36.588 73.672 94.533 480M 240M
MaskRCNN Resnet50 FPN 3X 191,832 87.219 96.949 94.385 38.164 72.292 94.081 168M

Configuration used for training

Architecture Config file Training Script
MaskRCNN Resnext101_32x8d FPN 3X configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml ./tools/train_net_dla.py
MaskRCNN Resnet101 FPN 3X configs/DLA_mask_rcnn_R_101_FPN_3x.yaml ./tools/train_net_dla.py
MaskRCNN Resnet50 FPN 3X configs/DLA_mask_rcnn_R_50_FPN_3x.yaml ./tools/train_net_dla.py

Some helper code and cli commands

Add the below code in demo/demo.py to get confidence along with label names

from detectron2.data import MetadataCatalog
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']

Then run below command for prediction on single image

python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu

TODOs

  • Train MaskRCNN resnet50

Sample results from detectron2


Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

What's New

  • It is powered by the PyTorch deep learning framework.
  • Includes more features such as panoptic segmentation, densepose, Cascade R-CNN, rotated bounding boxes, etc.
  • Can be used as a library to support different projects on top of it. We'll open source more research projects in this way.
  • It trains much faster.

See our blog post to see more demos and learn about detectron2.

Installation

See INSTALL.md.

Quick Start

See GETTING_STARTED.md, or the Colab Notebook.

Learn more at our documentation. And see projects/ for some projects that are built on top of detectron2.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the Detectron2 Model Zoo.

License

Detectron2 is released under the Apache 2.0 license.

Citing Detectron

If you use Detectron2 in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}
Owner
Himanshu
:zap: Machine Learning Engineer
Himanshu
ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives

Status: Under development (expect bug fixes and huge updates) ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectiv

37 Dec 28, 2022
Simply enable or disable your Nvidia dGPU

EnvyControl (WIP) Simply enable or disable your Nvidia dGPU Usage First clone this repo and install envycontrol with sudo pip install . CLI Turn off y

Victor Bayas 292 Jan 03, 2023
The PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.

Face Alignment in Full Pose Range: A 3D Total Solution By Jianzhu Guo. [Updates] 2020.8.30: The pre-trained model and code of ECCV-20 are made public

Jianzhu Guo 3.4k Jan 02, 2023
Computing Shapley values using VAEAC

Shapley values and the VAEAC method In this GitHub repository, we present the implementation of the VAEAC approach from our paper "Using Shapley Value

3 Nov 23, 2022
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information This repository contains code, model, dataset for ChineseBERT at ACL2021. Ch

413 Dec 01, 2022
Retrieval.pytorch - The code we used in [2020 DIGIX]

Retrieval.pytorch - The code we used in [2020 DIGIX]

Guo-Hua Wang 2 Feb 07, 2022
PyTorch implementation of NeurIPS 2021 paper: "CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration"

CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration (NeurIPS 2021) PyTorch implementation of the paper: CoFiNet: Reli

76 Jan 03, 2023
Datasets and pretrained Models for StyleGAN3 ...

Datasets and pretrained Models for StyleGAN3 ... Dear arfiticial friend, this is a collection of artistic datasets and models that we have put togethe

lucid layers 34 Oct 06, 2022
Meta-learning for NLP

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks Code for training the meta-learning models and fine-tuning on downstr

IESL 43 Nov 08, 2022
这是一个deeplabv3-plus-pytorch的源码,可以用于训练自己的模型。

DeepLabv3+:Encoder-Decoder with Atrous Separable Convolution语义分割模型在Pytorch当中的实现 目录 性能情况 Performance 所需环境 Environment 注意事项 Attention 文件下载 Download 训练步骤

Bubbliiiing 350 Dec 28, 2022
Neural Turing Machines (NTM) - PyTorch Implementation

PyTorch Neural Turing Machine (NTM) PyTorch implementation of Neural Turing Machines (NTM). An NTM is a memory augumented neural network (attached to

Guy Zana 519 Dec 21, 2022
moving object detection for satellite videos.

DSFNet: Dynamic and Static Fusion Network for Moving Object Detection in Satellite Videos Algorithm Introduction DSFNet: Dynamic and Static Fusion Net

xiaochao 39 Dec 16, 2022
An implementation of Deep Graph Infomax (DGI) in PyTorch

DGI Deep Graph Infomax (Veličković et al., ICLR 2019): https://arxiv.org/abs/1809.10341 Overview Here we provide an implementation of Deep Graph Infom

Petar Veličković 491 Jan 03, 2023
Unoffical reMarkable AddOn for Firefox.

reMarkable for Firefox (Download) This repo converts the offical reMarkable Chrome Extension into a Firefox AddOn published here under the name "Unoff

Jelle Schutter 45 Nov 28, 2022
Java and SHACL code commented in the paper "Towards compliance checking in reified I/O logic via SHACL" submitted to ICAIL 2021

shRIOL The subfolder shRIOL contains Java files to execute the SHACL files on the OWL ontology. To compile the Java files: "javac -cp ./src/;./lib/* -

1 Dec 06, 2022
ICON: Implicit Clothed humans Obtained from Normals

ICON: Implicit Clothed humans Obtained from Normals arXiv, December 2021. Yuliang Xiu · Jinlong Yang · Dimitrios Tzionas · Michael J. Black Table of C

Yuliang Xiu 1.1k Dec 30, 2022
KwaiRec: A Fully-observed Dataset for Recommender Systems (Density: Almost 100%)

KuaiRec: A Fully-observed Dataset for Recommender Systems (Density: Almost 100%) KuaiRec is a real-world dataset collected from the recommendation log

Chongming GAO (高崇铭) 70 Dec 28, 2022
Adaptive, interpretable wavelets across domains (NeurIPS 2021)

Adaptive wavelets Wavelets which adapt given data (and optionally a pre-trained model). This yields models which are faster, more compressible, and mo

Yu Group 50 Dec 16, 2022
Repository for training material for the 2022 SDSC HPC/CI User Training Course

hpc-training-2022 Repository for training material for the 2022 SDSC HPC/CI Training Series HPC/CI Training Series home https://www.sdsc.edu/event_ite

sdsc-hpc-training-org 21 Jul 27, 2022
PyTorch Connectomics: segmentation toolbox for EM connectomics

Introduction The field of connectomics aims to reconstruct the wiring diagram of the brain by mapping the neural connections at the level of individua

Zudi Lin 132 Dec 26, 2022