Detectron2 for Document Layout Analysis

Overview


Detectron2 trained on PubLayNet dataset

This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Detectron2 implementation.
PubLayNet is a very large dataset for document layout analysis (document segmentation). It can be used to trained semantic segmentation/Object detection models.

NOTE

  • Models are trained on a portion of the dataset (train-0.zip, train-1.zip, train-2.zip, train-3.zip)
  • Trained on total 191,832 images
  • Models are evaluated on dev.zip (~11,000 images)
  • Backbone pretrained on COCO dataset is used but trained from scratch on PubLayNet dataset
  • Trained using Nvidia GTX 1080Ti 11GB
  • Trained on Windows 10

Steps to test pretrained models locally or jump to next section for docker deployment

from detectron2.data import MetadataCatalog
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']
  • Then run below command for prediction on single image (change the config file relevant to the model)
python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu

Docker Deployment

  • For local docker deployment for testing use Docker DLA

Benchmarking

Architecture No. images AP AP50 AP75 AP Small AP Medium AP Large Model size full Model size trimmed
MaskRCNN Resnext101_32x8d FPN 3X 191,832 90.574 97.704 95.555 39.904 76.350 95.165 816M 410M
MaskRCNN Resnet101 FPN 3X 191,832 90.335 96.900 94.609 36.588 73.672 94.533 480M 240M
MaskRCNN Resnet50 FPN 3X 191,832 87.219 96.949 94.385 38.164 72.292 94.081 168M

Configuration used for training

Architecture Config file Training Script
MaskRCNN Resnext101_32x8d FPN 3X configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml ./tools/train_net_dla.py
MaskRCNN Resnet101 FPN 3X configs/DLA_mask_rcnn_R_101_FPN_3x.yaml ./tools/train_net_dla.py
MaskRCNN Resnet50 FPN 3X configs/DLA_mask_rcnn_R_50_FPN_3x.yaml ./tools/train_net_dla.py

Some helper code and cli commands

Add the below code in demo/demo.py to get confidence along with label names

from detectron2.data import MetadataCatalog
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']

Then run below command for prediction on single image

python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu

TODOs

  • Train MaskRCNN resnet50

Sample results from detectron2


Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

What's New

  • It is powered by the PyTorch deep learning framework.
  • Includes more features such as panoptic segmentation, densepose, Cascade R-CNN, rotated bounding boxes, etc.
  • Can be used as a library to support different projects on top of it. We'll open source more research projects in this way.
  • It trains much faster.

See our blog post to see more demos and learn about detectron2.

Installation

See INSTALL.md.

Quick Start

See GETTING_STARTED.md, or the Colab Notebook.

Learn more at our documentation. And see projects/ for some projects that are built on top of detectron2.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the Detectron2 Model Zoo.

License

Detectron2 is released under the Apache 2.0 license.

Citing Detectron

If you use Detectron2 in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}
Owner
Himanshu
:zap: Machine Learning Engineer
Himanshu
PyTorch Implementation of SSTNs for hyperspectral image classifications from the IEEE T-GRS paper "Spectral-Spatial Transformer Network for Hyperspectral Image Classification: A FAS Framework."

PyTorch Implementation of SSTN for Hyperspectral Image Classification Paper links: SSTN published on IEEE T-GRS. Also, you can directly find the imple

Zilong Zhong 54 Dec 19, 2022
ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021)

ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021) Project Page | Video | Paper | Data We present a novel metho

65 Nov 28, 2022
Serve TensorFlow ML models with TF-Serving and then create a Streamlit UI to use them

TensorFlow Serving + Streamlit! ✨ 🖼️ Serve TensorFlow ML models with TF-Serving and then create a Streamlit UI to use them! This is a pretty simple S

Álvaro Bartolomé 18 Jan 07, 2023
The implementation of the lifelong infinite mixture model

Lifelong infinite mixture model 📋 This is the implementation of the Lifelong infinite mixture model 📋 Accepted by ICCV 2021 Title : Lifelong Infinit

Fei Ye 5 Oct 20, 2022
Syed Waqas Zamir 906 Dec 30, 2022
Dense Gaussian Processes for Few-Shot Segmentation

DGPNet - Dense Gaussian Processes for Few-Shot Segmentation Welcome to the public repository for DGPNet. The paper is available at arxiv: https://arxi

37 Jan 07, 2023
This tool uses Deep Learning to help you draw and write with your hand and webcam.

This tool uses Deep Learning to help you draw and write with your hand and webcam. A Deep Learning model is used to try to predict whether you want to have 'pencil up' or 'pencil down'.

lmagne 169 Dec 10, 2022
An efficient and easy-to-use deep learning model compression framework

TinyNeuralNetwork 简体中文 TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework, which contains features like neura

Alibaba 441 Dec 25, 2022
A Fast Knowledge Distillation Framework for Visual Recognition

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022
CN24 is a complete semantic segmentation framework using fully convolutional networks

Build status: master (production branch): develop (development branch): Welcome to the CN24 GitHub repository! CN24 is a complete semantic segmentatio

Computer Vision Group Jena 123 Jul 14, 2022
Accelerate Neural Net Training by Progressively Freezing Layers

FreezeOut A simple technique to accelerate neural net training by progressively freezing layers. This repository contains code for the extended abstra

Andy Brock 203 Jun 19, 2022
Implementation of CVPR 2020 Dual Super-Resolution Learning for Semantic Segmentation

Dual super-resolution learning for semantic segmentation 2021-01-02 Subpixel Update Happy new year! The 2020-12-29 update of SISR with subpixel conv p

Sam 79 Nov 24, 2022
NHL 94 AI contests

nhl94-ai The end goals of this project is to: Train Models that play NHL 94 Support AI vs AI contests in NHL 94 Provide an improved AI opponent for NH

Mathieu Poliquin 2 Dec 06, 2021
Cycle Consistent Adversarial Domain Adaptation (CyCADA)

Cycle Consistent Adversarial Domain Adaptation (CyCADA) A pytorch implementation of CyCADA. If you use this code in your research please consider citi

Hyunwoo Ko 2 Jan 10, 2022
Official PyTorch implementation of "Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets" (ICLR 2021)

Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets This is the official PyTorch implementation for the paper Rapid Neural A

48 Dec 26, 2022
WPPNets: Unsupervised CNN Training with Wasserstein Patch Priors for Image Superresolution

WPPNets: Unsupervised CNN Training with Wasserstein Patch Priors for Image Superresolution This code belongs to the paper [1] available at https://arx

Fabian Altekrueger 5 Jun 02, 2022
Official repository of Semantic Image Matting

Semantic Image Matting This is the official repository of Semantic Image Matting (CVPR2021). Overview Natural image matting separates the foreground f

192 Dec 29, 2022
Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

SMU A Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE arXiv https://arxiv.org/abs/211

Fuhang 5 Jan 18, 2022
The project page of paper: Architecture disentanglement for deep neural networks [ICCV 2021, oral]

This is the project page for the paper: Architecture Disentanglement for Deep Neural Networks, Jie Hu, Liujuan Cao, Tong Tong, Ye Qixiang, ShengChuan

Jie Hu 15 Aug 30, 2022