2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

Overview

TableMASTER-mmocr

Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Result
  5. License
  6. Acknowledgements

About The Project

This project presents our 2nd place solution for ICDAR 2021 Competition on Scientific Literature Parsing, Task B. We reimplement our solution by MMOCR,which is an open-source toolbox based on PyTorch. You can click here for more details about this competition. Our original implementation is based on FastOCR (one of our internal toolbox similar with MMOCR).

Method Description

In our solution, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Based on MASTER, we propose a novel table structure recognition architrcture, which we call TableMASTER. The difference between MASTER and TableMASTER will be shown below. You can click here for more details about this solution.

MASTER's architecture

Dependency

Getting Started

Prerequisites

  • Competition dataset PubTabNet, click here for downloading.
  • About PubTabNet, check their github and paper.
  • About the metric TEDS, see github

Installation

  1. Install mmdetection. click here for details.

    # We embed mmdetection-2.11.0 source code into this project.
    # You can cd and install it (recommend).
    cd ./mmdetection-2.11.0
    pip install -v -e .
  2. Install mmocr. click here for details.

    # install mmocr
    cd ./MASTER_mmocr
    pip install -v -e .
  3. Install mmcv-full-1.3.4. click here for details.

    pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
    
    # install mmcv-full-1.3.4 with torch version 1.8.0 cuda_version 10.2
    pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html

Usage

Data preprocess

Run data_preprocess.py to get valid train data. Remember to change the 'raw_img_root' and ‘save_root’ property of PubtabnetParser to your path.

python ./table_recognition/data_preprocess.py

It will about 8 hours to finish parsing 500777 train files. After finishing the train set parsing, change the property of 'split' folder in PubtabnetParser to 'val' and get formatted val data.

Directory structure of parsed train data is :

.
├── StructureLabelAddEmptyBbox_train
│   ├── PMC1064074_007_00.txt
│   ├── PMC1064076_003_00.txt
│   ├── PMC1064076_004_00.txt
│   └── ...
├── recognition_train_img
│   ├── 0
│       ├── PMC1064100_007_00_0.png
│       ├── PMC1064100_007_00_10.png
│       ├── ...
│       └── PMC1064100_007_00_108.png
│   ├── 1
│   ├── ...
│   └── 15
├── recognition_train_txt
│   ├── 0.txt
│   ├── 1.txt
│   ├── ...
│   └── 15.txt
├── structure_alphabet.txt
└── textline_recognition_alphabet.txt

Train

  1. Train text line detection model with PSENet.

    sh ./table_recognition/table_text_line_detection_dist_train.sh

    We don't offer PSENet train data here, you can create the text line annotations by open source label software. In our experiment, we only use 2,500 table images to train our model. It gets a perfect text line detection result on validation set.

  2. Train text-line recognition model with MASTER.

    sh ./table_recognition/table_text_line_recognition_dist_train.sh

    We can get about 30,000,000 text line images from 500,777 training images and 550,000 text line images from 9115 validation images. But we only select 20,000 text line images from 550,000 dataset for evaluatiing after each trainig epoch, to pick up the best text line recognition model.

    Note that our MASTER OCR is directly trained on samples mixed with single-line texts and multiple-line texts.

  3. Train table structure recognition model, with TableMASTER.

    sh ./table_recognition/table_recognition_dist_train.sh

Inference

To get final results, firstly, we need to forward the three up-mentioned models, respectively. Secondly, we merge the results by our matching algorithm, to generate the final HTML code.

  1. Models inference. We do this to speed up the inference.
python ./table_recognition/run_table_inference.py

run_table_inference.py wil call table_inference.py and use multiple gpu devices to do model inference. Before running this script, you should change the value of cfg in table_inference.py .

Directory structure of text line detection and text line recognition inference results are:

# If you use 8 gpu devices to inference, you will get 8 detection results pickle files, one end2end_result pickle files and 8 structure recognition results pickle files. 
.
├── end2end_caches
│   ├── end2end_results.pkl
│   ├── detection_results_0.pkl
│   ├── detection_results_1.pkl
│   ├── ...
│   └── detection_results_7.pkl
├── structure_master_caches
│   ├── structure_master_results_0.pkl
│   ├── structure_master_results_1.pkl
│   ├── ...
│   └── structure_master_results_7.pkl
  1. Merge results.
python ./table_recognition/match.py

After matching, congratulations, you will get final result pickle file.

Get TEDS score

  1. Installation.

    pip install -r ./table_recognition/PubTabNet-master/src/requirements.txt
  2. Get gtVal.json.

    python ./table_recognition/get_val_gt.py
  3. Calcutate TEDS score. Before run this script, modify pred file path and gt file path in mmocr_teds_acc_mp.py

    python ./table_recognition/PubTabNet-master/src/mmocr_teds_acc_mp.py

Result

Text line end2end recognition accuracy

Models Accuracy
PSENet + MASTER 0.9885

Structure recognition accuracy

Model architecture Accuracy
TableMASTER_maxlength_500 0.7808
TableMASTER_ConcatLayer_maxlength_500 0.7821
TableMASTER_ConcatLayer_maxlength_600 0.7799

TEDS score

Models TEDS
PSENet + MASTER + TableMASTER_maxlength_500 0.9658
PSENet + MASTER + TableMASTER_ConcatLayer_maxlength_500 0.9669
PSENet + MASTER + ensemble_TableMASTER 0.9676

In this paper, we reported 0.9684 TEDS score in validation set (9115 samples). The gap between 0.9676 and 0.9684 comes from that we ensemble three text line models in the competition, but here, we only use one model. Of course, hyperparameter tuning will also affect TEDS score.

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

@article{ye2021pingan,
  title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML},
  author={Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong},
  journal={arXiv preprint arXiv:2105.01848},
  year={2021}
}
@article{He2021PingAnVCGroupsSF,
  title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex},
  author={Yelin He and Xianbiao Qi and Jiaquan Ye and Peng Gao and Yihao Chen and Bingcong Li and Xin Tang and Rong Xiao},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.01846}
}
@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}
@article{li2018shape,
  title={Shape robust text detection with progressive scale expansion network},
  author={Li, Xiang and Wang, Wenhai and Hou, Wenbo and Liu, Ruo-Ze and Lu, Tong and Yang, Jian},
  journal={arXiv preprint arXiv:1806.02559},
  year={2018}
}

Acknowledgements

Owner
Jianquan Ye
Jianquan Ye
PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Masked Autoencoders: A PyTorch Implementation This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners: @

Meta Research 4.8k Jan 04, 2023
A set of examples around hub for creating and processing datasets

Examples for Hub - Dataset Format for AI A repository showcasing examples of using Hub Uploading Dataset Places365 Colab Tutorials Notebook Link Getti

Activeloop 11 Dec 14, 2022
Predictive Maintenance LSTM

Predictive-Maintenance-LSTM - Predictive maintenance study for Complex case study, we've obtained failure causes by operational error and more deeply by design mistakes.

Amir M. Sadafi 1 Dec 31, 2021
Official implementation of paper Gradient Matching for Domain Generalization

Gradient Matching for Domain Generalisation This is the official PyTorch implementation of Gradient Matching for Domain Generalisation. In our paper,

94 Dec 23, 2022
A Flexible Generative Framework for Graph-based Semi-supervised Learning (NeurIPS 2019)

G3NN This repo provides a pytorch implementation for the 4 instantiations of the flexible generative framework as described in the following paper: A

Jiaqi Ma 14 Oct 11, 2022
Using VapourSynth with super resolution models and speeding them up with TensorRT.

VSGAN-tensorrt-docker Using image super resolution models with vapoursynth and speeding them up with TensorRT. Using NVIDIA/Torch-TensorRT combined wi

111 Jan 05, 2023
A distributed deep learning framework that supports flexible parallelization strategies.

FlexFlow FlexFlow is a deep learning framework that accelerates distributed DNN training by automatically searching for efficient parallelization stra

528 Dec 25, 2022
Official codebase used to develop Vision Transformer, MLP-Mixer, LiT and more.

Big Vision This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and

Google Research 701 Jan 03, 2023
This repository contains implementations and illustrative code to accompany DeepMind publications

DeepMind Research This repository contains implementations and illustrative code to accompany DeepMind publications. Along with publishing papers to a

DeepMind 11.3k Dec 31, 2022
Source Code of NeurIPS21 paper: Recognizing Vector Graphics without Rasterization

YOLaT-VectorGraphicsRecognition This repository is the official PyTorch implementation of our NeurIPS-2021 paper: Recognizing Vector Graphics without

Microsoft 49 Dec 20, 2022
This is the dataset for testing the robustness of various VO/VIO methods

KAIST VIO dataset This is the dataset for testing the robustness of various VO/VIO methods You can download the whole dataset on KAIST VIO dataset Ind

1 Sep 01, 2022
A particular navigation route using satellite feed and can help in toll operations & traffic managemen

How about adding some info that can quanitfy the stress on a particular navigation route using satellite feed and can help in toll operations & traffic management The current analysis is on the satel

Ashish Pandey 1 Feb 14, 2022
Microscopy Image Cytometry Toolkit

Cytokit Cytokit is a collection of tools for quantifying and analyzing properties of individual cells in large fluorescent microscopy datasets with a

Hammer Lab 106 Jan 06, 2023
The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

Ce Zheng 363 Dec 28, 2022
Tensorflow 2 Object Detection API kurulumu, GPU desteği, custom model hazırlama

Tensorflow 2 Object Detection API Bu tutorial, TensorFlow 2.x'in kararlı sürümü olan TensorFlow 2.3'ye yöneliktir. Bu, görüntülerde / videoda nesne a

46 Nov 20, 2022
Python scripts form performing stereo depth estimation using the CoEx model in ONNX.

ONNX-CoEx-Stereo-Depth-estimation Python scripts form performing stereo depth estimation using the CoEx model in ONNX. Stereo depth estimation on the

Ibai Gorordo 8 Dec 29, 2022
A Tensorflow implementation of BicycleGAN.

BicycleGAN implementation in Tensorflow As part of the implementation series of Joseph Lim's group at USC, our motivation is to accelerate (or sometim

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 97 Dec 02, 2022
The MLOps platform for innovators 🚀

​ DS2.ai is an integrated AI operation solution that supports all stages from custom AI development to deployment. It is an AI-specialized platform service that collects data, builds a training datas

9 Jan 03, 2023
Pytorch implementation of Learning Rate Dropout.

Learning-Rate-Dropout Pytorch implementation of Learning Rate Dropout. Paper Link: https://arxiv.org/pdf/1912.00144.pdf Train ResNet-34 for Cifar10: r

42 Nov 25, 2022
Reinforcement Learning for finance

Reinforcement Learning for Finance We apply reinforcement learning for stock trading. Fetch Data Example import utils # fetch symbols from yahoo fina

Tomoaki Fujii 159 Jan 03, 2023