2nd place solution for the ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

Overview

TableMASTER-mmocr

Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Result
  5. License
  6. Acknowledgements

About The Project

This project presents our 2nd place solution for the ICDAR 2021 Competition on Scientific Literature Parsing, Task B. We reimplemented our solution with MMOCR, an open-source toolbox based on PyTorch. You can click here for more details about this competition. Our original implementation is based on FastOCR (one of our internal toolboxes, similar to MMOCR).

Method Description

In our solution, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Based on MASTER, we propose a novel table structure recognition architecture, which we call TableMASTER. The difference between MASTER and TableMASTER is shown below. You can click here for more details about this solution.

MASTER's architecture

Dependency

Getting Started

Prerequisites

  • Competition dataset PubTabNet: click here to download.
  • For more about PubTabNet, see its GitHub repository and paper.
  • For the TEDS metric, see its GitHub repository.

Installation

  1. Install mmdetection. Click here for details.

    # We embed the mmdetection-2.11.0 source code in this project.
    # You can cd into it and install it (recommended).
    cd ./mmdetection-2.11.0
    pip install -v -e .
  2. Install mmocr. Click here for details.

    # install mmocr
    cd ./MASTER_mmocr
    pip install -v -e .
  3. Install mmcv-full-1.3.4. Click here for details.

    pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
    
    # install mmcv-full-1.3.4 with torch version 1.8.0 cuda_version 10.2
    pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
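The pip command in step 3 is a template with three placeholders. As a minimal sketch, the helper below fills them in; the function name `mmcv_install_command` is hypothetical and not part of this repository, and the only values known to match this project are the ones shown above (mmcv-full 1.3.4, torch 1.8.0, CUDA 10.2).

```python
def mmcv_install_command(mmcv_version: str, cu_version: str, torch_version: str) -> str:
    """Fill in the pip command template for a prebuilt mmcv-full wheel.

    Hypothetical helper for illustration only. The argument formats follow
    the example above, e.g. "1.3.4", "cu102", "torch1.8.0".
    """
    index_url = (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cu_version}/{torch_version}/index.html"
    )
    return f"pip install mmcv-full=={mmcv_version} -f {index_url}"


if __name__ == "__main__":
    # Reproduces the concrete command given in step 3.
    print(mmcv_install_command("1.3.4", "cu102", "torch1.8.0"))
```

Whether a wheel exists for another CUDA and torch combination has to be checked against the download index itself; the helper only builds the string.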

Usage

Data preprocessing

Run data_preprocess.py to get valid training data. Remember to change the 'raw_img_root' and 'save_root' properties of PubtabnetParser to your own paths.

python ./table_recognition/data_preprocess.py

It takes about 8 hours to finish parsing the 500,777 training files. After the train set is parsed, change the 'split' property of PubtabnetParser to 'val' to get the formatted validation data.

The directory structure of the parsed training data is:

.
├── StructureLabelAddEmptyBbox_train
│   ├── PMC1064074_007_00.txt
│   ├── PMC1064076_003_00.txt
│   ├── PMC1064076_004_00.txt
│   └── ...
├── recognition_train_img
│   ├── 0
│   │   ├── PMC1064100_007_00_0.png
│   │   ├── PMC1064100_007_00_10.png
│   │   ├── ...
│   │   └── PMC1064100_007_00_108.png
│   ├── 1
│   ├── ...
│   └── 15
├── recognition_train_txt
│   ├── 0.txt
│   ├── 1.txt
│   ├── ...
│   └── 15.txt
├── structure_alphabet.txt
└── textline_recognition_alphabet.txt
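Because parsing takes hours, it can be worth checking that the output folder has the expected top-level layout before starting training. The sketch below only tests for the directory and file names shown in the tree above; the function name `missing_entries` is hypothetical, and the path you pass is assumed to be your own 'save_root'.

```python
from pathlib import Path

# Top-level names taken from the directory tree above (train split).
EXPECTED_DIRS = [
    "StructureLabelAddEmptyBbox_train",
    "recognition_train_img",
    "recognition_train_txt",
]
EXPECTED_FILES = [
    "structure_alphabet.txt",
    "textline_recognition_alphabet.txt",
]


def missing_entries(save_root):
    """Return the expected top-level entries that are absent under save_root.

    Hypothetical helper: it checks names only, not the contents of the files.
    """
    root = Path(save_root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty list from `missing_entries("/path/to/save_root")` means all five entries are present. It says nothing about whether every one of the training files was parsed.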

Train

  1. Train text line detection model with PSENet.

    sh ./table_recognition/table_text_line_detection_dist_train.sh

    We don't provide PSENet training data here; you can create the text line annotations with open-source labeling software. In our experiment, we used only 2,500 table images to train the model, which achieved a perfect text line detection result on the validation set.

  2. Train text-line recognition model with MASTER.

    sh ./table_recognition/table_text_line_recognition_dist_train.sh

    We can get about 30,000,000 text line images from the 500,777 training images and 550,000 text line images from the 9,115 validation images. However, we select only 20,000 of the 550,000 validation text line images for evaluation after each training epoch, in order to pick the best text line recognition model.

    Note that our MASTER OCR is directly trained on samples mixed with single-line texts and multiple-line texts.

  3. Train table structure recognition model, with TableMASTER.

    sh ./table_recognition/table_recognition_dist_train.sh
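Step 2 above evaluates on a 20,000-image subset of the validation text lines. The text does not say how that subset was chosen, so the sketch below assumes plain uniform random sampling with a fixed seed, which keeps the subset identical across epochs and runs. The function name `sample_eval_subset` and its parameters are hypothetical.

```python
import random


def sample_eval_subset(items, k=20000, seed=0):
    """Pick a reproducible random subset of k items for per-epoch evaluation.

    Assumption: uniform sampling without replacement. The selection rule
    actually used for the competition is not documented here.
    """
    items = list(items)
    if k >= len(items):
        # Nothing to subsample; return everything.
        return items
    rng = random.Random(seed)  # local generator, does not touch global state
    return rng.sample(items, k)
```

For example, `sample_eval_subset(all_val_text_line_paths)` would return 20,000 paths, and calling it again with the same seed returns the same ones, so accuracy numbers stay comparable between epochs.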

Inference

To get the final results, we first run inference with the three models above, one by one. We then merge their outputs with our matching algorithm to generate the final HTML code.

  1. Model inference.

    python ./table_recognition/run_table_inference.py

    run_table_inference.py calls table_inference.py and uses multiple GPU devices for model inference, which speeds it up. Before running this script, change the value of cfg in table_inference.py.

    The directory structure of the text line detection, text line recognition, and table structure recognition inference results is:

# If you use 8 GPU devices for inference, you will get 8 detection result pickle files, one end2end_results pickle file, and 8 structure recognition result pickle files.
.
├── end2end_caches
│   ├── end2end_results.pkl
│   ├── detection_results_0.pkl
│   ├── detection_results_1.pkl
│   ├── ...
│   └── detection_results_7.pkl
├── structure_master_caches
│   ├── structure_master_results_0.pkl
│   ├── structure_master_results_1.pkl
│   ├── ...
│   └── structure_master_results_7.pkl
  2. Merge results.

    python ./table_recognition/match.py

After matching, you will get the final result pickle file.
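To give a feel for what the merge step has to do, here is a simplified illustration of box assignment: each recognized text line is attached to a table cell. This is not the code in match.py, whose actual rules may differ. It assumes boxes are (x1, y1, x2, y2) tuples, assigns a text line to the first cell containing its center point, and falls back to the cell with the highest IoU. All function names are hypothetical.

```python
def box_center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def contains_point(box, point):
    """True if the point lies inside the box (borders included)."""
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_text_lines(cell_boxes, text_boxes):
    """Map each cell index to the indices of the text lines assigned to it.

    Illustrative heuristic only: center-point containment first, then best
    IoU. Text lines that overlap no cell at all are left unassigned.
    """
    assignment = {i: [] for i in range(len(cell_boxes))}
    for t_idx, t_box in enumerate(text_boxes):
        center = box_center(t_box)
        best = next(
            (i for i, c_box in enumerate(cell_boxes) if contains_point(c_box, center)),
            None,
        )
        if best is None and cell_boxes:
            scores = [iou(c_box, t_box) for c_box in cell_boxes]
            top = max(range(len(scores)), key=scores.__getitem__)
            best = top if scores[top] > 0 else None
        if best is not None:
            assignment[best].append(t_idx)
    return assignment
```

Once every cell has its text lines, the recognized strings can be placed into the matching cells of the predicted structure sequence to form the HTML output.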

Get TEDS score

  1. Installation.

    pip install -r ./table_recognition/PubTabNet-master/src/requirements.txt
  2. Get gtVal.json.

    python ./table_recognition/get_val_gt.py
  3. Calculate the TEDS score. Before running this script, modify the prediction file path and ground-truth file path in mmocr_teds_acc_mp.py.

    python ./table_recognition/PubTabNet-master/src/mmocr_teds_acc_mp.py
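For orientation, TEDS (Tree-Edit-Distance-based Similarity) compares the predicted and ground-truth HTML as trees and is defined as 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|), where |T| is the number of nodes in a tree. The sketch below shows only this final normalization step; it assumes the tree edit distance has already been computed (the script above does that work), and the function name and the empty-tree convention are assumptions made for illustration.

```python
def teds_from_edit_distance(edit_distance, n_nodes_pred, n_nodes_gt):
    """Normalize a tree edit distance into a TEDS score in [0, 1].

    TEDS = 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|).
    Assumption: two empty trees are treated as identical (score 1.0).
    """
    denominator = max(n_nodes_pred, n_nodes_gt)
    if denominator == 0:
        return 1.0
    return 1.0 - edit_distance / denominator


def mean_teds(scores):
    """Average per-table TEDS scores into one dataset-level number."""
    scores = list(scores)
    return sum(scores) / len(scores) if scores else 0.0
```

A prediction identical to the ground truth has edit distance 0 and therefore a score of 1.0. The per-table scores are then averaged over the validation set.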

Result

Text line end2end recognition accuracy

Models             Accuracy
PSENet + MASTER    0.9885

Structure recognition accuracy

Model architecture                       Accuracy
TableMASTER_maxlength_500                0.7808
TableMASTER_ConcatLayer_maxlength_500    0.7821
TableMASTER_ConcatLayer_maxlength_600    0.7799

TEDS score

Models                                                     TEDS
PSENet + MASTER + TableMASTER_maxlength_500                0.9658
PSENet + MASTER + TableMASTER_ConcatLayer_maxlength_500    0.9669
PSENet + MASTER + ensemble_TableMASTER                     0.9676

In our paper, we reported a TEDS score of 0.9684 on the validation set (9,115 samples). The gap between 0.9676 and 0.9684 arises because we ensembled three text line models in the competition, whereas here we use only one. Hyperparameter tuning also affects the TEDS score.

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

@article{ye2021pingan,
  title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML},
  author={Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong},
  journal={arXiv preprint arXiv:2105.01848},
  year={2021}
}
@article{He2021PingAnVCGroupsSF,
  title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex},
  author={Yelin He and Xianbiao Qi and Jiaquan Ye and Peng Gao and Yihao Chen and Bingcong Li and Xin Tang and Rong Xiao},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.01846}
}
@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}
@article{li2018shape,
  title={Shape robust text detection with progressive scale expansion network},
  author={Li, Xiang and Wang, Wenhai and Hou, Wenbo and Liu, Ruo-Ze and Lu, Tong and Yang, Jian},
  journal={arXiv preprint arXiv:1806.02559},
  year={2018}
}

Acknowledgements

Owner
Jianquan Ye