1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

Overview

About The Project

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection toolbox based on PyTorch. You can click here for more details about this competition.

Method Description

We built our approach on FCOS, A simple and strong anchor-free object detector, with ResNeSt as our backbone, to detect embedded and isolated formulas. We employed ATSS as our sampling strategy instead of random sampling to eliminate the effects of sample imbalance. Moreover, we observed and revealed the influence of different FPN levels on the detection result. Generalized Focal Loss is adopted to our loss. Finally, with a series of useful tricks and model ensembles, our method was ranked 1st in the MFD task.

Random Sampling(left) ATSS(right) Random Sampling(left) ATSS(right)

Getting Start

Prerequisites

  • Linux or macOS (Windows is in experimental support)
  • Python 3.6+
  • PyTorch 1.3+
  • CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
  • GCC 5+
  • MMCV

This project is based on MMDetection-v2.7.0, mmcv-full>=1.1.5, <1.3 is needed. Note: You need to run pip uninstall mmcv first if you have mmcv installed. If mmcv and mmcv-full are both installed, there will be ModuleNotFoundError.

Installation

  1. Install PyTorch and torchvision following the official instructions , e.g.,

    pip install pytorch torchvision -c pytorch

    Note: Make sure that your compilation CUDA version and runtime CUDA version match. You can check the supported CUDA version for precompiled packages on the PyTorch website.

    E.g.1 If you have CUDA 10.1 installed under /usr/local/cuda and would like to install PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1.

    pip install pytorch cudatoolkit=10.1 torchvision -c pytorch

    E.g. 2 If you have CUDA 9.2 installed under /usr/local/cuda and would like to install PyTorch 1.3.1., you need to install the prebuilt PyTorch with CUDA 9.2.

    pip install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch

    If you build PyTorch from source instead of installing the prebuilt pacakge, you can use more CUDA versions such as 9.0.

  2. Install mmcv-full, we recommend you to install the pre-build package as below.

    pip install mmcv-full==latest+torch1.6.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html

    See here for different versions of MMCV compatible to different PyTorch and CUDA versions. Optionally you can choose to compile mmcv from source by the following command

    git clone https://github.com/open-mmlab/mmcv.git
    cd mmcv
    MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full will be installed after this step
    cd ..

    Or directly run

    pip install mmcv-full
  3. Install build requirements and then compile MMDetection.

    pip install -r requirements.txt
    pip install tensorboard
    pip install ensemble-boxes
    pip install -v -e .  # or "python setup.py develop"

Usage

Data Preparation

Firstly, Firstly, you need to put the image files and the GT files into two separate folders as below.

Tr01
├── gt
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
├── img
    ├── 0001125-page02.jpg
    ├── 0001125-page05.jpg
    ├── ...
    └── 0304067-page08.jpg

Secondly, run data_preprocess.py to get coco format label. Remember to change 'img_path', 'txt_path', 'dst_path' and 'train_path' to your own path.

python ./tools/data_preprocess.py

The new structure of data folder will become,

Tr01
├── gt
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
│
├── gt_icdar
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
│   
├── img
│   ├── 0001125-page02.jpg
│   ├── 0001125-page05.jpg
│   ├── ...
│   └── 0304067-page08.jpg
│
└── train_coco.json

Finally, change 'data_root' in ./configs/base/datasets/formula_detection.py to your path.

Train

  1. train with single gpu on ResNeSt50

    python tools/train.py configs/gfl/gfl_s50_fpn_2x_coco.py --gpus 1 --work-dir ${Your Dir}
  2. train with 8 gpus on ResNeSt101

    ./tools/dist_train.sh configs/gfl/gfl_s101_fpn_2x_coco.py 8 --work-dir ${Your Dir}

Inference

Run tools/test_formula.py

python tools/test_formula.py configs/gfl/gfl_s101_fpn_2x_coco.py ${checkpoint path} 

It will generate a 'result' file at the same level with work-dir in default. You can specify the output path of the result file in line 231.

Model Ensemble

Specify the paths of the results in tools/model_fusion_test.py, and run

python tools/model_fusion_test.py

Evaluation

evaluate.py is the officially provided evaluation tool. Run

python evaluate.py ${GT_DIR} ${CSV_Pred_File}

Note: GT_DIR is the path of the original data folder which contains both the image and the GT files. CSV_Pred_File is the path of the final prediction csv file.

Result

Train on Tr00, Tr01, Va00 and Va01, and test on Ts01. Some results are as follows, F1-score

Method embedded isolated total
ResNeSt50-DCN 95.67 97.67 96.03
ResNeSt101-DCN 96.11 97.75 96.41

Our final result, that was ranked 1st place in the competition, was obtained by fusing two Resnest101+GFL models trained with two different random seeds and all labeled data. The final ranking can be seen in our technical report.

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

@article{zhong20211st,
  title={1st Place Solution for ICDAR 2021 Competition on Mathematical Formula Detection},
  author={Zhong, Yuxiang and Qi, Xianbiao and Li, Shanjun and Gu, Dengyi and Chen, Yihao and Ning, Peiyang and Xiao, Rong},
  journal={arXiv preprint arXiv:2107.05534},
  year={2021}
}
@article{GFLli2020generalized,
  title={Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection},
  author={Li, Xiang and Wang, Wenhai and Wu, Lijun and Chen, Shuo and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian},
  journal={arXiv preprint arXiv:2006.04388},
  year={2020}
}
@inproceedings{ATSSzhang2020bridging,
  title={Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection},
  author={Zhang, Shifeng and Chi, Cheng and Yao, Yongqiang and Lei, Zhen and Li, Stan Z},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9759--9768},
  year={2020}
}
@inproceedings{FCOStian2019fcos,
  title={Fcos: Fully convolutional one-stage object detection},
  author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={9627--9636},
  year={2019}
}
@article{solovyev2019weighted,
  title={Weighted boxes fusion: ensembling boxes for object detection models},
  author={Solovyev, Roman and Wang, Weimin and Gabruseva, Tatiana},
  journal={arXiv preprint arXiv:1910.13302},
  year={2019}
}
@article{ResNestzhang2020resnest,
  title={Resnest: Split-attention networks},
  author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Lin, Haibin and Zhang, Zhi and Sun, Yue and He, Tong and Mueller, Jonas and Manmatha, R and others},
  journal={arXiv preprint arXiv:2004.08955},
  year={2020}
}
@article{MMDetectionchen2019mmdetection,
  title={MMDetection: Open mmlab detection toolbox and benchmark},
  author={Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and Liu, Ziwei and Xu, Jiarui and others},
  journal={arXiv preprint arXiv:1906.07155},
  year={2019}
}

Acknowledgements

Owner
yuxzho
yuxzho
This is a Keras implementation of a CNN for estimating age, gender and mask from a camera.

face-detector-age-gender This is a Keras implementation of a CNN for estimating age, gender and mask from a camera. Before run face detector app, expr

Devdreamsolution 2 Dec 04, 2021
A PyTorch implementation of Radio Transformer Networks from the paper "An Introduction to Deep Learning for the Physical Layer".

An Introduction to Deep Learning for the Physical Layer An usable PyTorch implementation of the noisy autoencoder infrastructure in the paper "An Intr

Gram.AI 120 Nov 21, 2022
Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

UPDeT Official Implementation of UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers (ICLR 2021 spotlight) The

hhhusiyi 96 Dec 22, 2022
Aerial Imagery dataset for fire detection: classification and segmentation (Unmanned Aerial Vehicle (UAV))

Aerial Imagery dataset for fire detection: classification and segmentation using Unmanned Aerial Vehicle (UAV) Title FLAME (Fire Luminosity Airborne-b

79 Jan 06, 2023
Efficient face emotion recognition in photos and videos

This repository contains code of face emotion recognition that was developed in the RSF (Russian Science Foundation) project no. 20-71-10010 (Efficien

Andrey Savchenko 239 Jan 04, 2023
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 23.6k Dec 31, 2022
Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision

Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision Project | PDF | Poster Fangyu Li, N. Dinesh Reddy, X

25 Dec 21, 2022
Matthew Colbrook 1 Apr 08, 2022
PiRapGenerator - Make anyone rap the digits of pi

PiRapGenerator Make anyone rap the digits of pi (sample files are of Ted Nivison

7 Oct 02, 2022
3D ResNet Video Classification accelerated by TensorRT

Activity Recognition TensorRT Perform video classification using 3D ResNets trained on Kinetics-400 dataset and accelerated with TensorRT P.S Click on

Akash James 39 Nov 21, 2022
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking We revisit and address issues with Oxford 5k and Paris 6k image retrieval benchm

Filip Radenovic 188 Dec 17, 2022
Python implementation of O-OFDMNet, a deep learning-based optical OFDM system,

O-OFDMNet This includes Python implementation of O-OFDMNet, a deep learning-based optical OFDM system, which uses neural networks for signal processin

Thien Luong 4 Sep 09, 2022
Object DGCNN and DETR3D, Our implementations are built on top of MMdetection3D.

Object DGCNN & DETR3D This repo contains the implementations of Object DGCNN (https://arxiv.org/abs/2110.06923) and DETR3D (https://arxiv.org/abs/2110

Wang, Yue 539 Jan 07, 2023
This project implements "virtual speed" from heart rate monito

ANT+ Virtual Stride Based Speed and Distance Monitor Overview This project imple

2 May 20, 2022
AVD Quickstart Containerlab

AVD Quickstart Containerlab WARNING This repository is still under construction. It's fully functional, but has number of limitations. For example: RE

Carl Buchmann 3 Apr 10, 2022
Deep generative modeling for time-stamped heterogeneous data, enabling high-fidelity models for a large variety of spatio-temporal domains.

Neural Spatio-Temporal Point Processes [arxiv] Ricky T. Q. Chen, Brandon Amos, Maximilian Nickel Abstract. We propose a new class of parameterizations

Facebook Research 75 Dec 19, 2022
Tacotron 2 - PyTorch implementation with faster-than-realtime inference

Tacotron 2 (without wavenet) PyTorch implementation of Natural TTS Synthesis By Conditioning Wavenet On Mel Spectrogram Predictions. This implementati

NVIDIA Corporation 4.1k Jan 03, 2023
SpineAI Bilsky Grading With Python

SpineAI-Bilsky-Grading SpineAI Paper with Code 📫 Contact Address correspondence to J.T.P.D.H. (e-mail: james_hallinan AT nuhs.edu.sg) Disclaimer This

<a href=[email protected]"> 2 Dec 16, 2021
From Perceptron model to Deep Neural Network from scratch in Python.

Neural-Network-Basics Aim of this Repository: From Perceptron model to Deep Neural Network (from scratch) in Python. ** Currently working on a basic N

Aditya Kahol 1 Jan 14, 2022
Simulating an AI playing 2048 using the Expectimax algorithm

2048-expectimax Simulating an AI playing 2048 using the Expectimax algorithm The base game engine uses code from here. The AI player is modeled as a m

Subha Ramesh 2 Jan 31, 2022