1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

Overview

About The Project

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection toolbox based on PyTorch. You can click here for more details about this competition.

Method Description

We built our approach on FCOS, A simple and strong anchor-free object detector, with ResNeSt as our backbone, to detect embedded and isolated formulas. We employed ATSS as our sampling strategy instead of random sampling to eliminate the effects of sample imbalance. Moreover, we observed and revealed the influence of different FPN levels on the detection result. Generalized Focal Loss is adopted to our loss. Finally, with a series of useful tricks and model ensembles, our method was ranked 1st in the MFD task.

Random Sampling(left) ATSS(right) Random Sampling(left) ATSS(right)

Getting Start

Prerequisites

  • Linux or macOS (Windows is in experimental support)
  • Python 3.6+
  • PyTorch 1.3+
  • CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
  • GCC 5+
  • MMCV

This project is based on MMDetection-v2.7.0, mmcv-full>=1.1.5, <1.3 is needed. Note: You need to run pip uninstall mmcv first if you have mmcv installed. If mmcv and mmcv-full are both installed, there will be ModuleNotFoundError.

Installation

  1. Install PyTorch and torchvision following the official instructions , e.g.,

    pip install pytorch torchvision -c pytorch

    Note: Make sure that your compilation CUDA version and runtime CUDA version match. You can check the supported CUDA version for precompiled packages on the PyTorch website.

    E.g.1 If you have CUDA 10.1 installed under /usr/local/cuda and would like to install PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1.

    pip install pytorch cudatoolkit=10.1 torchvision -c pytorch

    E.g. 2 If you have CUDA 9.2 installed under /usr/local/cuda and would like to install PyTorch 1.3.1., you need to install the prebuilt PyTorch with CUDA 9.2.

    pip install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch

    If you build PyTorch from source instead of installing the prebuilt pacakge, you can use more CUDA versions such as 9.0.

  2. Install mmcv-full, we recommend you to install the pre-build package as below.

    pip install mmcv-full==latest+torch1.6.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html

    See here for different versions of MMCV compatible to different PyTorch and CUDA versions. Optionally you can choose to compile mmcv from source by the following command

    git clone https://github.com/open-mmlab/mmcv.git
    cd mmcv
    MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full will be installed after this step
    cd ..

    Or directly run

    pip install mmcv-full
  3. Install build requirements and then compile MMDetection.

    pip install -r requirements.txt
    pip install tensorboard
    pip install ensemble-boxes
    pip install -v -e .  # or "python setup.py develop"

Usage

Data Preparation

Firstly, Firstly, you need to put the image files and the GT files into two separate folders as below.

Tr01
├── gt
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
├── img
    ├── 0001125-page02.jpg
    ├── 0001125-page05.jpg
    ├── ...
    └── 0304067-page08.jpg

Secondly, run data_preprocess.py to get coco format label. Remember to change 'img_path', 'txt_path', 'dst_path' and 'train_path' to your own path.

python ./tools/data_preprocess.py

The new structure of data folder will become,

Tr01
├── gt
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
│
├── gt_icdar
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
│   
├── img
│   ├── 0001125-page02.jpg
│   ├── 0001125-page05.jpg
│   ├── ...
│   └── 0304067-page08.jpg
│
└── train_coco.json

Finally, change 'data_root' in ./configs/base/datasets/formula_detection.py to your path.

Train

  1. train with single gpu on ResNeSt50

    python tools/train.py configs/gfl/gfl_s50_fpn_2x_coco.py --gpus 1 --work-dir ${Your Dir}
  2. train with 8 gpus on ResNeSt101

    ./tools/dist_train.sh configs/gfl/gfl_s101_fpn_2x_coco.py 8 --work-dir ${Your Dir}

Inference

Run tools/test_formula.py

python tools/test_formula.py configs/gfl/gfl_s101_fpn_2x_coco.py ${checkpoint path} 

It will generate a 'result' file at the same level with work-dir in default. You can specify the output path of the result file in line 231.

Model Ensemble

Specify the paths of the results in tools/model_fusion_test.py, and run

python tools/model_fusion_test.py

Evaluation

evaluate.py is the officially provided evaluation tool. Run

python evaluate.py ${GT_DIR} ${CSV_Pred_File}

Note: GT_DIR is the path of the original data folder which contains both the image and the GT files. CSV_Pred_File is the path of the final prediction csv file.

Result

Train on Tr00, Tr01, Va00 and Va01, and test on Ts01. Some results are as follows, F1-score

Method embedded isolated total
ResNeSt50-DCN 95.67 97.67 96.03
ResNeSt101-DCN 96.11 97.75 96.41

Our final result, that was ranked 1st place in the competition, was obtained by fusing two Resnest101+GFL models trained with two different random seeds and all labeled data. The final ranking can be seen in our technical report.

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

@article{zhong20211st,
  title={1st Place Solution for ICDAR 2021 Competition on Mathematical Formula Detection},
  author={Zhong, Yuxiang and Qi, Xianbiao and Li, Shanjun and Gu, Dengyi and Chen, Yihao and Ning, Peiyang and Xiao, Rong},
  journal={arXiv preprint arXiv:2107.05534},
  year={2021}
}
@article{GFLli2020generalized,
  title={Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection},
  author={Li, Xiang and Wang, Wenhai and Wu, Lijun and Chen, Shuo and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian},
  journal={arXiv preprint arXiv:2006.04388},
  year={2020}
}
@inproceedings{ATSSzhang2020bridging,
  title={Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection},
  author={Zhang, Shifeng and Chi, Cheng and Yao, Yongqiang and Lei, Zhen and Li, Stan Z},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9759--9768},
  year={2020}
}
@inproceedings{FCOStian2019fcos,
  title={Fcos: Fully convolutional one-stage object detection},
  author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={9627--9636},
  year={2019}
}
@article{solovyev2019weighted,
  title={Weighted boxes fusion: ensembling boxes for object detection models},
  author={Solovyev, Roman and Wang, Weimin and Gabruseva, Tatiana},
  journal={arXiv preprint arXiv:1910.13302},
  year={2019}
}
@article{ResNestzhang2020resnest,
  title={Resnest: Split-attention networks},
  author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Lin, Haibin and Zhang, Zhi and Sun, Yue and He, Tong and Mueller, Jonas and Manmatha, R and others},
  journal={arXiv preprint arXiv:2004.08955},
  year={2020}
}
@article{MMDetectionchen2019mmdetection,
  title={MMDetection: Open mmlab detection toolbox and benchmark},
  author={Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and Liu, Ziwei and Xu, Jiarui and others},
  journal={arXiv preprint arXiv:1906.07155},
  year={2019}
}

Acknowledgements

Owner
yuxzho
yuxzho
Code release for paper: The Boombox: Visual Reconstruction from Acoustic Vibrations

The Boombox: Visual Reconstruction from Acoustic Vibrations Boyuan Chen, Mia Chiquier, Hod Lipson, Carl Vondrick Columbia University Project Website |

Boyuan Chen 12 Nov 30, 2022
Trash Sorter Extraordinaire is a software which efficiently detects the different types of waste in a pile of random trash through feeding it pictures or videos.

Trash-Sorter-Extraordinaire Trash Sorter Extraordinaire is a software which efficiently detects the different types of waste in a pile of random trash

Rameen Mahmood 1 Nov 07, 2021
Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetu

3 Dec 05, 2022
BabelCalib: A Universal Approach to Calibrating Central Cameras. In ICCV (2021)

BabelCalib: A Universal Approach to Calibrating Central Cameras This repository contains the MATLAB implementation of the BabelCalib calibration frame

Yaroslava Lochman 55 Dec 30, 2022
Joint learning of images and text via maximization of mutual information

mutual_info_img_txt Joint learning of images and text via maximization of mutual information. This repository incorporates the algorithms presented in

Ruizhi Liao 10 Dec 22, 2022
[CVPR 2021] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition (CVPR 2021) arXiv Prerequisite PyTorch = 1.2.0 Python3 torchvision PIL argpar

51 Nov 11, 2022
Torch code for our CVPR 2018 paper "Residual Dense Network for Image Super-Resolution" (Spotlight)

Residual Dense Network for Image Super-Resolution This repository is for RDN introduced in the following paper Yulun Zhang, Yapeng Tian, Yu Kong, Bine

Yulun Zhang 494 Dec 30, 2022
基于PaddleClas实现垃圾分类,并转换为inference格式用PaddleHub服务端部署

百度网盘链接及提取码: 链接:https://pan.baidu.com/s/1HKpgakNx1hNlOuZJuW6T1w 提取码:wylx 一个垃圾分类项目带你玩转飞桨多个产品(1) 基于PaddleClas实现垃圾分类,导出inference模型并利用PaddleHub Serving进行服务

thomas-yanxin 22 Jul 12, 2022
[BMVC 2021] Official PyTorch Implementation of Self-supervised learning of Image Scale and Orientation Estimation

Self-Supervised Learning of Image Scale and Orientation Estimation (BMVC 2021) This is the official implementation of the paper "Self-Supervised Learn

Jongmin Lee 17 Nov 10, 2022
Apache Spark - A unified analytics engine for large-scale data processing

Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op

The Apache Software Foundation 34.7k Jan 04, 2023
Implementation of the CVPR 2021 paper "Online Multiple Object Tracking with Cross-Task Synergy"

Online Multiple Object Tracking with Cross-Task Synergy This repository is the implementation of the CVPR 2021 paper "Online Multiple Object Tracking

54 Oct 15, 2022
MediaPipeで姿勢推定を行い、Tokyo2020オリンピック風のピクトグラムを表示するデモ

Tokyo2020-Pictogram-using-MediaPipe MediaPipeで姿勢推定を行い、Tokyo2020オリンピック風のピクトグラムを表示するデモです。 Tokyo2020Pictgram02.mp4 Requirement mediapipe 0.8.6 or later O

KazuhitoTakahashi 295 Dec 26, 2022
Rank 3 : Source code for OPPO 6G Data Generation Challenge

OPPO 6G Data Generation with an E2E Framework Homepage of OPPO 6G Data Generation Challenge Datasets H1_32T4R.mat H2_32T4R.mat Please put the original

Sen Pei 97 Jan 07, 2023
PyBrain - Another Python Machine Learning Library.

PyBrain -- the Python Machine Learning Library =============================================== INSTALLATION ------------ Quick answer: make sure you

2.8k Dec 31, 2022
Pytorch implementation of the paper "Optimization as a Model for Few-Shot Learning"

Optimization as a Model for Few-Shot Learning This repo provides a Pytorch implementation for the Optimization as a Model for Few-Shot Learning paper.

Albert Berenguel Centeno 238 Jan 04, 2023
Tensorflow python implementation of "Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos"

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos This repository is the official tensorflow python implementation

Yasamin Jafarian 287 Jan 06, 2023
This repository contains the code and models necessary to replicate the results of paper: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

Black-Box-Defense This repository contains the code and models necessary to replicate the results of our recent paper: How to Robustify Black-Box ML M

OPTML Group 2 Oct 05, 2022
The implementation of 'Image synthesis via semantic composition'.

Image synthesis via semantic synthesis [Project Page] by Yi Wang, Lu Qi, Ying-Cong Chen, Xiangyu Zhang, Jiaya Jia. Introduction This repository gives

DV Lab 71 Jan 06, 2023
The official implementation code of "PlantStereo: A Stereo Matching Benchmark for Plant Surface Dense Reconstruction."

PlantStereo This is the official implementation code for the paper "PlantStereo: A Stereo Matching Benchmark for Plant Surface Dense Reconstruction".

Wang Qingyu 14 Nov 28, 2022
tmm_fast is a lightweight package to speed up optical planar multilayer thin-film device computation.

tmm_fast tmm_fast or transfer-matrix-method_fast is a lightweight package to speed up optical planar multilayer thin-film device computation. It is es

26 Dec 11, 2022