The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

Overview

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection

Updates | Introduction | Results | Usage | Citation | Acknowledgment

This is the repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection". I3CL with ViTAEv2, ResNet50 and ResNet50 w/ RegionCL backbone are included.


Updates

[2022/04/13] Publish links of training datasets.

[2022/04/11] Add SSL training code for this implementation.

[2022/04/09] The training code for ICDAR2019 ArT dataset is uploaded. Private github repo temporarily.

Other applications of ViTAE Transformer: Image Classification | Object Detection | Sementic Segmentation | Animal Pose Estimation | Matting | Remote Sensing

Introduction

Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues, i.e., 1) fracture detections at the gaps in a text instance; and 2) inaccurate detections of arbitrary-shaped text instances with diverse background context. To address these issues, we propose a novel method named Intra- and Inter-Instance Collaborative Learning (I3CL). Specifically, to address the first issue, we design an effective convolutional module with multiple receptive fields, which is able to collaboratively learn better character and gap feature representations at local and long ranges inside a text instance. To address the second issue, we devise an instance-based transformer module to exploit the dependencies between different text instances and a global context module to exploit the semantic context from the shared background, which are able to collaboratively learn more discriminative text feature representation. In this way, I3CL can effectively exploit the intra- and inter-instance dependencies together in a unified end-to-end trainable framework. Besides, to make full use of the unlabeled data, we design an effective semi-supervised learning method to leverage the pseudo labels via an ensemble strategy. Without bells and whistles, experimental results show that the proposed I3CL sets new state-of-the-art results on three challenging public benchmarks, i.e., an F-measure of 77.5% on ArT, 86.9% on Total-Text, and 86.4% on CTW-1500. Notably, our I3CL with the ResNeSt-101 backbone ranked the 1st place on the ArT leaderboard.

image

Results

Example results from paper.

image

Evaluation results of I3CL with different backbones on ArT. Note that: (1) I3CL with ViTAE only adopts one training stage with LSVT+MLT19+ArT training datasets in this repo. ResNet series adopt three training stages, i.e, pre-train on SynthText, mix-train on ReCTS+RCTW+LSVT+MLT19+ArT and lastly finetune on LSVT+MLT19+ArT. (2) Origin implementation of ResNet series is based on Detectron2. The results and model links of ResNet-50 will be updated soon in this implementation.

Backbone Model Link Training Data Recall Precision F-measure

ViTAEv2-S
[this repo]

OneDrive/
百度网盘 (pw:w754)

LSVT,MLT19,ArT 75.4 82.8 78.9

ResNet-50
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 71.3 82.7 76.6

ResNet-50 w/ RegionCL(finetuning)
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 72.6 81.9 77.0

ResNet-50 w/ RegionCL(w/o finetuning)
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 73.5 81.6 77.3

ResNeXt-101
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 74.1 85.5 79.4

ResNeSt-101
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 75.1 86.3 80.3

ResNeXt-151
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 74.9 86.0 80.1

Usage

Install

Prerequisites:

  • Linux (macOS and Windows are not tested)
  • Python >= 3.6
  • Pytorch >= 1.8.1 (For ViTAE implementation). Please make sure your compilation CUDA version and runtime CUDA version match.
  • GCC >= 5
  • MMCV (We use mmcv-full==1.4.3)
  1. Create a conda virtual environment and activate it. Note that this implementation is based on mmdetection 2.20.0 version.

  2. Install Pytorch and torchvision following official instructions.

  3. Install mmcv-full and timm. Please refer to mmcv to install the proper version. For example:

    pip install mmcv-full==1.4.3 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
    pip install timm
    
  4. Clone this repository and then install it:

    git clone https://github.com/ViTAE-Transformer/ViTAE-Transformer-Scene-Text-Detection.git
    cd ViTAE-Transformer-Scene-Text-Detection
    pip install -r requirements/build.txt
    pip install -r requirements/runtime.txt
    pip install -v -e .
    

Preparation

Model:

Data

  • Coco format training datasets are utilized. Some offline augmented ArT training datasets are used. lsvt-test is only used to train SSL(Semi-Supervised Learning) model in paper. Files named train_lossweight.json are the provided pseudo-label for SSL training. You can download correspoding datasets in config file from here and put them in data/:

    Dataset

    Link
    (OneDrive)

    Link
    (Baidu Wangpan百度网盘)

    art Link Link (pw:etif)
    art_light Link Link (pw:mzrk)
    art_noise Link Link (pw:scxi)
    art_sig Link Link (pw:cdk8)
    lsvt Link Link (pw:wly0)
    lsvt_test Link Link (pw:8ha3)
    icdar2019_mlt Link Link (pw:hmnj)
    rctw Link Link (pw:ngge)
    rects Link Link (pw:y00o)

    The file structure should look like:

    |- data
        |- art
        |   |- train_images
        |   |    |- *.jpg
        |   |- test_images
        |   |    |- *.jpg
        |   |- train.json
        |   |- train_lossweight.json
        |- art_light
        |   |- train_images
        |   |    |- *.jpg
        |   |- train.json
        |   |- train_lossweight.json
        ......
        |- lsvt
        |   |- train_images1
        |   |    |- *.jpg
        |   |- train_images2
        |   |    |- *.jpg
        |   |- train1.json
        |   |- train1_lossweight.json
        |   |- train2.json
        |   |- train2_lossweight.json
        |- lsvt_test
        |   |- train_images
        |   |    |- *.jpg
        |   |- train_lossweight.json
        ......
    
    

Training

  • Distributed training with 4GPUs for ViTAE backbone:
python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_vitae_fpn/i3cl_vitae_fpn_ms_train.py --launcher pytorch --work-dir ./out_dir/${your_dir}
  • Distributed training with 4GPUs for ResNet50 backbone:

stage1:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_pretrain.py --launcher pytorch --work-dir ./out_dir/art_r50_pretrain/

stage2:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_mixtrain.py --launcher pytorch --work-dir ./out_dir/art_r50_mixtrain/

stage3:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_finetune.py --launcher pytorch --work-dir ./out_dir/art_r50_finetune/
  • Distributed training with 4GPUs for ResNet50 w/ RegionCL backbone:

stage1:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_pretrain.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_pretrain/

stage2:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_mixtrain.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_mixtrain/

stage3:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_finetune.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_finetune/

Note:

  • If the GPU memory is limited during training I3CL ViTAE backbone, please adjust img_scale in configuration file. The maximum scale set to (800, 1333) is proper for V100(16G) while there is little effect on the performance actually. Please change the training scale according to your condition.

Inference

For example, use our trained I3CL model to get inference results on ICDAR2019 ArT test set with visualization images, txt format records and the json file for testing submission, please run:

python demo/art_demo.py --checkpoint pretrained_model/I3CL/vitae_epoch_12.pth --score-thr 0.45 --json_file art_submission.json

Note:

  • Upload the saved json file to ICDAR2019-ArT evaluation website for Recall, Precision and F1 evaluation results. Change the path for saving visualizations and txt files if needed.

Citation

This project is for research purpose only.

If you are interested in our work, please consider citing our work. Arxiv

Please post issues to let us know if you encounter any problems.

Acknowledgement

Thanks for mmdetection.

A Python type explainer!

typesplainer A Python typehint explainer! Available as a cli, as a website, as a vscode extension, as a vim extension Usage First, install the package

Typesplainer 79 Dec 01, 2022
Content shared at DS-OX Meetup

Streamlit-Projects Streamlit projects available in this repo: An introduction to Streamlit presented at DS-OX (Feb 26, 2020) meetup Streamlit 101 - Ja

Arvindra 69 Dec 23, 2022
Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

Cheng Zhang 66 Nov 16, 2022
CVAT is free, online, interactive video and image annotation tool for computer vision

Computer Vision Annotation Tool (CVAT) CVAT is free, online, interactive video and image annotation tool for computer vision. It is being used by our

OpenVINO Toolkit 8.6k Jan 04, 2023
Mail classification with tensorflow and MS Exchange Server (ham or spam).

Mail classification with tensorflow and MS Exchange Server (ham or spam).

Metin Karatas 1 Sep 11, 2021
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

torch-imle Concise and self-contained PyTorch library implementing the I-MLE gradient estimator proposed in our NeurIPS 2021 paper Implicit MLE: Backp

UCL Natural Language Processing 249 Jan 03, 2023
Code repository for the work "Multi-Domain Incremental Learning for Semantic Segmentation", accepted at WACV 2022

Multi-Domain Incremental Learning for Semantic Segmentation This is the Pytorch implementation of our work "Multi-Domain Incremental Learning for Sema

Pgxo20 24 Jan 02, 2023
🤗 Paper Style Guide

🤗 Paper Style Guide (Work in progress, send a PR!) Libraries to Know booktabs natbib cleveref Either seaborn, plotly or altair for graphs algorithmic

Hugging Face 66 Dec 12, 2022
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".

VL-BERT By Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai. This repository is an official implementation of the paper VL-BERT:

Weijie Su 698 Dec 18, 2022
The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL), NeurIPS-2021

Directed Graph Contrastive Learning Paper | Poster | Supplementary The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL). In this

Tong Zekun 28 Jan 08, 2023
Codes for "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation"

CSDI This is the github repository for the NeurIPS 2021 paper "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

106 Jan 04, 2023
Release of the ConditionalQA dataset

ConditionalQA Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. Disclaimer This dataset

14 Oct 17, 2022
This program creates a formatted excel file which highlights the undervalued stock according to Graham's number.

Over-and-Undervalued-Stocks Of Nepse Using Graham's Number Scrap the latest data using different websites and creates a formatted excel file that high

6 May 03, 2022
Adaptive Prototype Learning and Allocation for Few-Shot Segmentation (CVPR 2021)

ASGNet The code is for the paper "Adaptive Prototype Learning and Allocation for Few-Shot Segmentation" (accepted to CVPR 2021) [arxiv] Overview data/

Gen Li 91 Dec 23, 2022
Human-Pose-and-Motion History

Human Pose and Motion Scientist Approach Eadweard Muybridge, The Galloping Horse Portfolio, 1887 Etienne-Jules Marey, Descent of Inclined Plane, Chron

Daito Manabe 47 Dec 16, 2022
PyTorch implementation of the WarpedGANSpace: Finding non-linear RBF paths in GAN latent space (ICCV 2021)

Authors official PyTorch implementation of the "WarpedGANSpace: Finding non-linear RBF paths in GAN latent space" [ICCV 2021].

Christos Tzelepis 100 Dec 06, 2022
CRNN With PyTorch

CRNN-PyTorch Implementation of https://arxiv.org/abs/1507.05717

Vadim 4 Sep 01, 2022
Official PyTorch implementation of "BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation" (NeurIPS 2021)

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation Official PyTorch implementation of the NeurIPS 2021 paper Mingcong Liu, Qiang

onion 462 Dec 29, 2022
Classifies galaxy morphology with Bayesian CNN

Zoobot Zoobot classifies galaxy morphology with deep learning. This code will let you: Reproduce and improve the Galaxy Zoo DECaLS automated classific

Mike Walmsley 39 Dec 20, 2022
[LREC] MMChat: Multi-Modal Chat Dataset on Social Media

MMChat This repo contains the code and data for the LREC2022 paper MMChat: Multi-Modal Chat Dataset on Social Media. Dataset MMChat is a large-scale d

Silver 47 Jan 03, 2023