The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Last update: Dec 26, 2022

Related tags

Text Data & NLP DocTr

Overview

Good news! Our new work exhibits state-of-the-art performances on DocUNet benchmark dataset: DocScanner: Robust Document Image Rectification with Progressive Learning

DocTr

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral

Any questions or discussions are welcomed!

Training

For geometric unwarping, we train the GeoTr network using the Doc3d dataset.
For illumination correction, we train the IllTr network based on the DRIC dataset.

Inference

Download the pretrained models here and put them to $ROOT/model_pretrained/.
Geometric unwarping:
```
python inference.py
```
Geometric unwarping and illumination rectification:
```
python inference.py --ill_rec True
```

Evaluation

We use the same evaluation code as DocUNet benchmark dataset based on Matlab 2019a.
Please compare the scores according to your Matlab version.
Use the images available here for reproducing the quantitative performance reported in the paper and further comparison.

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}

@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}

The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Related tags

Overview

DocTr

Training

Inference

Evaluation

Citation

Owner

Hao Feng

Lattice methods in TensorFlow

Sapiens is a human antibody language model based on BERT.

Code to reprudece NeurIPS paper: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Crie tokens de autenticação íntegros e seguros com UToken.

A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container.

Code for the Python code smells video on the ArjanCodes channel.

Question answering app is used to answer for a user given question from user given text.

MPNet: Masked and Permuted Pre-training for Language Understanding

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

✔👉A Centralized WebApp to Ensure Road Safety by checking on with the activities of the driver and activating label generator using NLP.

Shellcode antivirus evasion framework

Translate U is capable of translating the text present in an image from one language to the other.

Curso práctico: NLP de cero a cien 🤗

Non-Autoregressive Predictive Coding

Twewy-discord-chatbot - Build a Discord AI Chatbot that Speaks like Your Favorite Character

Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

A Chinese to English Neural Model Translation Project

PUA Programming Language written in Python.

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".