Simple and understandable swin-transformer OCR project

Last update: Dec 31, 2022

Overview

swin-transformer-ocr

Overview

Simple and understandable swin-transformer OCR project. The model in this repository heavily relied on high-level open-source projects like timm and x_transformers. And also you can find that the procedure of training is intuitive thanks to the legibility of pytorch-lightning.

The model in this repository encodes input image to context vector with 'shifted-window` which is a swin-transformer encoding mechanism. And it decodes the vector with a normal auto-regressive transformer.

If you are not familiar with transformer OCR structure, transformer-ocr would be easier to understand because it uses a traditional convolution network (ResNet-v2) for the encoder.

Performance

With private korean handwritten text dataset, the accuracy(exact match) is 97.6%.

Data

./dataset/
├─ preprocessed_image/
│  ├─ cropped_image_0.jpg
│  ├─ cropped_image_1.jpg
│  ├─ ...
├─ train.txt
└─ val.txt

# in train.txt
cropped_image_0.jpg\tHello World.
cropped_image_1.jpg\tvision-transformer-ocr
...

You should preprocess the data first. Crop the image by word or sentence level area. Put all image data in a specific directory. Ground truth information should be provided with a txt file. In the txt file, write the image file name and label with \t separator in the same line.

Configuration

In settings/ directory, you can find default.yaml. You can set almost every hyper-parameter in that file. Copy one and edit it as your experiment version. I recommend you to run with the default setting first, before you change it.

Train

python run.py --version 0 --setting settings/default.yaml --num_workers 16 --batch_size 128

You can check your training log with tensorboard.

tensorboard --log_dir tb_logs --bind_all

Predict

When your model finishes training, you can use your model for prediction.

python predict.py --setting <your_setting.yaml> --target <image_or_directory> --tokenizer <your_tokenizer_pkl> --checkpoint <saved_checkpoint>

Exporting to ONNX

You can export your model to ONNX format. It's very easy thanks to pytorch-lightning. See the related pytorch-lightning document.

Citations

@misc{liu-2021,
    title   = {Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
	author  = {Ze Liu and Yutong Lin and Yue Cao and Han Hu and Yixuan Wei and Zheng Zhang and Stephen Lin and Baining Guo},
	year    = {2021},
    eprint  = {2103.14030},
	archivePrefix = {arXiv}
}

Simple and understandable swin-transformer OCR project

Related tags

Overview

swin-transformer-ocr

Overview

Performance

Data

Configuration

Train

Predict

Exporting to ONNX

Citations

Owner

Ha YongWook

Decorator for PyMC3

Leveraging OpenAI's Codex to solve cornerstone problems in Music

Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding (AAAI 2020) - PyTorch Implementation

Computational Pathology Toolbox developed by TIA Centre, University of Warwick.

Pytorch code for our paper Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains)

PyTorch and GPyTorch implementation of the paper "Conditioning Sparse Variational Gaussian Processes for Online Decision-making."

An implementation of EWC with PyTorch

Official code for paper "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight"

Demystifying How Self-Supervised Features Improve Training from Noisy Labels

A Lightweight Experiment & Resource Monitoring Tool 📺

A set of examples around hub for creating and processing datasets

Video Autoencoder: self-supervised disentanglement of 3D structure and motion

AI4Good project for detecting waste in the environment

Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving (ICCV 2021)

Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition

Yolov3 pytorch implementation

Continuous Security Group Rule Change Detection & Response at scale

1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

Dense Prediction Transformers