Simple and understandable swin-transformer OCR project

Last update: Dec 31, 2022

swin-transformer-ocr

Overview

Simple and understandable swin-transformer OCR project. The model in this repository relies heavily on high-level open-source projects such as timm and x_transformers. The training procedure is also easy to follow, thanks to the readability of pytorch-lightning.

The model in this repository encodes the input image into a context vector using 'shifted-window' attention, the encoding mechanism of the swin-transformer. It then decodes that vector with a standard auto-regressive transformer.
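To make the auto-regressive decoding step concrete, here is a minimal sketch of a greedy decoding loop. It is not the repository's code: `greedy_decode`, `toy_next_token`, and the `BOS`/`EOS` ids are hypothetical names chosen for illustration, and the toy function stands in for a real transformer decoder that would attend to the encoder's context vector.

```python
from typing import Callable, List, Sequence

# Hypothetical special token ids; the real tokenizer may use different values.
BOS, EOS = 1, 2


def greedy_decode(
    next_token: Callable[[Sequence[int], List[int]], int],
    context: Sequence[int],
    max_len: int = 32,
) -> List[int]:
    """Generate token ids one at a time until EOS is produced or max_len is reached."""
    tokens = [BOS]
    for _ in range(max_len):
        # Each step sees the encoder context plus every token generated so far.
        token = next_token(context, tokens)
        if token == EOS:
            break
        tokens.append(token)
    return tokens[1:]  # drop the BOS marker


def toy_next_token(context: Sequence[int], tokens: List[int]) -> int:
    """Stand-in for a decoder: emits the context values in order, then EOS."""
    position = len(tokens) - 1
    return int(context[position]) if position < len(context) else EOS


decoded = greedy_decode(toy_next_token, [5, 6, 7])
```

In a real model, `next_token` would run the decoder on the tokens generated so far and pick the highest-scoring next token; the loop structure stays the same.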

If you are not familiar with the structure of a transformer OCR model, transformer-ocr may be easier to understand, because it uses a traditional convolutional network (ResNet-v2) for the encoder.

Performance

On a private Korean handwritten text dataset, the accuracy (exact match) is 97.6%.
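For reference, exact-match accuracy counts a prediction as correct only when the whole predicted string equals the label. The following is a small sketch of that metric; `exact_match_accuracy` is a hypothetical helper, not a function from this repository, and it assumes no normalization (case, whitespace) is applied before comparison.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Return the fraction of predictions that equal their label exactly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    # A single wrong character makes the whole sample count as a miss.
    matches = sum(pred == label for pred, label in zip(predictions, labels))
    return matches / len(labels)


score = exact_match_accuracy(
    ["Hello World.", "vision-transformer"],
    ["Hello World.", "vision-transformer-ocr"],
)
```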

Data

./dataset/
├─ preprocessed_image/
│  ├─ cropped_image_0.jpg
│  ├─ cropped_image_1.jpg
│  ├─ ...
├─ train.txt
└─ val.txt

# in train.txt
cropped_image_0.jpg\tHello World.
cropped_image_1.jpg\tvision-transformer-ocr
...

Preprocess the data first. Crop each image to a word-level or sentence-level region, and put all of the cropped images in a single directory. Provide the ground truth in a txt file: each line holds an image file name and its label, separated by a tab (\t).
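The label file format described above can be read with a few lines of Python. This is a sketch under stated assumptions, not the repository's loader: `parse_label_line` and `load_samples` are hypothetical names, and it assumes the separator in the file is a real tab character and that labels themselves may contain spaces but no tabs.

```python
from typing import Iterable, List, Tuple


def parse_label_line(line: str) -> Tuple[str, str]:
    """Split one 'file_name<TAB>label' line into its two fields."""
    line = line.rstrip("\r\n")
    file_name, separator, label = line.partition("\t")
    if not separator:
        raise ValueError(f"missing tab separator in line: {line!r}")
    return file_name, label


def load_samples(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse every non-blank line into a (file_name, label) pair."""
    return [parse_label_line(line) for line in lines if line.strip()]


samples = load_samples([
    "cropped_image_0.jpg\tHello World.\n",
    "cropped_image_1.jpg\tvision-transformer-ocr\n",
    "\n",
])
```

To read an actual file, pass an open file handle, for example `load_samples(open("dataset/train.txt", encoding="utf-8"))`.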

Configuration

In the settings/ directory, you can find default.yaml. Almost every hyper-parameter can be set in that file. Copy it and edit the copy for each experiment version. I recommend running with the default settings first, before changing anything.

Train

python run.py --version 0 --setting settings/default.yaml --num_workers 16 --batch_size 128

You can check your training log with tensorboard.

tensorboard --logdir tb_logs --bind_all

Predict

Once training has finished, you can use the model for prediction.

python predict.py --setting <your_setting.yaml> --target <image_or_directory> --tokenizer <your_tokenizer_pkl> --checkpoint <saved_checkpoint>

Exporting to ONNX

You can export your model to ONNX format. pytorch-lightning makes this straightforward; see the pytorch-lightning documentation on ONNX export.

Citations

@misc{liu-2021,
    title         = {Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
    author        = {Ze Liu and Yutong Lin and Yue Cao and Han Hu and Yixuan Wei and Zheng Zhang and Stephen Lin and Baining Guo},
    year          = {2021},
    eprint        = {2103.14030},
    archivePrefix = {arXiv}
}

Owner

Ha YongWook
