A Number Recognition algorithm

Last update: Nov 12, 2021

Paddle-VisualAttention

Results_Compared

| Methods | Steps | GPU | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| PaddlePaddle_SVHNClassifier | 54000 | GTX 1080 Ti | 1024 | 0.01 | 100 | 625 | 0.9 | ~1700 | 95.65% |
| Pytorch_SVHNClassifier | 54000 | GTX 1080 Ti | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
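
The Decay Step and Decay Rate columns suggest a stepwise exponential learning-rate schedule. The sketch below assumes the rate is multiplied by the decay rate once every decay-step iterations (staircase decay). The function name `decayed_lr` is hypothetical, and the scheduler actually used in the repository may differ.

```python
def decayed_lr(step, base_lr=0.01, decay_steps=625, decay_rate=0.9):
    """Staircase exponential decay (assumed schedule, not taken from the repository).

    The learning rate is multiplied by decay_rate once every decay_steps steps.
    Defaults mirror the PaddlePaddle row of the results table.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return base_lr * decay_rate ** (step // decay_steps)


if __name__ == "__main__":
    for step in (0, 624, 625, 6250, 54000):
        print(f"step {step:>6}: lr = {decayed_lr(step):.6g}")
```

Under this assumption the rate stays at its initial value for the first 624 steps and drops by 10% at step 625.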

Introduction

The main idea of this exercise is to study how the state of the art and the main work on visual attention models have evolved. Two datasets are studied: augmented MNIST and SVHN. The former focuses on a canonical problem, handwritten digit recognition, but with clutter and translation added. The latter focuses on a real-world problem, street view house number (SVHN) transcription. In this exercise, the following papers are studied to develop a good intuition for choosing a proper model to tackle each of the above challenges.

For more details, please refer to this blog.

Recommended environment

Python 3.6+
paddlepaddle-gpu 2.0.2
nccl 2.0+
editdistance
visdom
h5py
protobuf
lmdb

Install

Install env

Install PaddlePaddle by following the official tutorial.

pip install visdom
pip install h5py
pip install protobuf
pip install lmdb

Dataset

Download the SVHN dataset (Format 1).

Extract it to the data folder. Your folder structure should now look like this:

SVHNClassifier
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
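
Before converting, it can help to confirm that the extracted data matches the layout above. The following is a minimal sketch using only the standard library. The helper name `missing_dataset_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Sub-folders shown in the expected layout above.
EXPECTED_SPLITS = ("train", "test", "extra")


def missing_dataset_files(data_dir):
    """Return a list of problems found in the dataset layout (empty if none)."""
    root = Path(data_dir)
    problems = []
    for split in EXPECTED_SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        # Each split should contain its annotation file and at least one image.
        if not (split_dir / "digitStruct.mat").is_file():
            problems.append(f"missing annotation file: {split_dir / 'digitStruct.mat'}")
        if not any(split_dir.glob("*.png")):
            problems.append(f"no .png images in: {split_dir}")
    return problems


if __name__ == "__main__":
    issues = missing_dataset_files("./data")
    print("\n".join(issues) if issues else "Dataset layout looks complete.")
```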

Usage

(Optional) Take a glance at original images with bounding boxes
Open `draw_bbox.ipynb` in Jupyter
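
SVHN Format 1 annotations store one box per digit as left, top, width and height. A common preprocessing step is to merge these into a single box around the whole number. The sketch below illustrates that idea only. The function name `enclosing_box` is hypothetical, and the notebook may compute its boxes differently.

```python
def enclosing_box(digit_boxes):
    """Merge per-digit boxes into one box around the whole number.

    Each box is a (left, top, width, height) tuple in pixels.
    Returns the enclosing box in the same (left, top, width, height) form.
    """
    if not digit_boxes:
        raise ValueError("at least one digit box is required")
    left = min(box[0] for box in digit_boxes)
    top = min(box[1] for box in digit_boxes)
    right = max(box[0] + box[2] for box in digit_boxes)
    bottom = max(box[1] + box[3] for box in digit_boxes)
    return left, top, right - left, bottom - top


if __name__ == "__main__":
    # Two made-up digit boxes, for illustration only.
    print(enclosing_box([(10, 5, 8, 20), (20, 7, 9, 18)]))
```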

Convert to LMDB format

$ python convert_to_lmdb.py --data_dir ./data

(Optional) Test for reading LMDBs

Open `read_lmdb_sample.ipynb` in Jupyter

Train

$ python train.py --data_dir ./data --logdir ./logs
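
The Patience column in the results table suggests that training stops early once evaluation accuracy stops improving. The sketch below shows one way such a counter can work, assuming patience is reset on every improvement and decremented otherwise. The class name `PatienceTracker` is hypothetical, and the logic in `train.py` may differ.

```python
class PatienceTracker:
    """Early-stopping counter (an assumed mechanism, not taken from train.py)."""

    def __init__(self, patience=100):
        self.initial = patience
        self.remaining = patience
        self.best = float("-inf")

    def update(self, accuracy):
        """Record one evaluation result; return True when training should stop."""
        if accuracy > self.best:
            # New best accuracy: remember it and restore the full patience budget.
            self.best = accuracy
            self.remaining = self.initial
        else:
            self.remaining -= 1
        return self.remaining <= 0


if __name__ == "__main__":
    tracker = PatienceTracker(patience=2)
    for acc in (0.50, 0.40, 0.60, 0.60, 0.50):  # made-up accuracies
        print(acc, tracker.update(acc))
```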

Retrain if needed

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth

Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pth
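
For a transcription task, accuracy is usually counted per image: a prediction is correct only if the whole house number matches. The sketch below assumes that definition. The function name `sequence_accuracy` is hypothetical, and `eval.py` may compute its metric differently.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of samples whose predicted number matches the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(1 for pred, target in zip(predictions, targets) if pred == target)
    return correct / len(targets)


if __name__ == "__main__":
    # Made-up example: the second number has one wrong digit, so it counts as wrong.
    print(sequence_accuracy(["123", "45", "7"], ["123", "46", "7"]))
```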

Visualize

$ python -m visdom.server
$ python visualize.py --logdir ./logs

Infer

$ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png

Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain
