A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Overview

SVHNClassifier-PyTorch

A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

If you're interested in C++ inference, move HERE

Results

Steps GPU Batch Size Learning Rate Patience Decay Step Decay Rate Training Speed (FPS) Accuracy
54000 GTX 1080 Ti 512 0.16 100 625 0.9 ~1700 95.65%

Sample

$ python infer.py -c=./logs/model-54000.pth ./images/test-75.png
length: 2
digits: 7 5 10 10 10

$ python infer.py -c=./logs/model-54000.pth ./images/test-190.png
length: 3
digits: 1 9 0 10 10

Loss

Requirements

  • Python 3.6

  • torch 1.0

  • torchvision 0.2.1

  • visdom

    $ pip install visdom
    
  • h5py

    In Ubuntu:
    $ sudo apt-get install libhdf5-dev
    $ sudo pip install h5py
    
  • protobuf

    $ pip install protobuf
    
  • lmdb

    $ pip install lmdb
    

Setup

  1. Clone the source code

    $ git clone https://github.com/potterhsu/SVHNClassifier-PyTorch
    $ cd SVHNClassifier-PyTorch
    
  2. Download SVHN Dataset format 1

  3. Extract to data folder, now your folder structure should be like below:

    SVHNClassifier
        - data
            - extra
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - test
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - train
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
    

Usage

  1. (Optional) Take a glance at original images with bounding boxes

    Open `draw_bbox.ipynb` in Jupyter
    
  2. Convert to LMDB format

    $ python convert_to_lmdb.py --data_dir ./data
    
  3. (Optional) Test for reading LMDBs

    Open `read_lmdb_sample.ipynb` in Jupyter
    
  4. Train

    $ python train.py --data_dir ./data --logdir ./logs
    
  5. Retrain if you need

    $ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth
    
  6. Evaluate

    $ python eval.py --data_dir ./data ./logs/model-100.pth
    
  7. Visualize

    $ python -m visdom.server
    $ python visualize.py --logdir ./logs
    
  8. Infer

    $ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png
    
  9. Clean

    $ rm -rf ./logs
    or
    $ rm -rf ./logs_retrain
    
Owner
Potter Hsu
Potter Hsu
Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PICASO Official PyTorch implemetation for the paper PICASO:Permutation-Invariant Cascaded Attentive Set Operator. Requirements Python 3 torch = 1.0 n

Samira Zare 0 Dec 23, 2021
Code for ICCV2021 paper PARE: Part Attention Regressor for 3D Human Body Estimation

PARE: Part Attention Regressor for 3D Human Body Estimation [ICCV 2021] PARE: Part Attention Regressor for 3D Human Body Estimation, Muhammed Kocabas,

Muhammed Kocabas 277 Jan 03, 2023
Predicts an answer in yes or no.

Oui-ou-non-prediction Predicts an answer in 'yes' or 'no'. It is based on the game 'effeuiller la marguerite' in which the person plucks flower petals

Ananya Gupta 1 Jan 15, 2022
[SIGGRAPH Asia 2019] Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning

AGIS-Net Introduction This is the official PyTorch implementation of the Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning. paper | suppl

Yue Gao 102 Jan 02, 2023
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
PyTorch-centric library for evaluating and enhancing the robustness of AI technologies

Responsible AI Toolbox A library that provides high-quality, PyTorch-centric tools for evaluating and enhancing both the robustness and the explainabi

24 Dec 22, 2022
This repository contains the code for our paper VDA (public in EMNLP2021 main conference)

Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models This repository contains the code for our paper VDA (publ

RUCAIBox 13 Aug 06, 2022
Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).

GD-VCR Code for Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning (EMNLP 2021). Research Questions and Aims: How well can a model perform o

Da Yin 24 Oct 13, 2022
Official pytorch implementation of "DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion"

DSPoint Official pytorch implementation of "DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion" Coming soon, as soon as I finish a

Ziyao Zeng 14 Feb 26, 2022
Tutorial in Python targeted at Epidemiologists. Will discuss the basics of analysis in Python 3

Python-for-Epidemiologists This repository is an introduction to epidemiology analyses in Python. Additionally, the tutorials for my library zEpid are

Paul Zivich 120 Nov 17, 2022
[CIKM 2021] Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning

Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning. This repo contains the PyTorch code and implementation for the paper E

Akuchi 18 Dec 22, 2022
Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars

Sean Robertson 160 Jan 07, 2023
Fast, Attemptable Route Planner for Navigation in Known and Unknown Environments

FAR Planner uses a dynamically updated visibility graph for fast replanning. The planner models the environment with polygons and builds a global visi

Fan Yang 346 Dec 30, 2022
A framework for multi-step probabilistic time-series/demand forecasting models

JointDemandForecasting.py A framework for multi-step probabilistic time-series/demand forecasting models File stucture JointDemandForecasting contains

Stanford Intelligent Systems Laboratory 3 Sep 28, 2022
Lucid Sonic Dreams syncs GAN-generated visuals to music.

Lucid Sonic Dreams Lucid Sonic Dreams syncs GAN-generated visuals to music. By default, it uses NVLabs StyleGAN2, with pre-trained models lifted from

731 Jan 02, 2023
A library for low-memory inferencing in PyTorch.

Pylomin Pylomin (PYtorch LOw-Memory INference) is a library for low-memory inferencing in PyTorch. Installation ... Usage For example, the following c

3 Oct 26, 2022
A PyTorch Implementation of Single Shot Scale-invariant Face Detector.

S³FD: Single Shot Scale-invariant Face Detector A PyTorch Implementation of Single Shot Scale-invariant Face Detector. Eval python wider_eval_pytorch.

carwin 235 Jan 07, 2023
Code for the paper "Reinforcement Learning as One Big Sequence Modeling Problem"

Trajectory Transformer Code release for Reinforcement Learning as One Big Sequence Modeling Problem. Installation All python dependencies are in envir

Michael Janner 269 Jan 05, 2023
Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR, 2019)

Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR 2019) To make better use of given limited labels, we propo

126 Sep 13, 2022
Code for Massive-scale Decoding for Text Generation using Lattices

Massive-scale Decoding for Text Generation using Lattices Jiacheng Xu, Greg Durrett TL;DR: a new search algorithm to construct lattices encoding many

Jiacheng Xu 37 Dec 18, 2022