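First place in the 1st Xi'an Jiaotong University Artificial Intelligence Practice Competition (2018 AI Practice Competition: image text recognition); text in images is recognized using DenseNet only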

Overview

OCR

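Champion of the 1st Xi'an Jiaotong University Artificial Intelligence Practice Competition (2018 AI Practice Competition: image text recognition)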

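Model results

The competition computes the f1score of every entry and averages over all entries; the exact calculation is described here. The calculation used in this repository counts a character only once per sentence even if it appears several times, so the f1score below is lower than the final submitted result: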

metric      train    val
f1score     0.9911   0.9582
recall      0.9943   0.9574
precision   0.9894   0.9637
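
The set-based scoring described above can be sketched as follows. This is a minimal illustration, not the competition's official scorer: the function names `entry_scores` and `mean_f1` are hypothetical, and each entry is assumed to be a (predicted text, ground-truth text) pair.

```python
def entry_scores(pred, truth):
    """Precision, recall and F1 for one entry, counting each distinct character once."""
    # Sets drop repeated characters, matching the "no repeated counting" rule.
    # (On Python 2, pass unicode strings so Chinese characters are not split into bytes.)
    pred_chars, true_chars = set(pred), set(truth)
    hits = len(pred_chars & true_chars)
    precision = hits / float(len(pred_chars)) if pred_chars else 0.0
    recall = hits / float(len(true_chars)) if true_chars else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_f1(pairs):
    """Average the per-entry F1 over all (pred, truth) pairs."""
    scores = [entry_scores(p, t)[2] for p, t in pairs]
    return sum(scores) / len(scores)


print(mean_f1([("ab", "ab"), ("x", "y")]))  # 0.5
```

Because repeated characters collapse into one set element, a sentence in which a character appears twice is scored as if it appeared once.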

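Model description

  1. Model

The model uses a DenseNet architecture. The input is a (64×512) image and the output is an (8×64×2159) tensor of probabilities.

The image is divided into a grid of (8×8) cells, and for each cell the model predicts the probability of each of the 2159 characters.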
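
The relation between the input size and the output grid can be checked with a few lines of arithmetic. This is only a sketch: `grid_shape` is a hypothetical helper, and it assumes the (64×512) input is given as (height, width).

```python
def grid_shape(image_hw, cell=8):
    """Return (rows, cols) of the prediction grid for an image of size (height, width)."""
    height, width = image_hw
    if height % cell or width % cell:
        raise ValueError("image size must be a multiple of the cell size")
    return height // cell, width // cell


NUM_CLASSES = 2159  # number of characters, as stated above

rows, cols = grid_shape((64, 512))
output_shape = (rows, cols, NUM_CLASSES)
print(output_shape)  # (8, 64, 2159)
```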

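  2. Loss

Take the maximum of the (8×64×2159) probabilities over both spatial dimensions to obtain a (2159) vector; each element is the probability that the corresponding character appears somewhere in the image.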
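
The reduction above is a global maximum over the grid. A minimal pure-Python sketch on a toy grid follows; `presence_probs` is a hypothetical name, and the nested-list layout `grid_probs[row][col][char]` is an assumption made for illustration (the repository itself works on PyTorch tensors).

```python
def presence_probs(grid_probs):
    """Max over rows and columns: grid_probs[r][c][k] -> probs[k]."""
    num_classes = len(grid_probs[0][0])
    return [
        max(cell[k] for row in grid_probs for cell in row)
        for k in range(num_classes)
    ]


# Toy grid: 2 rows x 3 columns x 2 characters (the real grid is 8 x 64 x 2159).
toy = [
    [[0.1, 0.0], [0.9, 0.2], [0.3, 0.1]],
    [[0.2, 0.4], [0.5, 0.1], [0.0, 0.3]],
]
print(presence_probs(toy))  # [0.9, 0.4]
```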

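balance: the loss is computed separately for positive and negative examples so that the total loss weight of the positives equals the total loss weight of the negatives; this addresses the class imbalance.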
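
One way to realize this balancing is to average the positive and negative losses separately and give each half of the total weight. The sketch below assumes a binary cross-entropy loss and a 0.5/0.5 split; the README only states that the two weight sums are equal, so `balanced_bce` and its normalization are illustrative choices, not the repository's code.

```python
import math


def balanced_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy where positives and negatives each carry half of the total weight."""
    pos = [-math.log(max(p, eps)) for p, y in zip(probs, labels) if y == 1]
    neg = [-math.log(max(1.0 - p, eps)) for p, y in zip(probs, labels) if y == 0]
    loss = 0.0
    if pos:
        # Each positive gets weight 0.5 / len(pos), so the positive weights sum to 0.5.
        loss += 0.5 * sum(pos) / len(pos)
    if neg:
        # Each negative gets weight 0.5 / len(neg), so the negative weights sum to 0.5.
        loss += 0.5 * sum(neg) / len(neg)
    return loss
```

With 2159 characters and only a handful present in any image, an unweighted mean would be dominated by negatives; the split above keeps the few positives from being drowned out.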

hard-mining
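
The README names hard-mining without describing it. A common form, assumed here rather than taken from the repository, keeps only the examples with the largest loss and averages over those. `hardest`, `hard_mined_mean` and the `keep` parameter are hypothetical.

```python
def hardest(losses, keep):
    """Return the `keep` largest loss values (the 'hard' examples)."""
    return sorted(losses, reverse=True)[:keep]


def hard_mined_mean(losses, keep):
    """Mean loss over the hardest `keep` examples only."""
    hard = hardest(losses, keep)
    return sum(hard) / len(hard) if hard else 0.0


print(hardest([0.1, 2.0, 0.5, 1.2], 2))  # [2.0, 1.2]
```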

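  3. Text detection

Take the maximum of the (8×64×2159) probabilities over the short side (the dimension of size 8) to obtain (64×2159) probabilities. Then predict a character cell by cell along the long side (the dimension of size 64) and concatenate the predictions to obtain the complete sentence.

Known issue: when the same character appears twice in a row, the repeat cannot be detected.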
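
The decoding step and its limitation can be sketched on a toy grid. The details below are assumptions made for illustration: the `threshold` for treating a column as empty and the merging of identical neighbouring predictions are not stated in the README, and `decode` is a hypothetical name.

```python
def decode(grid_probs, charset, threshold=0.5):
    """Greedy column-by-column decoding: grid_probs[r][c][k] -> string."""
    rows = len(grid_probs)
    cols = len(grid_probs[0])
    text = []
    previous = None
    for c in range(cols):
        # Max over the short side gives one score per character for this column.
        col_probs = [
            max(grid_probs[r][c][k] for r in range(rows))
            for k in range(len(charset))
        ]
        best = col_probs.index(max(col_probs))
        if col_probs[best] < threshold:
            previous = None  # treat the column as empty
            continue
        if best != previous:
            # A character usually spans several columns, so identical
            # neighbouring predictions are merged into one.
            text.append(charset[best])
        previous = best
    return "".join(text)


# 1 row x 5 columns x 2 characters: "a" over two columns, a gap, "b" over two columns.
toy = [[[0.9, 0.1], [0.8, 0.1], [0.1, 0.1], [0.2, 0.7], [0.1, 0.9]]]
print(decode(toy, "ab"))  # ab

# Four adjacent "a" columns: one wide "a" and two touching "a"s look the same.
doubled = [[[0.9, 0.0]] * 4]
print(decode(doubled, "ab"))  # a
```

The second call shows why repeats are lost under this scheme: without a gap between them, two identical characters produce the same column sequence as a single one.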

The figure below is an example of correct recognition: 的长为半径作圆

The figure below is an example of incorrect recognition: 为10元;经粗加工后销售,每

Directory structure

ocr
|
|--code
|
|--files
|	|
|	|--train.csv
|
|--data
	|
	|--dataset
	|	|
	|	|--train
	|	|
	|	|--test
	|
	|--result
	|	|
	|	|--test_result.csv
	|
	|--images		any images can go in this folder; I used the CelebA dataset for pretraining

Environment

Ubuntu 16.04, Python 2.7, CUDA 9.0

Install PyTorch; recommended version: 0.2.0_3

pip install -r requirement.txt

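Download data

Download the preliminary-round and final-round data and the models here, then merge the training sets and merge the test sets.

Preprocessing

If you keep the same dataset, this step is not needed.

If you switch to another dataset, replace files/train.csv as well.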

cd code/preprocessing
python map_word_to_index.py
python analysis_dataset.py  

Training

cd code/ocr
python main.py

Testing

While the f1score is below 0.9: lr=0.001, without hard-mining;

Once the f1score is above 0.9: lr=0.0001, with hard-mining;

The generated models are saved in different folders.
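
The two-stage schedule above can be written as a small helper. This is a sketch only: `training_settings` is a hypothetical function, the README does not say whether the switch is made by hand or automatically, and treating an f1score of exactly 0.9 as the second stage is an assumption.

```python
def training_settings(f1score):
    """Return (learning_rate, use_hard_mining) for the current f1score."""
    if f1score < 0.9:
        return 0.001, False
    # Assumption: exactly 0.9 already counts as the second stage.
    return 0.0001, True


print(training_settings(0.85))  # (0.001, False)
print(training_settings(0.95))  # (0.0001, True)
```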

cd code/ocr
python main.py --phase test --resume  ../../data/models-small/densenet/eval-16-1/best_f1score.ckpt
Owner
尹畅
Ph.D. in CSE Research interests: deep learning, active learning, medical application