The tutorial is a collection of many other resources and my own notes

Overview
# TOC

Before reading
the tutorial is a collection of many other resources and my own notes. Note that the ref if any in the tutorial means the whole passage. And part to be referred if any means the part has been summarized or detailed by me. Feel free to click the [the part to be referred] to read the original.

CTC_pytorch

1. Why we need CTC? ---> looking back on history

Feel free to skip it if you already know the purpose of CTC coming into being.

1.1. About CRNN

We need to learn CRNN because in the context we need an output to be a sequence.

ref: the overview from CRNN to CTC !! highly recommended !!

part to be referred

multi-digit sequence recognition

  • Characted-based
  • word-based
  • sequence-to-sequence
  • CRNN = CNN + RNN
    • CNN --> relationship between pixel
    • (the small fonts) Specifially, each feature vec of a feature seq is generated from left to right on the feature maps. That means the i-th feature vec is the concatenation of the columns of all the maps. So the shape of the tensor can be reshaped as e.g. (batch_size, 32, 256)

image1



1.2. from Cross Entropy Loss to CTC Loss

Usually, CE is applied to compute loss as the following way. And gt(also target) can be encoded as a stable matrix or vector.

image2

However, in OCR or audio recognition, each target input/gt has various forms. e.g. "I like to play piano" can be unpredictable in handwriting.

image3

Some stroke is longer than expected. Others are short.
Assume that the above example is encoded as number sequence [5, 3, 8, 3, 0].

image4

  • Tips: blank(the blue box symbol here) is introduced because we allow the model to predict a blank label due to unsureness or the end comes, which is similar with human when we are not pretty sure to make a good prediction. ref:lihongyi lecture starting from 3:45

Therefore, we see that this is an one-to-many question where e.g. "I like to play piano" has many target forms. But we not just have one sequence. We might also have other sequence e.g. "I love you", "Not only you but also I like apple" etc, none of which have a same sentence length. And this is what cross entropy cannot achieve in one batch. But now we can encode all sequences/sentences into a new sequence with a max length of all sequences.

e.g.
"I love you" --> len = 10
"How are you" --> len = 11
"what's your name" --> len = 16

In this context the input_length should be >= 16.

For dealing with the expanded targets, CTC is introduced by using the ideas of (1) HMM forward algorithm and (2) dynamic programing.

2. Details about CTC

2.1. intuition: forward algorithm

image5

image6

Tips: the reason we have - inserted between each two token is because, for each moment/horizontal(Note) position we allow the model to predict a blank representing unsureness.

Note that moment is for audio recognition analogue. horizontal position is for OCR analogue.



2.2. implementation: forward algorithm with dynamic programming

the complete code is CTC.py

given 3 samples, they are
"orange" :[15, 18, 1, 14, 7, 5]    len = 6
"apple" :[1, 16, 16, 12, 5]    len = 5
"watermelon" :[[23, 1, 20, 5, 18, 13, 5, 12, 15, 14]  len = 10

{0:blank, 1:A, 2:B, ... 26:Z}

2.2.1. dummy input ---> what the input looks like

# ------------ a dummy input ----------------
log_probs = torch.randn(15, 3, 27).log_softmax(2).detach().requires_grad_()# 15:input_length  3:batchsize  27:num of token(class)
# targets = torch.randint(0, 27, (3, 10), dtype=torch.long)
targets = torch.tensor([[15, 18, 1,  14, 7, 5,  0, 0,  0,  0],
                        [1,  16, 16, 12, 5, 0,  0, 0,  0,  0],
                        [23, 1,  20, 5, 18, 13, 5, 12, 15, 14]]
                        )

# assume that the prediction vary within 15 input_length.But the target length is still the true length.
""" 
e.g. [a,0,0,0,p,0,p,p,p, ...l,e] is one of the prediction
 """
input_lengths = torch.full((3,), 15, dtype=torch.long)
target_lengths = torch.tensor([6,5,10], dtype = torch.long)



2.2.2. expand the target ---> what the target matrix look like

Recall that one target can be encoded in many different forms. So we introduce a targets mat to represent it as follows.

"-d-o-g-" ">
target_prime = targets.new_full((2 * target_length + 1,), blank) # create a targets_prime full of zero

target_prime[1::2] = targets[i, :target_length] # equivalent to insert blanks in targets. e.g. targets = "dog" --> "-d-o-g-"

Now we got target_prime(also expanded target) for e.g. "apple"
target_prime is
tensor([ 0, 1, 0, 16, 0, 16, 0, 12, 0, 5, 0]) which is visualized as the red part(also t1)

image7

Note that the t8 is only for illustration. In the example, the width of target matrix should be 15(input_length).

probs = log_probs[:input_length, i].exp()

Then we convert original inputs from log-space like this, referring to "In practice, the above recursion ..." in original paper https://www.cs.toronto.edu/~graves/icml_2006.pdf

2.3. Alpha Matrix

image8

# alpha matrix init at t1 indicated by purple boxes.
alpha_col = log_probs.new_zeros((target_length * 2 + 1,))
alpha_col[0] = probs[0, blank] # refers to green box
alpha_col[1] = probs[0, target_prime[1]]
  • blank is the index of blank(here it's 0)
  • target_prime[1] refers to the 1-st index of the token. e.g. "apple": "a", "orange": "o"

2.4. Dynamic programming based on 3 conditions

refer to the details in CTC.py

reference:

Owner
手写AI
手写AI
CoderByte | Practice, Tutorials & Interview Preparation Solutions|

CoderByte | Practice, Tutorials & Interview Preparation Solutions This repository consists of solutions to CoderByte practice, tutorials, and intervie

Eda AYDIN 6 Aug 09, 2022
FireEye Related Projects

FireEye FireEye Related Projects Tor-IP-Collector Simple python script that will collect a list of TOR IPs from the SecOps Institute Github and inject

Taran Ulrich 2 Nov 12, 2022
Python For Finance Cookbook - Code Repository

Python For Finance Cookbook - Code Repository

Packt 544 Dec 25, 2022
Course Materials for Math 340

UBC Math 340 Materials This repository aims to be the one repository for which you can find everything you about Math 340. Lecture Notes Lecture Notes

2 Nov 25, 2021
💡 Catatan Materi Bahasa Pemrogramman Python

Repository catatan kuliah Andika Tulus Pangestu selama belajar Dasar Pemrograman dengan Python.

0 Oct 10, 2021
Soccerdata - Efficiently scrape soccer data from various sources

SoccerData is a collection of wrappers over soccer data from Club Elo, ESPN, FBr

Pieter Robberechts 195 Jan 04, 2023
30 days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days

30 days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than100 days, follow your own pace.

Asabeneh 17.7k Jan 07, 2023
Create Python API documentation in Markdown format.

Pydoc-Markdown Pydoc-Markdown is a tool and library to create Python API documentation in Markdown format based on lib2to3, allowing it to parse your

Niklas Rosenstein 375 Jan 05, 2023
Python Advanced --- numpy, decorators, networking

Python Advanced --- numpy, decorators, networking (and more?) Hello everyone 👋 This is the project repo for the "Python Advanced - ..." introductory

Andreas Poehlmann 2 Nov 05, 2021
NoVmpy - NoVmpy with python

git clone -b dev-1 https://github.com/wallds/VTIL-Python.git cd VTIL-Python py s

263 Dec 23, 2022
Count the number of lines of code in a directory, minus the irrelevant stuff

countloc Simple library to count the lines of code in a directory (excluding stuff like node_modules) Simply just run: countloc node_modules args to

Anish 4 Feb 14, 2022
Data science on SDGs - Udemy Online Course Material: Data Science on Sustainable Development Goals

Data Science on Sustainable Development Goals (SDGs) Udemy Online Course Material: Data Science on Sustainable Development Goals https://bit.ly/data_s

Frank Kienle 1 Jan 04, 2022
NetBox plugin that stores configuration diffs and checks templates compliance

Config Officer - NetBox plugin NetBox plugin that deals with Cisco device configuration (collects running config from Cisco devices, indicates config

77 Dec 21, 2022
A set of Python libraries that assist in calling the SoftLayer API.

SoftLayer API Python Client This library provides a simple Python client to interact with SoftLayer's XML-RPC API. A command-line interface is also in

SoftLayer 155 Sep 20, 2022
A complete kickstart devcontainer repository for python3

A complete kickstart devcontainer repository for python3

Viktor Freiman 3 Dec 23, 2022
Python Deep Dive Course - Accompanying Materials

Python Deep Dive Various Jupyter notebooks and Python sources associated with my Udemy Python 3 Deep Dive course series: Part 1: Mainly functional pro

Fred Baptiste 1.1k Dec 30, 2022
Easy OpenAPI specs and Swagger UI for your Flask API

Flasgger Easy Swagger UI for your Flask API Flasgger is a Flask extension to extract OpenAPI-Specification from all Flask views registered in your API

Flasgger 3.1k Dec 24, 2022
A swagger tool for tornado, using python to write api doc!

SwaggerDoc About A swagger tool for tornado, using python to write api doc! Installation pip install swagger-doc Quick Start code import tornado.ioloo

aaashuai 1 Jan 10, 2022
Docov - Light-weight, recursive docstring coverage analysis for python modules

docov Light-weight, recursive docstring coverage analysis for python modules. Ov

Richard D. Paul 3 Feb 04, 2022
Quick tutorial on orchest.io that shows how to build multiple deep learning models on your data with a single line of code using python

Deep AutoViML Pipeline for orchest.io Quickstart Build Deep Learning models with a single line of code: deep_autoviml Deep AutoViML helps you build te

Ram Seshadri 6 Oct 02, 2022