Some bits of javascript to transcribe scanned pages using PageXML

Overview

nashi (nasḫī)

Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's more: download now and get a complete webapp written in Python/Flask that handles import and export of your scanned pages to and from LAREX for semi-automatic layout analysis, does the line segmentation for you (via kraken) and saves your precious PageXML in a database. All you've got to do is follow the instructions below and help me implement all the missing features... OCR training and recognition is currently not included because of our webhost's limited capacity.

Instructions for nashi.html

  • Put nashi.html in a folder with (or some folder above) your PageXML files (containing line segmentation data) and the page images. Serve the folder in a webserver of your choice or simply use the file:// protocol (only supported in Firefox at the moment).
  • In the browser, open the interface as .../path/to/nashi.html?pagexml=Test.xml&direction=rtl where Test.xml (or subfolder/Test.xml) is one of the PageXML files and rtl (or ltr) indicates the main direction of your text.
  • Install the "Andron Scriptor Web" font to use the additional range of characters.

The interface

  • Lines without existing text are marked red, lines containing OCR data blue and lines already transcribed are coloured green.

Keyboard shortcuts in the text input area

  • Tab/Shift+Tab switches to the next/previous input.
  • Shift+Enter saves the edits for the current line.
  • Shift+Insert shows an additional range of characters to select as an alternative to the character next to the cursor. Input one of them using the corresponding number while holding Insert.
  • Shift+ArrowDown opens a new comment field (Shift+ArrowUp switches back to the transcription line).

Global keyboard shortcuts

  • Ctrl+Space Zooms in to line width
  • Ctrl+Shift+Space toggles zoom mode (always zoom in to line width)
  • Shift+PageUp/PageDown loads the next/previous page if the filenames of your PageXML files contain the number.
  • Ctrl+Shift+ArrowLeft/ArrowRight changes orientation and input direction to ltr/rtl.
  • Ctrl+S downloads the PageXML file.
  • Ctrl+E enters or exits polygon edit mode.

Edit mode

  • Click on line area to activate point handles. Points can be moved around using, new points can be created by drawing the borders between existing points.
  • If points or lines are active, they can be deleted using the "Delete"-key.
  • Hold Shift-key and draw to select multiple points
  • New text lines can be created by clicking inside an existing text region and drawing a rectangle. New lines are always added at the end of the region.

Instructions for the server

  • Install redis. The app uses celery as a task queue for line segmentation jobs (and probably OCR jobs in the future).
  • Install LAREX for semi-automatic layout analysis.
  • Install the server from this repository or from pypi:
pip install nashi
  • Create a config.py file. For more options see the file default_settings.py. If you want the app to send emails to users, change the mail settings there. Here is just a minimal example:
BOOKS_DIR = "/home/username/books/"
LAREX_DIR = "/home/username/larex_books/"
  • Set an environment variable containing your database url. If you don't, nashi will create a sqlite database called "test.db" in your working directory.
export DATABASE_URL="mysql+pymysql://user:[email protected]/mydb?charset=utf8"
  • Create the database tables (and users, if needed) from a python prompt. Login is disabled in the default config file.
from nashi import user_datastore
from nashi.database import db_session, init_db
init_db()
user_datastore.create_user(email="[email protected]", password="secret")
db_session.commit()
  • Run the celery worker:
export NASHI_SETTINGS=/home/user/path/to/config.py
celery -A nashi.celery worker --loglevel=info
  • Run the app, don't forget to export your DATABASE_URl again if you're using a new terminal:
export FLASK_APP=nashi
export NASHI_SETTINGS=/home/user/path/to/config.py
flask run
  • Open localhost:5000, log in, update your books list via "Edit, Refresh".

Planned features

  • Sorting of lines
  • Reading order
  • Creation and correction of regions
  • API for external OCR service
  • Advanced text editing capabilities
  • Help, examples, and documentation
  • Artificial general intelligence that writes the code for me
Owner
Andreas Büttner
Andreas Büttner
A simple demo program for using OpenCV on Android

Kivy OpenCV Demo A simple demo program for using OpenCV on Android Build with: buildozer android debug deploy run Run (on desktop) with: python main.p

Andrea Ranieri 13 Dec 29, 2022
TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector Introduction This is an application for scene text detection (TextBoxes++) and recognition (CR

Minghui Liao 930 Jan 04, 2023
7th place solution

SIIM-FISABIO-RSNA-COVID-19-Detection 7th place solution Validation: We used iterative-stratification with 5 folds (https://github.com/trent-b/iterativ

11 Jul 17, 2022
Super Mario Game With Python

Super_Mario Hello all this is a simple python program which tries to use our body as a controller for the super mario game Here I have used media pipe

Adarsh Badagala 219 Nov 25, 2022
Resizing Canny Countour In Python

Resizing_Canny_Countour Install Visual Studio Code , https://code.visualstudio.com/download Select Python and install with terminal( pip install openc

Walter Ng 1 Nov 07, 2021
Code for CVPR 2022 paper "SoftGroup for Instance Segmentation on 3D Point Clouds"

SoftGroup We provide code for reproducing results of the paper SoftGroup for 3D Instance Segmentation on Point Clouds (CVPR 2022) Author: Thang Vu, Ko

Thang Vu 231 Dec 27, 2022
A curated list of papers and resources for scene text detection and recognition

Awesome Scene Text A curated list of papers and resources for scene text detection and recognition The year when a paper was first published, includin

Jan Zdenek 43 Mar 15, 2022
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

zhenjieWang 18 Jan 27, 2022
Application that instantly translates sign-language to letters.

Sign Language Translator Project Description The main purpose of project is translating sign-language to letters. In accordance with this purpose we d

3 Sep 29, 2022
Fun program to overlay a mask to yourself using a webcam

Superhero Mask Overlay Description Simple project made for fun. It consists of placing a mask (a PNG image with transparent background) on your face.

KB Kwan 10 Dec 01, 2022
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

Ju He 307 Jan 03, 2023
Opencv-image-filters - A camera to capture videos in real time by placing filters using Python with the help of the Tkinter and OpenCV libraries

Opencv-image-filters - A camera to capture videos in real time by placing filters using Python with the help of the Tkinter and OpenCV libraries

Sergio Díaz Fernández 1 Jan 13, 2022
Volume Control using OpenCV

Gesture-Volume-Control Volume Control using OpenCV Here i made volume control using Python and OpenCV in which we can control the volume of our laptop

Mudit Sinha 3 Oct 10, 2021
This repository contains codes on how to handle mouse event using OpenCV

Handling-Mouse-Click-Events-Using-OpenCV This repository contains codes on how t

Happy N. Monday 3 Feb 15, 2022
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

hocr-tools About About the code Installation System-wide with pip System-wide from source virtualenv Available Programs hocr-check -- check the hOCR f

OCRopus 285 Dec 08, 2022
Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.

Convolutional Recurrent Neural Network This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC l

Baoguang Shi 2k Dec 31, 2022
Kornia is a open source differentiable computer vision library for PyTorch.

Open Source Differentiable Computer Vision Library

kornia 7.6k Jan 06, 2023
Controlling Volume by Hand Gestures

This program allows the user to control the volume of their device with specific hand gestures involving their thumb and index finger!

Riddhi Bajaj 1 Nov 11, 2021
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"

SMCG Code for the paper "Controllable Video Captioning with an Exemplar Sentence" Introduction We investigate a novel and challenging task, namely con

10 Dec 04, 2022
aardio的opencv库

opencv_aardio dll库下载地址:https://github.com/xuncv/opencv-plugin/releases import cv2 img = cv2.imread("./images/Lena.jpg",1) img = cv2.medianBlur(img,5)

71 Dec 31, 2022