Provides OCR (Optical Character Recognition) services through web applications

Overview

OCR4all

As the name suggests, one of the main goals of OCR4all is to allow virtually any user to independently perform OCR on a wide variety of historical printings and obtain high-quality results with reasonable time expenditure. OCR4all is therefore explicitly geared towards users with no technical background. If you are one of those users (or if you just want to use the tool and are not interested in the code), please go to the getting started project, where you will find guides and test data.

Please note that OCR4all's current main focus is a semi-automatic workflow that allows users to perform OCR even on the earliest printed books, a very challenging task that often requires a significant amount of manual interaction, especially when almost perfect quality is desired. Nevertheless, we are working towards increasing the robustness and the degree of automation of the tool. An important cornerstone for this is the recently agreed cooperation with the OCR-D project, which focuses on the mass full-text recognition of historical materials.

This repository contains the code for the main interface and server of the OCR4all project, while the repositories OCR4all/docker_image and OCR4all/docker_base_image cover the creation of a preconfigured Docker image.

For installing the complete project with a docker image, please follow the instructions here.

Mailing List

OCR4all is under active development, so frequent releases containing bug fixes and further functionality can be expected. To stay up to date, we highly recommend subscribing to our mailing list, where we announce notable enhancements.

Built With

Included Projects

  • OCRopus - Collection of document analysis programs
  • calamari - OCR Engine based on OCRopy and Kraken
  • LAREX - Layout analysis on early printed books

Formerly included / inspired by

  • Kraken - OCR engine for all the languages
  • nashi - Some bits of javascript to transcribe scanned pages using PageXML

Contact, Authors, and Helping Hands

Developers

  • Dr. Herbert Baier Saip (lead)
  • Maximilian Nöth (OCR4all, LAREX, and Calamari)
  • Christoph Wick (Calamari)
  • Andreas Büttner (Calamari and nashi)
  • Kevin Chadbourne (OCR4all and LAREX)
  • Yannik Herbst (OCR4all, LAREX, and distribution via VirtualBox)
  • Björn Eyselein (Artifactory and distribution via Docker)

Miscellaneous

  • Raphaëlle Jung (guides and artwork)
  • Dr. Uwe Springmann (ideas and feedback)
  • Prof. Dr. Frank Puppe (funding)

Former Project Members

  • Dennis Christ (OCR4all)
  • Alexander Hartelt (OCR4all)
  • Nico Balbach (OCR4all and LAREX)
  • Christine Grundig (ideas and feedback)
  • ...

Funding

Comments
  • Recognition claims it is finished but does nothing

    I am trying to use OCR4all 0.5.0 via Docker on two different workstations.

    On one workstation everything is working fine, on the other the Recognition step finishes after several seconds but generates no results. This is reproducible with the example projects from the getting started repository using the default settings.

    • The console output tab shows:
    Found 109 files in the dataset
    Checkpoint version 2 is up-to-date.
    
    • The console error tab stays empty.
    • The browser console log shows nothing suspicious.
    • Tomcat's catalina.log shows nothing suspicious.

    Any ideas why this is happening or hints on other log files with more information?


    One theory is that the second workstation does not have enough CPUs (2 available to Docker) to support Calamari. RAM (12 GB available to Docker) should not be an issue.

    opened by b2m 16
  • Missing Dockerfile

    The README points to a Dockerfile in the master branch to build and launch the project, but the file has been removed in 134169af50a956d43d4c6aba91152fdb2a718c84. What is the status, and how do we use the project?

    opened by raphink 9
  • Segment/region detection comparison

    If ocrd-pc-segmenter is the same as the segmentation algorithm in ocr4all, this comparison might be interesting to you: https://digi.ub.uni-heidelberg.de/diglitData/v/testset-ls-v3.pdf

    opened by jbarth-ubhd 8
  • Line Segmentation - Console Error Message

    5 of 70 files will not pass through Line Segmentation. The following error message appears, but unfortunately I cannot tell from it what I would need to change to make it work:

    • All files were segmented by hand with LAREX
    Traceback (most recent call last):
      File "/usr/local/bin/pagelineseg", line 33, in <module>
        sys.exit(load_entry_point('ocr4all-helpers==0.2.2', 'console_scripts', 'pagelineseg')())
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 634, in cli
        pool.map(parallel, dataset)
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 625, in parallel
        remove_images=args.remove_images)
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 304, in pagexmllineseg
        root = etree.parse(xmlfile).getroot()
      File "src/lxml/etree.pyx", line 3521, in lxml.etree.parse
      File "src/lxml/parser.pxi", line 1859, in lxml.etree._parseDocument
      File "src/lxml/parser.pxi", line 1885, in lxml.etree._parseDocumentFromURL
      File "src/lxml/parser.pxi", line 1789, in lxml.etree._parseDocFromFile
      File "src/lxml/parser.pxi", line 1177, in lxml.etree._BaseParser._parseDocFromFile
      File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
      File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
      File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
      File "/var/ocr4all/data/testset/processing/0064.xml", line 1
    lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 55
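
The final line of the traceback says that one PAGE XML file (0064.xml) could not be parsed at all. A minimal sketch for finding every file with that problem before starting line segmentation is shown here. It uses Python's standard-library parser rather than lxml, the function name `find_unparseable_xml` is hypothetical, and the demo directory is a throwaway stand-in for a real processing folder.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def find_unparseable_xml(directory):
    """Return (file name, error message) pairs for XML files that fail to parse."""
    broken = []
    for xml_path in sorted(Path(directory).glob("*.xml")):
        try:
            ET.parse(xml_path)
        except ET.ParseError as exc:
            # Empty or truncated files end up here as well.
            broken.append((xml_path.name, str(exc)))
    return broken


if __name__ == "__main__":
    # Demo on a temporary directory; point the function at your own
    # processing folder instead (the path in the traceback is one example).
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "good.xml").write_text("<PcGts><Page/></PcGts>", encoding="utf-8")
        Path(tmp, "bad.xml").write_text("not xml at all", encoding="utf-8")
        for name, error in find_unparseable_xml(tmp):
            print(f"{name}: {error}")
```

Files reported by such a check can then be opened in a text editor or re-exported from LAREX; the sketch only locates them and does not say why they are malformed.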
    

    @jbarth-ubhd

    opened by lsubhd 7
  • TypeError: ["'", 'e', ':', '?'] has type list, but expected one of: bytes, unicode

    Hello! I'm using calamari from within the new OCR4all tool under Linux Mint 19.1 Tessa. The OCR process stops with the following error:

    Traceback (most recent call last):
      File "/usr/local/bin/calamari-predict", line 11, in <module>
        load_entry_point('calamari-ocr==0.3.1', 'console_scripts', 'calamari-predict')()
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-0.3.1-py3.6.egg/calamari_ocr/scripts/predict.py", line 151, in main
        run(args)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-0.3.1-py3.6.egg/calamari_ocr/scripts/predict.py", line 74, in run
        prediction = voter.vote_prediction_result(result)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-0.3.1-py3.6.egg/calamari_ocr/ocr/voting/voter.py", line 19, in vote_prediction_result
        return self.vote_prediction_result_tuple(tuple(prediction_results))
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-0.3.1-py3.6.egg/calamari_ocr/ocr/voting/voter.py", line 45, in vote_prediction_result_tuple
        p.sentence = [c for c, _ in sv.process_text(sentences)]
    TypeError: ["'", 'e', ':', '?'] has type list, but expected one of: bytes, unicode
    

    What can I do in order to avoid the TypeError?

    Many thanks!

    opened by alexander-winkler 7
  • Error during Line-Segmentation || TypeError: '<' not supported between instances of 'Image' and 'float'

    Dear all,

    once again I have come up with a problem during line segmentation:

      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 634, in cli
        pool.map(parallel, dataset)
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 625, in parallel
        remove_images=args.remove_images)
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 369, in pagexmllineseg
        cropped = Image.fromarray(nlbin.adaptive_binarize(np.array(cropped)).astype(np.uint8))
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/lib/nlbin.py", line 47, in adaptive_binarize
        extreme = (np.sum(image<0.05)+np.sum(image>0.95))*1.0/np.prod(image.shape)
    TypeError: '<' not supported between instances of 'Image' and 'float'

    Is there any further information you need?

    Regards, Leonie

    opened by lsubhd 6
  • Using calamari models in OCR4all

    Hello! I have trained a calamari model using the calamari-train (v0.3.5) command. Since I'd like to use OCR4all to keep track of the project, I tried to copy the model into the OCR4all models directory:

    project/
    └── 0
        ├── 0.ckpt.data-00000-of-00001
        ├── 0.ckpt.index
        ├── 0.ckpt.json
        ├── 0.ckpt.meta
        ├── checkpoint
    

    During the recognition process I get the following error message:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/google/protobuf/json_format.py", line 547, in _ConvertFieldValuePair
        self.ConvertMessage(value, sub_message)
      File "/usr/local/lib/python3.6/dist-packages/google/protobuf/json_format.py", line 452, in ConvertMessage
        self._ConvertFieldValuePair(value, message)
      File "/usr/local/lib/python3.6/dist-packages/google/protobuf/json_format.py", line 552, in _ConvertFieldValuePair
        raise ParseError('Failed to parse {0} field: {1}'.format(name, e))
    google.protobuf.json_format.ParseError: Failed to parse network field: Failed to parse backend field: Message type "BackendParams" has no field named "shuffleBufferSize".
     Available Fields(except extensions): 
    

    Is there a way to use externally trained models in OCR4all? Thanks in advance!
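
One possible reading of the ParseError above is that the checkpoint JSON was written by a Calamari version that stores a field (`shuffleBufferSize`) which the version running inside OCR4all does not know. The sketch below only shows how to locate such a field in a checkpoint JSON file using the standard library. The helper names and the sample JSON layout are hypothetical and much shorter than a real `0.ckpt.json`.

```python
import json


def collect_key_paths(node, prefix=""):
    """Return every dotted key path found in a nested JSON structure."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.append(path)
            paths.extend(collect_key_paths(value, path))
    elif isinstance(node, list):
        # List items share the path of the list that contains them.
        for item in node:
            paths.extend(collect_key_paths(item, prefix))
    return paths


def find_field(checkpoint_json_text, field_name):
    """Return the dotted paths whose last component equals field_name."""
    data = json.loads(checkpoint_json_text)
    return [p for p in collect_key_paths(data) if p.split(".")[-1] == field_name]


if __name__ == "__main__":
    # Hypothetical, heavily shortened stand-in for a file such as 0.ckpt.json.
    sample = '{"model": {"network": {"backend": {"shuffleBufferSize": 1000}}}}'
    print(find_field(sample, "shuffleBufferSize"))
```

Finding the field confirms where the mismatch sits, but it does not make the model loadable. Whether the field can be dropped safely, or whether the model has to be retrained with a matching Calamari version, is a question for the maintainers.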

    opened by alexander-winkler 6
  • Error after training run

    The following error message resulted from an attempted training run (based on fraktur historical + GT):

    training0_fehlermeldung_2020-12-01.txt

    Our IT department is wondering: what is OCR4all doing in /tmp/ inside Docker?
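
The log in this report ends with errno 28 ('No space left on device') for checkpoint files written under /tmp/. A minimal sketch for checking how much space is left on the file system that holds the temporary directory follows. It uses only the standard library, the function name `free_space_report` is hypothetical, and it would have to be run inside the container (for instance through `docker exec`) to reflect what the training process sees.

```python
import shutil
import tempfile


def free_space_report(path):
    """Return total, used and free space (in GiB) for the file system holding path."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "path": path,
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
    }


if __name__ == "__main__":
    # tempfile.gettempdir() is usually /tmp on Linux, which is where the
    # log shows the checkpoint files being written.
    report = free_space_report(tempfile.gettempdir())
    print(
        f"{report['path']}: {report['free_gib']:.2f} GiB free "
        f"of {report['total_gib']:.2f} GiB"
    )
```

If the free space is close to zero, the failed checkpoint writes are explained by the disk filling up rather than by a problem in the training data.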

    Error message:

    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.757118). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.680600). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (2.302509). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (1.631931). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (2.153755). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.732034). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.661815). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.718801). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.743117). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.712587). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.963019). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.834908). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.744983). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.694347). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.802638). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.771759). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.675146). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (1.636742). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (1.748071). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (1.423025). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.787484). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.808512). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (3.172735). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (3.214929). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (3.159862). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.875830). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.835742). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.827941). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.780199). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.717965). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (1.834584). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (2.468633). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (2.007760). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.707687). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.761901). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (19.359528). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (24.881613). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (7.594938). Check your callbacks.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 109, in save_model_to_hdf5
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 109, in save_model_to_hdf5
        save_weights_to_hdf5_group(model_weights_group, model_layers)
        save_weights_to_hdf5_group(model_weights_group, model_layers)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 636, in save_weights_to_hdf5_group
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 636, in save_weights_to_hdf5_group
        param_dset[:] = val
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
        param_dset[:] = val
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/dataset.py", line 708, in __setitem__
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/dataset.py", line 708, in __setitem__
        self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
        self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5d.pyx", line 222, in h5py.h5d.DatasetID.write
      File "h5py/h5d.pyx", line 222, in h5py.h5d.DatasetID.write
      File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw
      File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw
      File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite
      File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite
    OSError: Can't write data (file write failed: time = Tue Dec 1 08:28:02 2020 , filename = '/tmp/calamari3umodg4c/fold_2/model_00000453.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xbb35e60, total write size = 179216, bytes this sub-write = 179216, bytes actually written = 18446744073709551615, offset = 5828608)

    During handling of the above exception, another exception occurred:

    OSError: Can't write data (file write failed: time = Tue Dec 1 08:28:02 2020 , filename = '/tmp/calamari3umodg4c/fold_4/model_00000438.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xaba6980, total write size = 265232, bytes this sub-write = 265232, bytes actually written = 18446744073709551615, offset = 5742592)
    Traceback (most recent call last):

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 753, in on_start
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 753, in on_start
        yield
        yield
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
        total_epochs=epochs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 181, in run_one_epoch
        total_epochs=epochs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 181, in run_one_epoch
        step += 1
      File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
        step += 1
      File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
        next(self.gen)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 788, in on_batch
        next(self.gen)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 788, in on_batch
        mode, 'end', step, batch_logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 239, in _call_batch_hook
        mode, 'end', step, batch_logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 239, in _call_batch_hook
        batch_hook(batch, logs)
        batch_hook(batch, logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 528, in on_train_batch_end
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 528, in on_train_batch_end
        self.on_batch_end(batch, logs=logs)
        self.on_batch_end(batch, logs=logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 108, in on_batch_end
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 108, in on_batch_end
        self.last_checkpoint = self.make_checkpoint(self.checkpoint_params.output_dir, self.checkpoint_params.output_model_prefix)
        self.last_checkpoint = self.make_checkpoint(self.checkpoint_params.output_dir, self.checkpoint_params.output_model_prefix)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        self.model.save(checkpoint_path + '.h5', overwrite=True)
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 120, in save_model_to_hdf5
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 120, in save_model_to_hdf5
        f.close()
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 443, in close
        f.close()
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 443, in close
        h5i.dec_ref(id)
        h5i.dec_ref(id)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref
      File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref
    RuntimeError: Problems closing file (file write failed: time = Tue Dec 1 08:28:02 2020 , filename = '/tmp/calamari3umodg4c/fold_4/model_00000438.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0x9ca6b70, total write size = 6144, bytes this sub-write = 6144, bytes actually written = 18446744073709551615, offset = 4096)
    RuntimeError: Problems closing file (file write failed: time = Tue Dec 1 08:28:02 2020 , filename = '/tmp/calamari3umodg4c/fold_2/model_00000453.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xb64dbe0, total write size = 6144, bytes this sub-write = 6144, bytes actually written = 18446744073709551615, offset = 4096)

    During handling of the above exception, another exception occurred:

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 371, in <module>
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 371, in <module>
        main()
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 367, in main
        main()
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 367, in main
        run(args)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 359, in run
        run(args)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 359, in run
        progress_bar=not args.no_progress_bars
        progress_bar=not args.no_progress_bars
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 197, in train
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 197, in train
        self._run_train(train_net, train_start_time, progress_bar, self.dataset, self.validation_dataset, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 213, in _run_train
        self._run_train(train_net, train_start_time, progress_bar, self.dataset, self.validation_dataset, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 213, in _run_train
        train_net.train(train_dataset, val_dataset, checkpoint_params, self.txt_postproc, progress_bar, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 332, in train
        train_net.train(train_dataset, val_dataset, checkpoint_params, self.txt_postproc, progress_bar, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 332, in train
        v_cb, es_cb
        v_cb, es_cb
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
        use_multiprocessing=use_multiprocessing)
        use_multiprocessing=use_multiprocessing)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
        prefix='val')
      File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
        prefix='val')
      File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 757, in on_start
        self.gen.throw(type, value, traceback)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 757, in on_start
        self.callbacks._call_end_hook(mode)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 262, in _call_end_hook
        self.callbacks._call_end_hook(mode)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 262, in _call_end_hook
        self.on_train_end()
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 379, in on_train_end
        self.on_train_end()
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 379, in on_train_end
        callback.on_train_end(logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 74, in on_train_end
        callback.on_train_end(logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 74, in on_train_end
        version='last')
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        version='last')
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 92, in save_model_to_hdf5
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 92, in save_model_to_hdf5
        f = h5py.File(filepath, mode='w')
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 408, in __init__
        f = h5py.File(filepath, mode='w')
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 408, in __init__
        swmr=swmr)
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 179, in make_fid
        swmr=swmr)
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 179, in make_fid
        fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
        fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5f.pyx", line 108, in h5py.h5f.create
      File "h5py/h5f.pyx", line 108, in h5py.h5f.create
    OSError: Unable to create file (file write failed: time = Tue Dec 1 08:28:02 2020 , filename = '/tmp/calamari3umodg4c/fold_2/model_last.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0x7182fa8, total write size = 96, bytes this sub-write = 96, bytes actually written = 18446744073709551615, offset = 0)
    OSError: Unable to create file (file write failed: time = Tue Dec 1 08:28:02 2020 , filename = '/tmp/calamari3umodg4c/fold_4/model_last.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0x67ecef8, total write size = 96, bytes this sub-write = 96, bytes actually written = 18446744073709551615, offset = 0)

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 109, in save_model_to_hdf5
        save_weights_to_hdf5_group(model_weights_group, model_layers)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 636, in save_weights_to_hdf5_group
        param_dset[:] = val
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/dataset.py", line 708, in __setitem__
        self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5d.pyx", line 222, in h5py.h5d.DatasetID.write
      File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw
      File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite
    OSError: Can't write data (file write failed: time = Tue Dec 1 08:28:09 2020 , filename = '/tmp/calamari3umodg4c/fold_3/model_00000481.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xb465720, total write size = 49040, bytes this sub-write = 49040, bytes actually written = 18446744073709551615, offset = 2367488)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 753, in on_start
        yield
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
        total_epochs=epochs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 181, in run_one_epoch
        step += 1
      File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
        next(self.gen)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 788, in on_batch
        mode, 'end', step, batch_logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 239, in _call_batch_hook
        batch_hook(batch, logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 528, in on_train_batch_end
        self.on_batch_end(batch, logs=logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 108, in on_batch_end
        self.last_checkpoint = self.make_checkpoint(self.checkpoint_params.output_dir, self.checkpoint_params.output_model_prefix)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 120, in save_model_to_hdf5
        f.close()
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 443, in close
        h5i.dec_ref(id)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref
    RuntimeError: Problems closing file (file write failed: time = Tue Dec 1 08:28:09 2020 , filename = '/tmp/calamari3umodg4c/fold_3/model_00000481.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xacbac40, total write size = 6144, bytes this sub-write = 6144, bytes actually written = 18446744073709551615, offset = 4096)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 371, in <module>
        main()
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 367, in main
        run(args)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 359, in run
        progress_bar=not args.no_progress_bars
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 197, in train
        self._run_train(train_net, train_start_time, progress_bar, self.dataset, self.validation_dataset, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 213, in _run_train
        train_net.train(train_dataset, val_dataset, checkpoint_params, self.txt_postproc, progress_bar, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 332, in train
        v_cb, es_cb
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
        use_multiprocessing=use_multiprocessing)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
        prefix='val')
      File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 757, in on_start
        self.callbacks._call_end_hook(mode)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 262, in _call_end_hook
        self.on_train_end()
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 379, in on_train_end
        callback.on_train_end(logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 74, in on_train_end
        version='last')
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 92, in save_model_to_hdf5
        f = h5py.File(filepath, mode='w')
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 408, in __init__
        swmr=swmr)
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 179, in make_fid
        fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5f.pyx", line 108, in h5py.h5f.create
    OSError: Unable to create file (file write failed: time = Tue Dec 1 08:28:09 2020 , filename = '/tmp/calamari3umodg4c/fold_3/model_last.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0x5e02c28, total write size = 96, bytes this sub-write = 96, bytes actually written = 18446744073709551615, offset = 0)
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/cross_fold_trainer.py", line 27, in train_individual_model
        ], args.get("run", None), {"threads": args.get('num_threads', -1)}), verbose=args.get("verbose", False)):
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/utils/multiprocessing.py", line 87, in run
        raise Exception("Error: Process finished with code {}".format(process.returncode))
    Exception: Error: Process finished with code -11
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/local/bin/calamari-cross-fold-train", line 33, in <module>
        sys.exit(load_entry_point('calamari-ocr==1.0.5', 'console_scripts', 'calamari-cross-fold-train')())
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/cross_fold_train.py", line 80, in main
        temporary_dir=args.temporary_dir, keep_temporary_files=args.keep_temporary_files,
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/cross_fold_trainer.py", line 151, in run
        pool.map_async(train_individual_model, run_args).get()
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
    Exception: Error: Process finished with code -11
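
Every exception in these traces reports errno 28 ('No space left on device') for checkpoint files under /tmp/calamari3umodg4c, so the temporary directory appears to have filled up during cross-fold training. A minimal sketch of a pre-flight check that could be run before starting a training follows. The helper name `has_enough_space` and the 2 GiB threshold are assumptions made for illustration; neither is part of Calamari or OCR4all.

```python
import shutil
import tempfile


def has_enough_space(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least `min_free_bytes` free."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # The failing checkpoints were written below the system temp directory.
    tmp_dir = tempfile.gettempdir()
    required = 2 * 1024 ** 3  # hypothetical threshold: 2 GiB
    if has_enough_space(tmp_dir, required):
        print(f"{tmp_dir}: enough free space for training checkpoints")
    else:
        print(f"{tmp_dir}: low on space, free some or use another temporary directory")
```

The last traceback shows a `temporary_dir` argument being passed through `cross_fold_train.py`, which suggests the temporary location is configurable; the exact command-line flag is not shown in this report.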

    As a result, no recognition process runs anymore, neither for the trained project nor for the others loaded in OCR4all, and the following error is shown:

    ihxx_recognition_eigenesModell

    opened by lsubhd 5
  • Error in Line Segmentation and GTProduction

    Hi, I am using OCR4all version 0.4.0 with the matching version of LAREX, deployed on a server. I have run into a problem, which I describe step by step below:

    • I ran the complete Process Flow on all pages.
    • I did the GT Production for all pages through LAREX.
    • In Project Overview, the Line Segmentation and GT columns of some pages are marked with a cross.
    • I ran the Process Flow again for only those pages.
    • They are then marked with a check mark up to the Line Segmentation column.
    • I did the GT for those pages.
    • When I go back to Project Overview, the Line Segmentation and GT columns of those pages are marked with a cross again.

    I also ran the Process Flow on all pages twice, and it is always the same specific pages that are affected. I noticed another problem in the generated XML, because I get this error, for example, when I want to run a training:

      File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
      File "/var/ocr4all/data/Rodrigo/processing/0006.xml", line 1
    lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 55
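
The message "Start tag expected, '<' not found" means the parser did not find well-formed XML in 0006.xml. A minimal sketch for finding every file in a processing directory that fails to parse follows. It uses Python's standard `xml.etree.ElementTree` rather than lxml, and the helper name `find_malformed_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed_xml(directory):
    """Return (file name, error message) pairs for *.xml files that fail to parse."""
    problems = []
    for xml_file in sorted(Path(directory).glob("*.xml")):
        try:
            ET.parse(str(xml_file))
        except ET.ParseError as err:
            # Empty files and files with non-XML content both end up here.
            problems.append((xml_file.name, str(err)))
    return problems
```

Calling `find_malformed_xml("/var/ocr4all/data/Rodrigo/processing")` would list the affected pages, which can then be reprocessed individually.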

    I hope someone had a similar problem.

    Greetings from Argentina!

    opened by emanuel-22 5
  • ValueError: zero-size array to reduction operation maximum which has no identity

    Hello! During the recognition process the following error is thrown and the recognition effectively stops proceeding, while the Status still reads "Status: ERROR: The process is still running" (OCR4all ver 0.3.0, LAREX ver 0.3.1):

    Process ForkProcess-3:
    Traceback (most recent call last):
      File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/datasets/input_dataset.py", line 99, in run
        out = self.apply_single(*data)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/datasets/input_dataset.py", line 119, in apply_single
        line, params = self.params.data_processor.apply([line], 1, False)[0]
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/data_processing/data_preprocessor.py", line 19, in apply
        processes=processes, progress_bar=progress_bar, max_tasks_per_child=max_tasks_per_child)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/utils/multiprocessing.py", line 32, in parallel_map
        out = list(map(f, d))
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/data_processing/data_preprocessor.py", line 50, in _apply_single
        data, params = proc._apply_single(data)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/data_processing/data_preprocessor.py", line 50, in _apply_single
        data, params = proc._apply_single(data)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/data_processing/center_normalizer.py", line 15, in _apply_single
        out, params = self.normalize(data, cval=np.amax(data))
      File "<__array_function__ internals>", line 6, in amax
      File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 2668, in amax
        keepdims=keepdims, initial=initial, where=where)
      File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
    ValueError: zero-size array to reduction operation maximum which has no identity
    

    I have attached the PAGEXML and image files where the error occurs. I couldn't find anything suspicious here.
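
The traceback ends in `np.amax(data)` on an empty array, so at least one line image passed to the preprocessor had zero size. One possible cause, which is an assumption and not confirmed in this report, is a TextLine whose polygon has zero width or height. A minimal sketch that flags such lines in a PAGE XML document, using only the standard library (the helper names are hypothetical):

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix so the check works for any PAGE schema version."""
    return tag.rsplit("}", 1)[-1]


def degenerate_lines(page_xml_text):
    """Return ids of TextLine elements whose Coords polygon has no width or no height."""
    root = ET.fromstring(page_xml_text)
    flagged = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        coords = next((c for c in elem if local_name(c.tag) == "Coords"), None)
        points = coords.get("points", "") if coords is not None else ""
        xs, ys = [], []
        for pair in points.split():
            x, y = pair.split(",")
            xs.append(float(x))
            ys.append(float(y))
        # No points at all, or all points on one vertical or horizontal line.
        if not xs or max(xs) == min(xs) or max(ys) == min(ys):
            flagged.append(elem.get("id"))
    return flagged
```

Running this on the text of the attached PAGE XML would show whether any line has a collapsed polygon; an empty result means the cause lies elsewhere.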

    0125 nrm 0125 bin 0125.txt

    opened by alexander-winkler 5
  • Word-based PageXML Output / Wortweise PageXML Ebene in der Ausgabe

    Dear colleagues,

    it would help us a great deal if the PageXML output included a word level in the future.

    Thank you in advance for your efforts!

    Best regards from Munich, Florian Landes

    opened by FLE92 5
  • Bump spring-webmvc from 4.3.18.RELEASE to 5.2.20.RELEASE

    Bumps spring-webmvc from 4.3.18.RELEASE to 5.2.20.RELEASE.

    Release notes

    Sourced from spring-webmvc's releases.

    v5.2.20.RELEASE

    :star: New Features

    • Restrict access to property paths on Class references #28262
    • Improve diagnostics in SpEL for large array creation #28257

    v5.2.19.RELEASE

    :star: New Features

    • Declare serialVersionUID on DefaultAopProxyFactory #27785
    • Use ByteArrayDecoder in DefaultClientResponse::createException #27667

    :lady_beetle: Bug Fixes

    • ProxyFactoryBean getObject called before setInterceptorNames, silently creating an invalid proxy [SPR-7582] #27817
    • Possible NPE in Spring MVC LogFormatUtils #27783
    • UndertowHeadersAdapter's remove() method violates Map contract #27593
    • Fix assertion failure messages in DefaultDataBuffer.checkIndex() #27577

    :notebook_with_decorative_cover: Documentation

    • Lazy annotation throws exception if non-required bean does not exist #27660
    • Incorrect Javadoc in [NamedParameter]JdbcOperations.queryForObject methods regarding exceptions #27581
    • DefaultResponseErrorHandler update javadoc comment #27571

    :hammer: Dependency Upgrades

    • Upgrade to Reactor Dysprosium-SR25 #27635
    • Upgrade to Log4j2 2.16.0 #27825

    v5.2.18.RELEASE

    :star: New Features

    • Enhance DefaultResponseErrorHandler to allow logging complete error response body #27558
    • DefaultMessageListenerContainer does not log an error/warning when consumer tasks have been rejected #27457

    :lady_beetle: Bug Fixes

    • Performance impact of con.getContentLengthLong() in AbstractFileResolvingResource.isReadable() downloading huge jars to check component length #27549
    • Performance impact of ResourceUrlEncodingFilter on HttpServletResponse#encodeURL #27548
    • Avoid duplicate JCacheOperationSource bean registration in #27547
    • Non-escaped closing curly brace in RegEx results in initialization error on Android #27502
    • Proxy generation with Java 17 fails with "Cannot invoke "Object.getClass()" because "cause" is null" #27498
    • ConcurrentReferenceHashMap's entrySet violates the Map contract #27455

    :hammer: Dependency Upgrades

    • Upgrade to Reactor Dysprosium-SR24 #27526

    v5.2.17.RELEASE

    ... (truncated)

    Commits
    • cfa701b Release v5.2.20.RELEASE
    • 996f701 Refine PropertyDescriptor filtering
    • 90cfde9 Improve diagnostics in SpEL for large array creation
    • 94f52bc Upgrade to Artifactory Resource 0.0.17
    • d4478ba Upgrade Java versions in CI image
    • 136e6db Upgrade Ubuntu version in CI images
    • 8f1f683 Upgrade Java versions in CI image
    • ce2367a Upgrade to Log4j2 2.17.1
    • acf7823 Next development version (v5.2.20.BUILD-SNAPSHOT)
    • 1a03ffe Upgrade to Log4j2 2.16.0
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump spring-web from 4.3.18.RELEASE to 6.0.0

    Bumps spring-web from 4.3.18.RELEASE to 6.0.0.

    Release notes

    Sourced from spring-web's releases.

    v6.0.0

    See What's New in Spring Framework 6.x and Upgrading to Spring Framework 6.x for upgrade instructions and details of new features.

    :star: New Features

    • Avoid direct URL construction and URL equality checks #29486
    • Simplify creating RFC 7807 responses from functional endpoints #29462
    • Allow test classes to provide runtime hints via declarative mechanisms #29455

    :notebook_with_decorative_cover: Documentation

    • Align javadoc of DefaultParameterNameDiscoverer with its behavior #29494
    • Document AOT support in the TestContext framework #29482
    • Document Ahead of Time processing in the reference guide #29350

    :hammer: Dependency Upgrades

    • Upgrade to Reactor 2022.0.0 #29465

    :heart: Contributors

    Thank you to all the contributors who worked on this release:

    @​ophiuhus and @​wilkinsona

    v6.0.0-RC4

    :star: New Features

    • Introduce DataFieldMaxValueIncrementer for SQL Server sequences #29447
    • Introduce findAllAnnotationsOnBean variant on ListableBeanFactory #29446
    • Introduce support for Jakarta WebSocket 2.1 #29436
    • Allow @ControllerAdvice in WebFlux to handle exceptions before a handler is selected #22991

    :lady_beetle: Bug Fixes

    • Bean with unresolved generics do not use fallback algorithms with AOT #29454
    • TomcatRequestUpgradeStrategy is not compatible with Tomcat 10.1 #29434
    • Autowiring of a generic type produced by a factory bean fails after AOT processing #29385

    :notebook_with_decorative_cover: Documentation

    • Reference PDF containing full docs not available #28451

    :hammer: Dependency Upgrades

    • Revisit Servlet API baseline: Servlet 6.0 in the build, Servlet 5.0 compatibility at runtime #29435
    • Upgrade to Context Propagation 1.0.0 #29442
    • Upgrade to Jackson 2.14.0 #29351
    • Upgrade to Micrometer 1.10.0 #29441

    ... (truncated)

    Commits
    • 5a30a43 Release v6.0.0
    • 42856ba Add milestone repo for optional Netty 5 support
    • 9be6cea Polishing deprecated methods
    • 37b4391 Align javadoc of DefaultParameterNameDiscoverer with its behavior
    • 09a58a5 Polish
    • 10f4ad1 Assert fixed in DefaultErrorResponseBuilder
    • 9457ed3 Document AOT support in the TestContext framework
    • 074ec97 Fix section formatting in the testing chapter
    • 9ede4af Revert "Ignore HttpComponents Javadoc"
    • bfc1251 Merge branch '5.3.x'
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump jackson-databind from 2.10.0 to 2.12.7.1

    Bumps jackson-databind from 2.10.0 to 2.12.7.1.

    Commits

    dependencies 
    opened by dependabot[bot] 0
  • Preprocessing appears to run but produces no output

    Dear OCR4all team, I have loaded 5 image files and started preprocessing. OCR4all seems to process the first page but not the other pages. In the next step, even the first page is missing. Does it actually perform the preprocessing? Thanks for any help!

    Edit: The console error tab does not output any error message.

    opened by ESLincke 1
  • Kraken for OCR purpose

    Hi! I would like to use Kraken instead of Calamari for the OCR part, not only for segmentation. How can I do that? Is it enough to pass the directory with the models when running the Docker image? If I want to use a custom model, developed and trained in Python, what does it need to return? Would I use it in the same way as Kraken?

    Type: Question 
    opened by aliceinland 1
  • Bump spring-core from 4.3.18.RELEASE to 5.2.22.RELEASE

    Bumps spring-core from 4.3.18.RELEASE to 5.2.22.RELEASE.

    Release notes

    Sourced from spring-core's releases.

    v5.2.22.RELEASE

    :star: New Features

    • Refine CachedIntrospectionResults property introspection #28446

    :lady_beetle: Bug Fixes

    • Ignore invalid STOMP frame #28444

    v5.2.21.RELEASE

    :star: New Features

    • Remove DNS lookups during websocket connection initiation #28281

    :lady_beetle: Bug Fixes

    • Improve documentation and matching algorithm in data binders #28334
    • CodeGenerationException thrown when using AnnotationMBeanExporter on JDK 17 #28279
    • ResponseEntity objects are accumulated in ConcurrentReferenceHashMap #28273
    • NotWritablePropertyException when attempting to declaratively configure ClassLoader properties #28272

    v5.2.20.RELEASE

    :star: New Features

    • Restrict access to property paths on Class references #28262
    • Improve diagnostics in SpEL for large array creation #28257

    v5.2.19.RELEASE

    :star: New Features

    • Declare serialVersionUID on DefaultAopProxyFactory #27785
    • Use ByteArrayDecoder in DefaultClientResponse::createException #27667

    :lady_beetle: Bug Fixes

    • ProxyFactoryBean getObject called before setInterceptorNames, silently creating an invalid proxy [SPR-7582] #27817
    • Possible NPE in Spring MVC LogFormatUtils #27783
    • UndertowHeadersAdapter's remove() method violates Map contract #27593
    • Fix assertion failure messages in DefaultDataBuffer.checkIndex() #27577

    :notebook_with_decorative_cover: Documentation

    • Lazy annotation throws exception if non-required bean does not exist #27660
    • Incorrect Javadoc in [NamedParameter]JdbcOperations.queryForObject methods regarding exceptions #27581
    • DefaultResponseErrorHandler update javadoc comment #27571

    :hammer: Dependency Upgrades

    • Upgrade to Reactor Dysprosium-SR25 #27635
    • Upgrade to Log4j2 2.16.0 #27825

    ... (truncated)

    Commits
    • 8f4c172 Release v5.2.22.RELEASE
    • 9f238c9 Polishing
    • 50177b1 Refine CachedIntrospectionResults property introspection
    • 159a99b Ignore invalid STOMP frame
    • 41e158c Next development version (v5.2.22.BUILD-SNAPSHOT)
    • 833e750 Improve documentation and matching algorithm in data binders
    • d70054d Upgrade to Log4j2 2.17.2
    • 36e4951 Polishing
    • 87b5080 Consistent use of getLocalAddr() without DNS lookups in request adapters
    • 5cbf85a Avoid return value reference in potentially cached MethodParameter instance
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
Releases(0.6.1)
  • 0.6.1(Jan 28, 2022)

    Features

    • adds additional info messages regarding the newly added deep3 models to the UI
    • removes some obsolete information from the UI

    Bugfixes

    • fixes some minor UI bugs
    Source code(tar.gz)
    Source code(zip)
  • 0.6.0(Jan 26, 2022)

    Features

    • adds Kraken for baseline based layout analysis (regions and lines)
    • adds support for Calamari >v2.x
    • includes new and improved models for Calamari
    • upgrades LAREX interface to support latest LAREX releases with many new features
    • adds compatibility with latest ocr4all-helper-scripts version with e.g. improved line segmentation
    • adds DOCX result generation
    • adds support for custom page delimiters for result generation

    Bugfixes

    • removes outdated / duplicated parameters for some processing steps

    Other

    • removes legacy code used for handling GTC_Web
    • includes update to the latest version of prima-core-libs
    Source code(tar.gz)
    Source code(zip)
  • 0.6-RC3(Jan 24, 2022)

    Features

    • Change settings activated by default for recognition

    Bugfixes

    • Hide unused dummy / kraken segmentation settings
    • Sort imagePathList to ensure same image variant order in UI
    Source code(tar.gz)
    Source code(zip)
  • 0.6-RC2(Dec 23, 2021)

    Features

    • Adds selecting between Kraken and Dummy segmentation in Process Flow

    Bugfixes

    • fixes available calamari-predict parameters
    • updates backend pinging to avoid session timeouts
    • removes unused files and debug code
    • updates datatables
    • fixes several typos and updates description texts
    • hides unused settings in Process Flow
    • fixes console output to front end during training
  • 0.6-RC1(Nov 23, 2021)

  • 0.5.0(Nov 7, 2020)

    Features

    • Added extensive REST API to control OCR4all workflow without the GUI
    • Added the ability to choose certain Text Result Generation strategies (GT only, Recognition only, Combined)
    • Added the ability to keep or remove empty text lines in the Text Result Generation
    • PAGE XML schema version 2019-07-15 added as default schema version

    Bugfixes

    • Fixed a bug which caused certain TIFF images to get converted incorrectly
    • Projects can now be loaded even when no processing directory exists yet
    • Result Generation now only exports the selected PAGE XML files
    • Fixed a bug which caused Line Segmentation to crash when the underlying PAGE XML contained TextRegion elements without a specific subtype
    • Fixed the version number of prima-core-libs
  • 0.5-RC3(Nov 6, 2020)

    Bugfixes

    • temporarily makes the process state collector less lenient again, as the implemented regex patterns caused timeouts on very large projects
  • 0.5-RC2(Oct 21, 2020)

    Bugfixes

    • Fixes a bug which caused Line Segmentation to crash when the underlying PAGE XML contained TextRegion elements without a specific subtype
    • Fixes version number of prima-core-libs
  • 0.5-RC1(Oct 16, 2020)

    Features

    • Added extensive REST API to control OCR4all workflow without the GUI
    • Added the ability to choose certain Text Result Generation strategies (GT only, Recognition only, Combined)
    • Added the ability to keep or remove empty text lines in the Text Result Generation
    • Process state collector is now more lenient and works better with externally created PAGE XML files
    • PAGE XML schema version 2019-07-15 added as default schema version

    Bugfixes

    • Fixed a bug which caused certain TIFF images to get converted incorrectly
    • Projects can now be loaded even when no processing directory exists yet
    • Result Generation now only exports the selected PAGE XML files
  • 0.4.0(Jul 29, 2020)

    Features

    • Added filter to image list in sidebar to either (de)select all pages or only even/odd pages to ease working with e.g. bilingual editions

    Bugfixes

    • Result generation for text files no longer crashes on lines which contain neither recognized text nor ground truth text
    • Result generation for text files now respects the reading order of PAGE XML files (if a reading order exists)

    Other

    • prima-core-libs added to ease working with PAGE XML files
    • removed some obsolete files
    • renamed Artifact from OCR4all_Web to ocr4all
  • 0.4-RC1(Jul 8, 2020)

    • Adds filter to image list in sidebar to either (de)select all pages or only even/odd pages to ease working with e.g. bilingual editions
    • Result generation for text files no longer crashes on lines which contain neither recognized text nor ground truth text
    • Result generation for text files now respects the reading order of PAGE XML files (if a reading order exists)
    • prima-core-libs added to ease working with PAGE XML files
    • removes some obsolete files
    • renames Artifact from OCR4all_Web to ocr4all
  • 0.3.0(May 27, 2020)

    Features

    • Upgrade to Calamari version 1.0.5 with TensorFlow 2 backend
    • Adds automatic project conversion from legacy to latest (entirely PAGE XML based and considerably less memory intensive)
    • Adds Ground Truth export
    • Adds checkbox for word level PAGE XML generation during Recognition

    Bugfixes

    • Greyscale and despeckled images are now correctly sent to LAREX
    • Removed obsolete image type selection for all workflow steps related to Larex/GTP
    • Reduced unnecessarily verbose TF2 logging during Training and Recognition
    • Fixed the Despeckling workflow step so that despeckled images will get saved again
    • Various changes to the UI on the textual level (typos and improvements)
    • Large PDF files should now be streamed page by page to avoid memory issues
  • 0.3-RC3(May 27, 2020)

  • 0.3-RC2(May 19, 2020)

    Second release candidate for OCR4all v0.3.

    • Adds possibility to convert legacy to latest projects in the UI
    • Adds checkbox for word level PAGE XML generation during Recognition
    • Removes obsolete image type selection for all workflow steps related to Larex/GTP
    • Reduces unnecessarily verbose TF2 logging during Training and Recognition
    • Fixes the Despeckling workflow step so that despeckled images will get saved again
    • Various changes to the UI on the textual level (typos and improvements)
Owner
An Open Source Tool Providing a Comprehensive But Easy to Use (Semi-)Automatic OCR Workflow for Historical Printings
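Ddddocr - general-purpose CAPTCHA recognition OCR, PyPI version

DdddOcr (带带弟弟OCR), a general-purpose CAPTCHA recognition SDK, free and open-source edition. ddddocr has been updated again today! The current version is 1.3.1. Many newcomers to CAPTCHA work surely get a headache when they run into click-to-select images; building samples is time-consuming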

Sml2h3 4.4k Dec 31, 2022
Image augmentation for machine learning experiments.

imgaug This python library helps you with augmenting images for your machine learning projects. It converts a set of input images into a new, much lar

Alexander Jung 13.2k Jan 02, 2023
Image augmentation library in Python for machine learning.

Augmentor is an image augmentation library in Python for machine learning. It aims to be a standalone library that is platform and framework independe

Marcus D. Bloice 4.8k Jan 04, 2023
Morphological edge detection or object boundary detection using erosion and dilation in OpenCV Python

Morphologycal-edge-detection-using-erosion-and-dialation The task is to detect object boundaries using erosion or dilation. Here, use the kernel or st

Tamzid hasan 3 Nov 25, 2022
A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

DcoumentScanner A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV. Directly install the .exe file to inst

Harsh Vardhan Singh 1 Oct 29, 2021
Text-to-Image generation

Generate vivid Images for Any (Chinese) text CogView is a pretrained (4B-param) transformer for text-to-image generation in general domain. Read our p

THUDM 1.3k Jan 05, 2023
Source code of RRPN ---- Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Paper source Arbitrary-Oriented Scene Text Detection via Rotation Proposals https://arxiv.org/abs/1703.01086 News We update RRPN in pytorch 1.0! View

428 Nov 22, 2022
An expandable and scalable OCR pipeline

Overview Nidaba is the central controller for the entire OGL OCR pipeline. It oversees and automates the process of converting raw images into citable

81 Jan 04, 2023
Code for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model"

PPE ✨ Repository for our CVPR'2022 paper: Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-

Zipeng Xu 34 Nov 28, 2022
Toolbox for OCR post-correction

Ochre Ochre is a toolbox for OCR post-correction. Please note that this software is experimental and very much a work in progress! Overview of OCR pos

National Library of the Netherlands / Research 117 Nov 10, 2022
Driver Drowsiness Detection with OpenCV & Dlib

In this project, we have built a driver drowsiness detection system that detects whether the driver's eyes are closed for too long and infers whether the driver is sleepy or inactive.

Mansi Mishra 4 Oct 26, 2022
Scene text detection and recognition based on Extremal Region(ER)

Scene text recognition A real-time scene text recognition algorithm. Our system is able to recognize text against unconstrained backgrounds. This algorithm is

HSIEH, YI CHIA 155 Dec 06, 2022
Rubik's Cube in pygame with OpenGL

Rubik Rubik's Cube in pygame with OpenGL The script shows a Rubik's Cube built with OpenGL on the screen. I have also implemented all the possible mo

Gabro 2 Apr 15, 2022
The papers published in top-tier AI conferences in recent years.

AI-conference-papers The papers published in top-tier AI conferences in recent years. Paper table AAAI ICLR CVPR ICML ICCV ECCV NIPS 2019 ✔️ ✔️ ✔️ ✔️

Jinbae Park 6 Dec 09, 2022
Detect handwritten words in a text-line (classic image processing method).

Word segmentation Implementation of scale space technique for word segmentation as proposed by R. Manmatha and N. Srimal. Even though the paper is fro

Harald Scheidl 190 Jan 03, 2023
Boosting Co-teaching with Compression Regularization for Label Noise

Nested-Co-teaching PyTorch implementation of paper "Boosting Co-tea

YINGYI CHEN 41 Jan 03, 2023
Perspective recovery of text using transformed ellipses

unproject_text Perspective recovery of text using transformed ellipses. See full writeup at https://mzucker.github.io/2016/10/11/unprojecting-text-wit

Matt Zucker 111 Nov 13, 2022
An unofficial package that helps developers easily implement the ZATCA (Fatoora) QR code, which is required for e-invoicing

ZATCA (Fatoora) QR-Code Implementation An unofficial package that helps developers easily implement the ZATCA (Fatoora) QR code, which is required for e-invoicin

TheAwiteb 28 Nov 03, 2022
kaldi-asr/kaldi is the official location of the Kaldi project.

Kaldi Speech Recognition Toolkit To build the toolkit: see ./INSTALL. These instructions are valid for UNIX systems including various flavors of Linux

Kaldi 12.3k Jan 05, 2023
A list of all open datasets for OCR.

ocr-open-dataset A list of all open datasets for OCR. printed dataset year Born-Digital Images (Web and Email) 2011-2015 COCO-Text 2017 Text Extraction fr

hongbomin 95 Nov 24, 2022