Line based ATR Engine based on OCRopy

Last update: Dec 23, 2022

Overview

An OCR engine based on OCRopy and Kraken, written in Python 3. It is designed to be easy to use from the command line while remaining modular enough to be integrated into and customized from other Python scripts.

Pretrained model repository

Pretrained models are available at https://github.com/Calamari-OCR/calamari_models. The current release (336 MB) can be downloaded from that repository.

Installing

Installation using Pip

The suggested method is to install calamari into a virtual environment using pip:

virtualenv -p python3 PATH_TO_VENV_DIR  # e.g. virtualenv calamari_venv
source PATH_TO_VENV_DIR/bin/activate
pip install calamari_ocr

which will install calamari and all of its dependencies.

To install the package without a virtual environment simply run

pip install calamari_ocr

To install the package from its source, download the source code and run

python setup.py install

Installation using Conda

Run

conda env create -f environment_master_gpu.yml

Alternatively, you can install the CPU version or the current dev version instead of the stable master.

Command line interface (Standard User)

If you simply want to apply existing models to your text lines and optionally train new models, you should probably use the command line interface of calamari, which is very similar to that of OCRopy.

Note that if you installed calamari into a virtual environment, you have to activate it in order to make the command line scripts available.

Prediction of a page

Currently only OCR on lines is supported. To segment pages into lines (and the preceding preprocessing steps) we refer to the solutions provided by OCRopus, Kraken, Tesseract, etc. For users (especially less technical ones) in need of an all-in-one package OCR4all might be worth a look.

Prediction with deep neural networks implemented in TensorFlow is the core feature of calamari. To predict a set of line images, call:

calamari-predict --checkpoint path_to_model.ckpt --files your_images.*.png

Calamari also supports several voting algorithms that combine the predictions of different models to improve the result. To enable voting, simply pass several models to the --checkpoint argument:

calamari-predict --checkpoint path_to_model_1.ckpt path_to_model_2.ckpt ... --files your_images.*.png

The voting algorithm can be changed with the --voter flag. Possible values are: confidence_voter_default_ctc (default), sequence_voter. Note that the confidence voter depends on the loss function used for training a model, while the sequence voter can be used for all models but might yield slightly worse results.

Training of a model

In calamari you can either train a single model on a given data set or train a fold of several (default 5) models to generate different voters for a voted prediction.

Training a single model

A single model can be trained with the calamari-train script. Given a data set with its ground truth, you can train the default model by calling:

calamari-train --files your_images.*.png

Note that calamari expects each image file (.png) to have a corresponding ground truth text file (.gt.txt) at the same location with the same base name.
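Before starting a training run it can help to check this pairing. The following is a minimal sketch, not part of calamari: the function name find_missing_ground_truth is hypothetical, and it assumes that the "base name" is everything before the first dot of the file name (so 0001.bin.png would pair with 0001.gt.txt).

```python
from pathlib import Path


def find_missing_ground_truth(image_dir, image_ext=".png", gt_ext=".gt.txt"):
    """Return the image files in image_dir that have no ground truth file."""
    missing = []
    for image in sorted(Path(image_dir).glob(f"*{image_ext}")):
        # Assumption: the base name is the part before the first dot.
        base = image.name.split(".", 1)[0]
        if not (image.parent / f"{base}{gt_ext}").exists():
            missing.append(image)
    return missing
```

Calling find_missing_ground_truth("path/to/lines") returns an empty list when every image has its ground truth file.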

There are several important parameters to adjust the training. For a full list type calamari-train --help.

--network=cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5: Specify the network structure in a simple language. The default network consists of two pairs of CNN and pooling layers followed by an LSTM layer. It is trained with the default CTC loss implemented in TensorFlow and uses a dropout rate of 0.5. The corresponding creation string is cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5. To add or remove layers, add or remove the respective entries in the comma-separated list. Note that the order is important!
--line_height=48: The height of each rescaled input file passed to the network.
--num_threads=1: The number of threads used during training and line preprocessing.
--batch_size=1: The number of lines processed in parallel.
--display=1: (epochs) How often an informative string about the current training process is printed in the shell
--output_dir: A path where to store checkpoints
--checkpoint_frequency: (epochs) How often a model shall be written as checkpoint to the drive
--epochs: The maximum number of training iterations (batches) for training. Note: this is the upper boundary if you use early stopping.
--samples_per_epoch: The number of samples to process per epoch (by default the size of the dataset)
--validation=None: Provide a second data set (images with corresponding .gt.txt) to enable early stopping.
--early_stopping_frequency=checkpoint_frequency: How often to check for early stopping on the validation dataset.
--early_stopping_nbest=10: How many successive models must be worse than the current best model to break the training loop
--early_stopping_best_model_output_dir=output_dir: Output dir for the current best model
--early_stopping_best_model_prefix=best: Prefix for the best model (output name will be {prefix}.ckpt)
--n_augmentations=0: Data augmentation on the training set.
--weights: Load network weights from a given pretrained model. Note that the codec will probably change its size to match the codec of the provided ground truth files. To ensure that certain characters are not deleted, use a --whitelist.
--whitelist=[] --whitelist_files=[]: Specify either individual characters or a text file listing all white list characters stored as string.
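To make the structure of the --network string more concrete, here is an illustrative parser. It is a sketch and not calamari's own parser: the function name parse_network_string and the dictionary keys are hypothetical, and it assumes that cnn=40:3x3 means 40 filters with a 3x3 kernel and that pool=2x2 is the pooling window.

```python
def parse_network_string(spec):
    """Split a layer string into an ordered list of (kind, params) pairs."""
    layers = []
    for item in spec.split(","):
        kind, _, value = item.strip().partition("=")
        if kind == "cnn":
            # Assumed reading: "<filters>:<kernel height>x<kernel width>"
            filters, _, kernel = value.partition(":")
            params = {
                "filters": int(filters),
                "kernel": tuple(int(v) for v in kernel.split("x")),
            }
        elif kind == "pool":
            params = {"size": tuple(int(v) for v in value.split("x"))}
        elif kind == "lstm":
            params = {"units": int(value)}
        elif kind == "dropout":
            params = {"rate": float(value)}
        else:
            raise ValueError(f"unknown layer type: {kind!r}")
        layers.append((kind, params))
    return layers


default = "cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5"
for kind, params in parse_network_string(default):
    print(kind, params)
```

The list preserves the order of the entries, which mirrors the remark that the order of layers in the string matters.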

Hint: If you want to use early stopping but don't have a separated validation set you can train a single fold of the calamari-cross-fold-train-script (see next section).
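The idea behind the --early_stopping_nbest option can be sketched as follows. This is a simplified illustration under the assumption that one validation error is measured per check and that lower is better; the function name early_stopping_point is hypothetical and this is not calamari's training loop.

```python
def early_stopping_point(validation_errors, nbest=10):
    """Return (best_index, best_error, stopped_at) for a run of checks.

    Training stops once nbest successive checks were no better than
    the best model seen so far.
    """
    best_error = float("inf")
    best_index = None
    worse_in_a_row = 0
    for index, error in enumerate(validation_errors):
        if error < best_error:
            # New best model: remember it and reset the counter.
            best_error, best_index = error, index
            worse_in_a_row = 0
        else:
            worse_in_a_row += 1
            if worse_in_a_row >= nbest:
                return best_index, best_error, index
    # The sequence ended before the stopping criterion was met.
    return best_index, best_error, len(validation_errors) - 1
```

With nbest=3, a run whose validation error stops improving after the second check ends three checks later and keeps the second model as the best one.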

Training an n-fold of models

To train n more-or-less individual models given a training set you can use the calamari-cross-fold-train-script. The default call is

calamari-cross-fold-train --files your_images*.*.png --best_models_dir some_dir

By default this will train 5 default models using 80%=(n-1)/n of the provided data for training and 20%=1/n for validation. These independent models can then be used to predict lines using a voting mechanism. There are several important parameters to adjust the training. For a full list type calamari-cross-fold-train --help.

Almost all parameters of calamari-train can also be used here to affect the training.
--n_folds=5: The number of folds
--weights=None: Specify one or n_folds models to use for pretraining.
--best_models_dir=REQUIRED: Directory where to store the best model determined on the validation data set
--best_model_label={id}: The prefix for the best model of each fold. A string that will be formatted. {id} will be replaced by the number of the fold, i. e. 0, ..., n-1.
--temporary_dir=None: A directory where to store temporary files, e. g. checkpoints of the scripts to train an individual model. By default a temporary directory created with Python's tempfile module is used.
--max_parallel_models=n_folds: The number of models that shall be run in parallel. By default all models are trained in parallel.
--single_fold=[]: Use this parameter to train only a subset, e. g. a single fold out of all n_folds.
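The split into (n-1)/n training data and 1/n validation data per fold can be illustrated with a short sketch. The round-robin assignment of files to folds is an assumption made for this example, and cross_fold_splits is a hypothetical helper; the way calamari actually distributes the files may differ.

```python
def cross_fold_splits(files, n_folds=5):
    """Yield one (train, validation) pair of file lists per fold."""
    # Assumption: files are assigned to folds in round-robin order.
    folds = [files[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        validation = folds[held_out]
        train = [
            name
            for index, fold in enumerate(folds)
            if index != held_out
            for name in fold
        ]
        yield train, validation


files = [f"line_{i:02d}.png" for i in range(10)]
for fold_id, (train, validation) in enumerate(cross_fold_splits(files)):
    print(fold_id, len(train), len(validation))
```

Every file is used for validation in exactly one fold and for training in the remaining n-1 folds, which is why the resulting models differ enough to act as separate voters.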

To use all models to predict and then vote for a set of lines you can use the calamari-predict script and provide all models as checkpoint:

calamari-predict --checkpoint best_models_dir/*.ckpt.json --files your_images.*.png
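If the call is issued from a Python script rather than a shell, the glob patterns have to be expanded explicitly. The helper below is a sketch with a hypothetical name (build_predict_command); it only assembles the argument list and uses no flags other than the --checkpoint and --files options shown above.

```python
import glob


def build_predict_command(model_pattern, image_pattern):
    """Expand both glob patterns and return the argument list."""
    checkpoints = sorted(glob.glob(model_pattern))
    images = sorted(glob.glob(image_pattern))
    if not checkpoints:
        raise FileNotFoundError(f"no checkpoints match {model_pattern!r}")
    if not images:
        raise FileNotFoundError(f"no images match {image_pattern!r}")
    return ["calamari-predict", "--checkpoint", *checkpoints, "--files", *images]
```

The returned list can be passed to subprocess.run(..., check=True) from within the activated virtual environment.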

Evaluating a model

To compute the performance of a model, you first need to predict your evaluation data set (see calamari-predict). Afterwards run

calamari-eval --gt *.gt.txt

on the ground truth files to compute an evaluation measure including the full confusion matrix. By default the predicted sentences as produced by the calamari-predict script end in .pred.txt. You can change the default behavior of the evaluation script with the following parameters:

--gt=REQUIRED: The ground truth txt files.
--pred=None: The prediction files. If None it is expected that the prediction files have the same base name as the ground truth files but with --pred_ext as suffix.
--pred_ext=.pred.txt: The suffix of the prediction files if --pred is not specified
--n_confusions=-1: Print only the top n_confusions most common errors.
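To illustrate how ground truth and prediction files are matched by suffix and how a simple error measure can be derived, here is a sketch that computes a character error rate from the edit distance. It is not the implementation used by calamari-eval, and the helper names prediction_path, edit_distance and character_error_rate are hypothetical.

```python
def prediction_path(gt_path, gt_ext=".gt.txt", pred_ext=".pred.txt"):
    """Map a ground truth file name to the expected prediction file name."""
    if not gt_path.endswith(gt_ext):
        raise ValueError(f"{gt_path!r} does not end in {gt_ext!r}")
    return gt_path[: -len(gt_ext)] + pred_ext


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        cur = [i]
        for j, char_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                       # deletion
                cur[j - 1] + 1,                    # insertion
                prev[j - 1] + (char_a != char_b),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def character_error_rate(pairs):
    """Total edit distance divided by the total ground truth length."""
    errors = sum(edit_distance(gt, pred) for gt, pred in pairs)
    chars = sum(len(gt) for gt, _ in pairs)
    return errors / chars if chars else 0.0
```

Here pairs is a list of (ground truth, prediction) strings, for example read from each .gt.txt file and the file returned by prediction_path for it.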

Experimenting with different network hyperparameters (experimental)

To find a good set of hyperparameters (e. g. network structure, learning rate, batch size, ...) you can use the experiment.py script, which both trains models using the cross-fold algorithm and evaluates them on a given evaluation data set. The script directly outputs the performance of each individual fold, the average and its standard deviation, plus the results using the different voting algorithms. If you want to use this experimental script, have a look at its parameters (experiment.py --help).

Comments

After training, how to create models?

I ran the train command and it saved checkpoints in the output directory. Now, how can I use these as a model? How do I make models? Which command? Is there any more detailed documentation for this?

opened by UlasSAYGINIM 27

Allow namespace prefixes other than 'None' in PageXML
Eynollah, e.g., produces PageXML files that use an explicit prefix (xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15").

calamari_ocr/ocr/dataset/datareader/pagexml/reader.py, however, expects the prefix to be 'None' and throws an error when processing an eynollah pagexml.

When I change line 120 of reader.py from

ns = {"ns": root.nsmap[None]}

to

ns = {'ns' : 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15'}

it works. I'm not sure if you can generalize the namespace dictionary to cover both output styles. Maybe xpath's local-name function (instead of lxml find or findall) is an alternative.
opened by alexander-winkler 17
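One possible way to cover both output styles is to read the namespace URI from the root element's tag instead of looking up the None prefix. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml, and page_namespace is a hypothetical helper; both libraries store qualified tags as {uri}local, but whether this fits the rest of reader.py is an assumption.

```python
import xml.etree.ElementTree as ET


def page_namespace(root):
    """Return the namespace URI of the root element, or '' if it has none."""
    # Qualified tags look like '{uri}local', regardless of the prefix used.
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


URI = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
default_style = f'<PcGts xmlns="{URI}"><Page><TextRegion/></Page></PcGts>'
prefixed_style = (
    f'<pc:PcGts xmlns:pc="{URI}"><pc:Page><pc:TextRegion/></pc:Page></pc:PcGts>'
)

for xml_text in (default_style, prefixed_style):
    root = ET.fromstring(xml_text)
    ns = {"ns": page_namespace(root)}
    # The same query works for the default and the prefixed style.
    print(len(root.findall(".//ns:TextRegion", ns)))
```

Because the URI is taken from the document itself, the namespace dictionary does not need to hard-code a particular PAGE schema version.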

performance degradation for versions > 0.2.5

It seems that a performance issue was introduced between the 0.2.5 and 0.3.0 releases. I tested separately on environments with TensorFlow CPU and GPU. TensorFlow version: 1.13.1

Hardware: GPU: NVIDIA Tesla M60; CPU: Intel i7-4710HQ (8 threads)

I've got images already in memory, so I use RawDataSet. Then I wrap it with InputDataset. And finally I use Predictor directly in code. The code is here: https://gist.github.com/wosiu/9fa50de9e47615b5fa08b23637e1f947

| version | GPU time | CPU time |
| --- | --- | --- |
| 0.2.5 | 1440 ms | 2100 ms |
| 0.3.0 | not tested | 5700 ms |
| 0.3.1 | 5859 ms | 6000 ms |

And some logs I get, not sure if related:

tensorflow-gpu, calamari 0.3.1:

2019-05-16 15:41:16,329 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 10 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
2019-05-16 15:41:18,694 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 13 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
2019-05-16 15:41:20,461 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 2 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
2019-05-16 15:41:22,187 INFO 21006 140466380875520 metrics2.py:126 ocr_ms took 5859 ms

tensorflow cpu, calamari 0.3.1:

2019-05-16 15:50:48,571 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 10 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 24599 tid 24599 thread 0 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24582 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24583 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24605 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24608 thread 2 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24585 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24610 thread 3 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24611 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24609 thread 3 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24606 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24613 thread 3 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24607 thread 2 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24612 thread 2 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24584 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24615 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24621 thread 3 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24620 thread 2 bound to OS proc set 0-7
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 24614 tid 24614 thread 0 bound to OS proc set 0
2019-05-16 15:50:50.630679: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.630721: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.677730: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.677766: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.717138: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.717175: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.753602: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.753637: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.775119: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.775151: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.793670: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.793716: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.827669: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.827717: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.865883: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.865927: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.912646: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.912676: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.955744: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50.955777: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:50,960 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 13 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 24643 tid 24643 thread 0 bound to OS proc set 0
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 24655 tid 24655 thread 0 bound to OS proc set 0
2019-05-16 15:50:52.560072: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.560110: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.599294: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.599324: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.621620: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.621653: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.641955: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.641989: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.663661: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.663695: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.684411: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.684477: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.704166: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.704195: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.720328: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.720356: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.735363: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.735405: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.754581: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.754612: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.764184: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.764352: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.780335: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52.780366: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:52,783 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 2 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 24679 tid 24679 thread 0 bound to OS proc set 0
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 24684 tid 24684 thread 0 bound to OS proc set 0
2019-05-16 15:50:54.533429: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:54.533478: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:54.565486: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:54.565553: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:50:54,569 INFO 23732 140378726676224 metrics2.py:126 ocr_ms took 5999 ms

tensorflow cpu, calamari 0.3.0:

2019-05-16 15:57:36,407 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 10 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 25755 tid 25755 thread 0 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25738 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25740 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25760 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25741 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25761 thread 2 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25762 thread 3 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25766 thread 3 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25765 thread 2 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25767 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25768 thread 2 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25770 thread 3 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25763 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25739 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25772 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25778 thread 3 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25777 thread 2 bound to OS proc set 0-7
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 25771 tid 25771 thread 0 bound to OS proc set 0
2019-05-16 15:57:38.043087: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.043131: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.105437: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.105465: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.158275: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.158300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.213394: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.213423: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.234949: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.235122: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.263308: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.263334: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.310647: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.310687: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.387153: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.387188: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.428633: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.428661: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.477382: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38.477543: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:38,481 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 13 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 25808 tid 25808 thread 0 bound to OS proc set 0
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 25814 tid 25814 thread 0 bound to OS proc set 0
2019-05-16 15:57:40.048510: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.048545: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.091460: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.091488: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.121078: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.121109: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.151877: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.152048: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.173696: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.173745: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.197971: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.198044: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.219307: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.219341: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.238584: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.238613: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.263360: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.263388: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.288734: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.288763: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.308051: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.308091: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.329646: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.329827: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.342140: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40.342170: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:40,343 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 2 files in the dataset
WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 25839 tid 25839 thread 0 bound to OS proc set 0
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 25847 tid 25847 thread 0 bound to OS proc set 0
2019-05-16 15:57:42.054612: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:42.054845: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:42.100764: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:42.100795: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 15:57:42,104 INFO 24999 140185064949504 metrics2.py:126 ocr_ms took 5698 ms

tensorflow-gpu, calamari 0.2.5: no warnings or errors

2019-05-16 16:05:10,802 INFO 30495 139657482069760 metrics2.py:126 ocr_ms took 1440 ms

tensorflow cpu, calamari 0.2.5:

2019-05-16 16:00:12,076 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 10 files in the dataset
OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26569 thread 0 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26590 thread 1 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26591 thread 2 bound to OS proc set 0-7
OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26592 thread 3 bound to OS proc set 0-7
2019-05-16 16:00:12.699564: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.699610: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.739422: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.739451: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.774491: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.774525: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.808151: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.808183: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.830500: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.830530: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.863712: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.863750: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.939775: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.939804: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.980605: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:12.980636: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.015485: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.015515: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.059779: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.059824: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13,064 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 13 files in the dataset
2019-05-16 16:00:13.477228: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.477268: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.506554: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.506587: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.521381: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.521414: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.534570: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.534615: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.548266: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.548300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.572625: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.572665: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.595008: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.595052: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.616656: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.616823: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.647358: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.647398: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.667300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.667423: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.680838: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.680964: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.696618: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.696664: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.716621: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13.716766: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:13,718 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 2 files in the dataset
2019-05-16 16:00:14.027123: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:14.027158: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:14.096210: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:14.096237: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:14,099 INFO 26062 140471660128000 metrics2.py:126 ocr_ms took 2023 ms
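
The three runs above differ mainly in how many allocator errors are logged and in the final `ocr_ms` timing. A minimal sketch of summarizing such a log with the standard library follows; the function name `summarize` and the embedded sample are illustrative assumptions, not part of Calamari or TensorFlow.

```python
import re
from collections import Counter

# Patterns for the kinds of lines that matter in the logs above.
TIMING = re.compile(r"ocr_ms took (\d+) ms")
NULLPTR = "tried to deallocate nullptr"


def summarize(log_text):
    """Return (nullptr_error_count, omp_info_count, list_of_ocr_ms_values)."""
    counts = Counter()
    timings = []
    for line in log_text.splitlines():
        if NULLPTR in line:
            counts["nullptr"] += 1
        elif line.startswith("OMP: Info"):
            counts["omp"] += 1
        match = TIMING.search(line)
        if match:
            timings.append(int(match.group(1)))
    return counts["nullptr"], counts["omp"], timings


# A few lines copied from the CPU run above, used as sample input.
sample = """\
2019-05-16 16:00:14.096210: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
2019-05-16 16:00:14.096237: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
OMP: Info #157: KMP_AFFINITY: Uniform topology
2019-05-16 16:00:14,099 INFO 26062 140471660128000 metrics2.py:126 ocr_ms took 2023 ms
"""
print(summarize(sample))
```

Running the same function over each full log makes the per-configuration comparison explicit without reading every line.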

opened by wosiu 14

Issue on CTC loss when training on new data

Hi,

When training Calamari on my dataset, I got this error: tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.

Can you help me? Thank you
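
A common reason for CTC reporting that no valid path exists is a ground-truth line that needs more time steps than the network output provides: every character needs one step, and every pair of identical neighbouring characters needs an extra blank step between them. The sketch below checks a dataset for such lines. The helper names and the `downsampling` factor are assumptions for illustration; the real factor depends on the network architecture in use.

```python
def min_ctc_steps(text):
    """Minimum number of time steps CTC needs to emit `text`."""
    # One step per character, plus one blank between identical neighbours.
    repeats = sum(1 for a, b in zip(text, text[1:]) if a == b)
    return len(text) + repeats


def find_infeasible(samples, downsampling=4):
    """Return (name, needed, available) for lines that cannot be aligned.

    `samples` is an iterable of (name, image_width_px, text).
    `downsampling` is an ASSUMED horizontal reduction factor of the network.
    """
    problems = []
    for name, width, text in samples:
        available = width // downsampling
        needed = min_ctc_steps(text)
        if needed > available:
            problems.append((name, needed, available))
    return problems


# Hypothetical dataset entries: (file name, line width in pixels, transcription).
data = [("a.png", 400, "hello"), ("b.png", 20, "hello")]
print(find_infeasible(data))
```

Lines reported by such a check are candidates for removal or for a closer look at the ground truth (for example, a transcription paired with the wrong image).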

opened by realjoenguyen 14

Prediction API Error

I used the CLI to train on the SROIE2019 dataset (the original images were preprocessed into line images) with:

calamari-train \
--device.gpus 0 \
--trainer.gen SplitTrain \
--trainer.gen.validation_split_ratio=0.2  \
--trainer.output_dir /data/model_output \
--trainer.epochs 25 \
--early_stopping.frequency=1 \
--early_stopping.n_to_go=3 \
--train.images /data/*.jpg

Training went smoothly; the logs are in train.log.

After training, I tried to load the model as mentioned here, but I got the following error:

>>> predictor = Predictor.from_checkpoint(params=PredictorParams(), checkpoint='/data/model_output/best.ckpt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/predict/predictor.py", line 31, in from_checkpoint
    keras.models.load_model(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/save.py", line 206, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

I also tried loading the pretrained antiqua_historical model and got the same error:

>>> predictor = Predictor.from_checkpoint(params=PredictorParams(), checkpoint='/data/model_output/antiqua_historical/0.ckpt')
/usr/local/lib/python3.8/dist-packages/paiargparse/dataclass_json_overrides.py:78: RuntimeWarning: `NoneType` object value of non-optional type tfaip_commit_hash detected when decoding CalamariScenarioParams.
  warnings.warn(f"`NoneType` object {warning}.", RuntimeWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/predict/predictor.py", line 26, in from_checkpoint
    ckpt = SavedCalamariModel(checkpoint, auto_update=auto_update_checkpoints)
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 31, in __init__
    self.update_checkpoint()
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 56, in update_checkpoint
    self._single_upgrade()
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 88, in _single_upgrade
    update_model(self.dict, self.ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/migrations/version3_4to5.py", line 22, in update_model
    pred_model.load_weights(path + ".h5")
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 2234, in load_weights
    hdf5_format.load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 662, in load_weights_from_hdf5_group
    original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'
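
Both tracebacks end with `.decode(...)` being called on a value that is already a `str`. This pattern is commonly reported when the installed h5py (3.x) returns HDF5 string attributes as `str` while the Keras loader in the installed TensorFlow still expects `bytes`; the workaround usually reported for it is installing an h5py release below 3.0. Whether that is the cause here is an assumption. The sketch below only illustrates the mismatch with a hypothetical helper `as_text`; it is not Calamari or TensorFlow code.

```python
def as_text(value, encoding="utf-8"):
    """Return `value` as str, whether it arrives as bytes or as str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


old_style = b"2.4.0"  # attribute delivered as bytes
new_style = "2.4.0"   # attribute delivered as str

print(as_text(old_style), as_text(new_style))

try:
    new_style.decode("utf8")  # what the failing loader effectively does
    message = None
except AttributeError as err:
    message = str(err)
print(message)
```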

opened by Mageswaran1989 12

Prediction step using very deep neural networks feature of calamari

Hi, I installed calamari-0.2.4 and tried to test it on this simple example, https://user-images.githubusercontent.com/33478216/46499779-a909b480-c829-11e8-87f2-d4a34d84ab69.png, with: calamari-predict --checkpoint calamari_models/default/ModernEnglish.ckpt --files data.png

It returns this error:

Found 1 files in the dataset
Traceback (most recent call last):
  File "/home/pc/my_calamari_env/bin/calamari-predict", line 11, in <module>
    load_entry_point('calamari-ocr==0.2.4', 'console_scripts', 'calamari-predict')()
  File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/scripts/predict.py", line 151, in main
    run(args)
  File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/scripts/predict.py", line 61, in run
    predictor = MultiPredictor(checkpoints=args.checkpoint, batch_size=args.batch_size, processes=args.processes)
  File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 202, in __init__
    self.predictors = [Predictor(cp, batch_size=batch_size, processes=processes) for cp in checkpoints]
  File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 202, in <listcomp>
    self.predictors = [Predictor(cp, batch_size=batch_size, processes=processes) for cp in checkpoints]
  File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 100, in __init__
    ckpt = Checkpoint(checkpoint, auto_update=self.auto_update_checkpoints)
  File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/checkpoint.py", line 20, in __init__
    self.json = json.load(f)
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)

Thanks for your help :)
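
The error position (line 7, column 1, char 6) means the JSON parser skipped six whitespace characters and then hit something that is not JSON. One situation that produces exactly this position is a file that starts with six blank lines followed by non-JSON content, such as an HTML page saved in place of the raw model file; that this happened here is an assumption. The sketch below, with the hypothetical helper `check_checkpoint_json`, shows how to inspect what the file actually contains.

```python
import json
import os
import tempfile


def check_checkpoint_json(path, preview_chars=80):
    """Return (ok, detail); on failure, detail shows how the file starts."""
    with open(path, encoding="utf-8", errors="replace") as f:
        content = f.read()
    try:
        json.loads(content)
    except json.JSONDecodeError as err:
        preview = content.strip()[:preview_chars]
        detail = "{} at line {} column {}; file starts with: {!r}".format(
            err.msg, err.lineno, err.colno, preview)
        return False, detail
    return True, "valid JSON"


# Demo with a hypothetical file: six blank lines followed by HTML.
with tempfile.TemporaryDirectory() as tmp:
    bad = os.path.join(tmp, "model.ckpt.json")
    with open(bad, "w", encoding="utf-8") as f:
        f.write("\n" * 6 + "<!DOCTYPE html>\n<html></html>\n")
    result = check_checkpoint_json(bad)
print(result)
```

If the preview shows HTML or a short pointer text instead of a JSON object, downloading the model files again as raw files is the likely remedy.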

opened by Tailor2019 12

Applying data processor Text Normalizer

@ChWick I have updated calamari-ocr to version 2.0.0, and now training takes ages to start. Previously, calamari computed the codec and then started. Now calamari takes 2+ days to apply text normalization. I can't afford to wait 3 days for training to start. Can someone help?

opened by abhikatoldtrafford 11

Error: Process finished with code 1 in cross-fold

It worked with the new code, but after fold 0 finished and found no better model than the one with 99.056858% accuracy, I got an error again:

FOLD 0 | Storing checkpoint to 'I:\BIQE\CALAMARI\projects\voetius\TRAINING\crosstrainen\fold_0\model_00019470.ckpt'
FOLD 0 | Checking early stopping model
Prediction: 100%|████████████████████| 1948/1948 [04:06<00:00, 6.38it/s]
FOLD 0 | No better model found. Currently accuracy of 99.056858% at iter 11682 (remaining nbest = 0)
FOLD 0 | Early stopping now.
FOLD 0 | Total time 11274.687343358994s for 19469 iterations.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\ocr\cross_fold_trainer.py", line 27, in train_individual_model
    ], args.get("run", None), {"threads": args.get('num_threads', -1)}), verbose=args.get("verbose", False)):
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\utils\multiprocessing.py", line 87, in run
    raise Exception("Error: Process finished with code {}".format(process.returncode))
Exception: Error: Process finished with code 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\drsjh\Anaconda3\envs\calamaridev\Scripts\calamari-cross-fold-train.exe\__main__.py", line 9, in <module>
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\scripts\cross_fold_train.py", line 80, in main
    temporary_dir=args.temporary_dir, keep_temporary_files=args.keep_temporary_files,
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\ocr\cross_fold_trainer.py", line 151, in run
    pool.map_async(train_individual_model, run_args).get()
  File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
Exception: Error: Process finished with code 1
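
The exception raised in the parent only reports that a child process exited with code 1; the actual failure is whatever the child wrote before exiting. A minimal sketch, using only the standard library and a stand-in child command (not Calamari's own `multiprocessing` helper), of keeping the child's stderr next to its return code:

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd` and return (returncode, stderr_text) so the real error stays visible."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.returncode, proc.stderr


# Stand-in child process that fails with exit code 1, like the crashed fold above.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('boom\\n'); sys.exit(1)"]
code, err = run_and_report(child)
print(code, err.strip())
```

When debugging a failing fold, running the child's training command directly in a terminal serves the same purpose: it exposes the original traceback instead of the wrapper exception.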

opened by cornerman57 11

Use pre-trained Calamari models

Thanks for the great work!

I installed Calamari and calamari-models on a new AWS P2 instance and tried to test them on a simple example with

calamari-predict --checkpoint calamari_models/default/ModernEnglish.ckpt --files data.png

The detected text is way off. I guess it is related to how the model is loaded.

I got these warnings:

Found 1 files in the dataset
2018-08-05 17:12:16.976735: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Attempting a workaround: New graph and load weights
Using CUDNN compatible LSTM backend on CPU
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/calamari/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:417: calling reverse_sequence (from tensorflow.python.ops.array_ops) with seq_dim is deprecated and will be removed in a future version.
Instructions for updating:
seq_dim is deprecated, use seq_axis instead
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/calamari/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:432: calling reverse_sequence (from tensorflow.python.ops.array_ops) with batch_dim is deprecated and will be removed in a future version.
Instructions for updating:
batch_dim is deprecated, use batch_axis instead
2018-08-05 17:12:20.637472: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key Minimum/ExponentialMovingAverage not found in checkpoint
Attempting workaround: only loading trainable variables
Loading Dataset: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 109.32it/s]
Data Preprocessing: 100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 104.47it/s]
Prediction: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.74it/s]
Prediction of 1 models took 0.14934062957763672s

Is it due to the tensorflow version that the ExponentialMovingAverage variables are not loaded? Currently, installing calamari installs tensorflow 1.9. Which TF version do you use in development?

Thanks!

opened by zhangxiangnick 10

TypeError: metaclass conflict (problem with tfaip?)

Hello!

I'm not sure if I'm missing something, but as there has already been a problem with tfaip (#205), I wanted to point out an issue I'm struggling with when installing the latest version of calamari.

Here is my output. Any hints are welcome; the hack provided in the above-mentioned issue does not work.

~/virtualenvs/calamari_2-1-1/calamari(master)$ calamari-train --version
Traceback (most recent call last):
  File "/home/user/virtualenvs/calamari_2-1-1/bin/calamari-train", line 33, in <module>
    sys.exit(load_entry_point('calamari-ocr==2.1.1', 'console_scripts', 'calamari-train')())
  File "/home/user/virtualenvs/calamari_2-1-1/bin/calamari-train", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/importlib_metadata-4.0.1-py3.6.egg/importlib_metadata/__init__.py", line 166, in load
    module = import_module(match.group('module'))
  File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/calamari_ocr-2.1.1-py3.6.egg/calamari_ocr/scripts/train.py", line 5, in <module>
    from tfaip.util.logging import logger, WriteToLogFile
  File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/tfaip-1.1.1-py3.6.egg/tfaip/__init__.py", line 37, in <module>
    from tfaip.scenario.scenariobaseparams import ScenarioBaseParams
  File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/tfaip-1.1.1-py3.6.egg/tfaip/scenario/scenariobaseparams.py", line 48, in <module>
    class ScenarioBaseParams(Generic[TDataParams, TModelParams], ABC, metaclass=ScenarioBaseParamsMeta):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
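
Python raises this error when a class combines bases and an explicit metaclass whose metaclasses have no single most-derived one. The paths in the traceback show Python 3.6, where `typing.Generic` still has its own metaclass (it was removed in Python 3.7); it is therefore plausible, though an assumption, that the library expects a newer Python version. The self-contained sketch below reproduces the error with hypothetical classes and shows the generic remedy of a combined metaclass; it does not use tfaip code.

```python
class MetaA(type):
    """First, unrelated metaclass."""


class MetaB(type):
    """Second, unrelated metaclass."""


class BaseA(metaclass=MetaA):
    pass


try:
    # BaseA brings MetaA, the class asks for MetaB: neither derives from the other.
    class Broken(BaseA, metaclass=MetaB):
        pass
    conflict = None
except TypeError as err:
    conflict = str(err)
print(conflict)


class MetaAB(MetaA, MetaB):
    """Combined metaclass that is a subclass of both, which resolves the conflict."""


class Works(BaseA, metaclass=MetaAB):
    pass


print(type(Works).__name__)
```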

opened by alexander-winkler 9

training: shuffle data between epochs

First of all - thank you for that fantastic framework! I've been using tesseract for more than a year, but this one is way better for single-line processing :)

Proposal: From the logs during training, it seems that input images are not shuffled at all. It would be nice if they were shuffled at least once at the very beginning, and ideal if the data were also reshuffled after each epoch so that different batches are created.
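
The proposal can be sketched in a few lines of plain Python. The generator name `epoch_batches` and the file names are illustrative assumptions; this is not how Calamari's data pipeline is implemented.

```python
import random


def epoch_batches(samples, batch_size, epochs, seed=0):
    """Yield (epoch, batches), reshuffling the sample order before every epoch."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    order = list(samples)
    for epoch in range(epochs):
        rng.shuffle(order)
        batches = [order[i:i + batch_size]
                   for i in range(0, len(order), batch_size)]
        yield epoch, batches


# Hypothetical line images.
files = ["line_{:02d}.png".format(i) for i in range(10)]
for epoch, batches in epoch_batches(files, batch_size=4, epochs=2):
    print(epoch, batches)
```

Every epoch still visits each sample exactly once; only the grouping into batches changes from epoch to epoch.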

opened by wosiu 9

Add CodeQL workflow for GitHub code scanning

Hi Calamari-OCR/calamari!

This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

Questions? Check out the FAQ below!

FAQ

How often will the code scanning analysis run?

By default, code scanning will trigger a scan with the CodeQL engine on the following events:

On every pull request — to flag up potential security problems for you to investigate before merging a PR.

On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.

Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

What will this cost?

Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

What types of problems does CodeQL find?

The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

How do I upgrade my CodeQL engine?

No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

The analysis doesn’t seem to be working

If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

How do I disable LGTM.com?

If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

Which source code hosting platforms does code scanning support?

GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

How do I know this PR is legitimate?

This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

I have another question / how do I get in touch?

Please join the discussion here to ask further questions and send us suggestions!

opened by lgtm-com[bot] 0

calamari-ocr 2.2.2 on ubuntu 22.04 partial success, difficulty with GPU software

Hi, I installed calamari-ocr 2.2.2 on Ubuntu 22.04 with tensorflow 2.6 and python 3.9 in a venv. I had to remove keras 2.11, which came with tensorflow 2.6, and replace it with keras 2.6.0 to get rid of an error. It works great with the CPU. So far so good.

With tensorflow 2.6, it seems I am forced into a narrow range of cuda-11.2 and nvidia 360 drivers. I have not been able to get either installed successfully. Has anyone had success with an Nvidia GPU on Ubuntu 22.04 and calamari 2.2.2? Thanks!

opened by ocrwork 0

calamari-eval: unknown arguments

I am on Calamari 2.2.2, and when freely combining the arguments I see on --help …

calamari-eval --checkpoint hsbfraktur.cala/best.ckpt.json --gt.preload false --n_worst_lines 10   --gt.texts /dev/shm/hsbfraktur.val/*.gt.txt --evaluator.progress_bar false

…I end up with the following cryptic error message…

             tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
  File "/home/h1/rosa992c/my-kernel/powerai-kernel2/bin/calamari-eval", line 8, in <module>
    sys.exit(run())
  File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/calamari_ocr/scripts/eval.py", line 200, in run
    main(parse_args())
  File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/calamari_ocr/scripts/eval.py", line 206, in parse_args
    return parser.parse_args(args=args).root
  File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/paiargparse/main_parser.py", line 93, in parse_args
    raise UnknownArgumentError(f"Unknown Arguments {' '.join(argv)}. Possible alternatives:{''.join(help_str)}")
paiargparse.dataclass_parser.UnknownArgumentError: Unknown Arguments  . Possible alternatives:

opened by bertsky 6

featreq: when warmstart-training, init weights of new chars from existing ones

I have the following feature request: Often one needs to finetune a model to add diacritics. Luckily, we can finetune with --warmstart ... --codec.keep_loaded False. In such cases the actual witnesses of the diacritics are usually still sparse in the GT. So it would likely be helpful if the weights of the additional characters / codepoints could be initialized from those of characters that are similar looking or similar in function. Perhaps as an option --codec.init_new_from_old '["à": "a", "ś": "s" ...]' ...
enhancement

opened by bertsky 2
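
The initialization asked for in the feature request above could look roughly like the following. This is only a sketch of the idea in plain NumPy, not Calamari's or tfaip's implementation: it assumes the output layer is a dense kernel with one column per codec entry, it ignores the CTC blank label, and the function name `init_new_from_old` and its parameters are hypothetical.

```python
import numpy as np


def init_new_from_old(kernel, bias, old_chars, new_chars, init_map, seed=0):
    """Build output-layer weights for a new codec (hypothetical helper).

    kernel: array of shape (hidden, len(old_chars)), one column per old character
    bias: array of shape (len(old_chars),)
    init_map: maps a new character to the old character it should copy from
    """
    rng = np.random.default_rng(seed)
    old_index = {c: i for i, c in enumerate(old_chars)}
    hidden = kernel.shape[0]

    # Default: small random weights and zero bias for characters without a source.
    new_kernel = rng.normal(0.0, 0.01, size=(hidden, len(new_chars)))
    new_bias = np.zeros(len(new_chars))

    for j, char in enumerate(new_chars):
        # Characters already in the old codec keep their own weights;
        # new ones copy from the mapped look-alike, if one is given.
        source = char if char in old_index else init_map.get(char)
        if source in old_index:
            new_kernel[:, j] = kernel[:, old_index[source]]
            new_bias[j] = bias[old_index[source]]
    return new_kernel, new_bias


# Toy example: two known characters, three added ones, two of them mapped.
old_chars = ["a", "s"]
new_chars = ["a", "s", "à", "ś", "ž"]
rng = np.random.default_rng(1)
kernel = rng.normal(size=(4, 2))
bias = rng.normal(size=2)
new_kernel, new_bias = init_new_from_old(
    kernel, bias, old_chars, new_chars, {"à": "a", "ś": "s"}
)
```

Copied columns start out identical to their source, so the model would initially confuse "à" with "a"; finetuning then only has to learn the difference rather than the whole glyph.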
HDF5 dataset format: how to convert

I presume training on HDF5 will be more efficient than any of the other formats. And at least against the line GT file pairs, filesystem performance might be much better, too.

So my question is: how do I convert existing datasets into HDF5 format?

opened by bertsky 4

Releases(v2.2.2)

v2.2.2(Mar 21, 2022)
Remove alpha channel from LA images

v2.2.1(Mar 21, 2022)
Fix calling calamari-predict --help without --checkpoints

Don't divide by zero if there are no predictions

v2.1.5(Mar 21, 2022)
Added predefined network architectures, with support for selecting them via a parameter

Tensorboard display fixed for retraining on original

Added a no_train flag to cross_fold_train.py to only create folds without training

v2.2.0(Mar 21, 2022)
Upgrade to tfaip 1.2.6

PageXML: Emit Glyphs, Words and confidence

v2.1.4(Oct 2, 2021)
Support for running cross-fold-train on distributed systems (slurm)

Added parameter for maximum line length

PageXML can now emit glyphs, words, and their confidences

Fixed support for list files (files that contain a list of filenames)

v2.1.3(Sep 8, 2021)
Extended and updated docs

Support for rule files for character regularization

Fix of auto rotation of PageXML

Upgrade to tfaip 1.2.5

Changed default parameters: enabling EMA weights of 0.99, weight decay of 1e-5

Support for parallel post-processing (and for disabling it)

v2.1.2(May 30, 2021)
Fixed definition of custom network architectures

Fixed prediction positions

v2.1.1(May 11, 2021)
Fixed command line interface

Fixed migration of old models using BIDI

Fixed PageXML file sorting

v2.1.0(May 8, 2021)
Completely changed the command line interface (see README and docs and tests)

Tests are now runnable as Python unittests, see calamari_ocr/test

Update of tfaip version

Updated documentation/README

v2.0.2(Mar 20, 2021)
Enforcing usage of a validation dataset

Update text regularizer

Fixed random blobs augmentation

v2.0.1(Feb 13, 2021)
Support for custom data generators

Added support for --validation_split_ratio splitting the provided --files into training and validation data

Switched image manipulation operations to OpenCV, which resulted in a massive speedup (thanks to @andbue)

Fixed ScaleToHeightProcessor if used stand-alone

Fixes for hdf5 dataset: closing files properly, shuffling data within a file and file names during training

v2.0.0(Jan 19, 2021)
We proudly announce Calamari 2.0. The code base has changed substantially, which leads to cleaner code and more robust usage. Calamari now relies on the tfaip package, which provides a lot of its functionality. Old models will be converted automatically.

Full rework and cleanup of the code base

Training is now per epoch (as provided by keras) instead of by iteration

Command line interfaces with only minor changes (epochs instead of max_iters)

New checkpoint version 3 (models get automatically converted, old ones will be backed up)

Using Tensorflow 2.3 as default (bugs in model upgrade in 2.4)

v1.0.5(Mar 29, 2020)

v1.0.3(Feb 3, 2020)

v1.0.1(Oct 31, 2019)
Fixed bug with orientation

Data set viewer shows lines without GT (to support showing prediction datasets)

Added a parameter to stop training at a fixed accuracy

v1.0.0(Oct 18, 2019)
Support for Tensorflow 2.0

So far, old models cannot be converted to the new version

v0.3.5(Jul 17, 2019)
Fixes for python 3.7

0.3.4(Jun 26, 2019)
Fixed Cross-Fold-Training on windows

Input data set is now closable (on exit, all threads are terminated)

Various smaller improvements regarding PageXML-Training/Prediction

v0.3.3(May 31, 2019)
Improvements of loading data on the fly during training or preloading all data

Fixed high prediction times: Added predict_raw function to Predictor

v0.3.2(May 21, 2019)

v0.3.1(Apr 12, 2019)

Fixed a critical bug where a keyboard interrupt (Ctrl+C) caused the program to hang.
v0.3.0(Apr 8, 2019)

Completely reworked the data processing queue to run asynchronously in separate threads. Thus, data loading on the fly is usually as fast as in-memory training.
v0.2.5(Mar 29, 2019)
Fixed loading on the fly for FileDataset (default mode)

Allow using a fixed data preprocessor in MultiPredictor

v0.2.4(Feb 11, 2019)

Smaller fixes for data augmentation.
v0.2.3(Jan 9, 2019)

v0.2.1(Oct 22, 2018)

Support for current Tensorflow (1.11). Old models (< 0.2.x) must be upgraded.
v0.1.8(Aug 6, 2018)

Added postprocessing. Fixed many smaller bugs.
v0.1.7(Jun 19, 2018)

v0.1.6(Jun 4, 2018)

v0.1.5(May 22, 2018)

