DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Overview

Project DeepSpeech


DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.

Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.

For the latest release, including pre-trained models and checkpoints, see the latest release on GitHub.

For contribution guidelines, see CONTRIBUTING.rst.

For contact and support information, see SUPPORT.rst.

Issues
  • Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0,  0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 31 , 12, 2048]


    For support and discussions, please use our Discourse forums.

    If you've found a bug, or have a feature request, then please create an issue with the following information:

    • Have I written custom code (as opposed to running examples on an unmodified clone of the repository): no
    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
    • TensorFlow installed from (our builds, or upstream TensorFlow): pip
    • TensorFlow version (use command below): 1.15
    • Python version: 3.5
    • Bazel version (if compiling from source):
    • GCC/Compiler version (if compiling from source):
    • CUDA/cuDNN version: CUDA 10.0 / cuDNN 7
    • GPU model and memory: 4x GeForce GTX 1080 Ti, 11 GB each
    • Exact command to reproduce:
    ~/projects/DeepSpeech$ more .compute_msprompts
    #!/bin/bash
    
    set -xe
    
    #apt-get install -y python3-venv libopus0
    
    #python3 -m venv /tmp/venv
    #source /tmp/venv/bin/activate
    
    #pip install -U setuptools wheel pip
    #pip install .
    #pip uninstall -y tensorflow
    #pip install tensorflow-gpu==1.14
    
    #mkdir -p ../keep/summaries
    
    data="${SHARED_DIR}/data"
    fis="${data}/LDC/fisher"
    swb="${data}/LDC/LDC97S62/swb"
    lbs="${data}/OpenSLR/LibriSpeech/librivox"
    cv="${data}/mozilla/CommonVoice/en_1087h_2019-06-12/clips"
    npr="${data}/NPR/WAMU/sets/v0.3"
    
    python -u DeepSpeech.py \
      --train_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/treino_filtered_alphabet.csv \
      --dev_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/dev_filtered_alphabet.csv \
      --test_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/teste_filtered_alphabet.csv \
      --train_batch_size 12 \
      --dev_batch_size 24 \
      --test_batch_size 24 \
      --scorer ~/projects/corpora/deepspeech-pretrained-ptbr/kenlm.scorer \
      --alphabet_config_path ~/projects/corpora/deepspeech-pretrained-ptbr/alphabet.txt \
      --train_cudnn \
      --n_hidden 2048 \
      --learning_rate 0.0001 \
      --dropout_rate 0.40 \
      --epochs 150 \
      --noearly_stop \
      --audio_sample_rate 8000 \
      --save_checkpoint_dir ~/projects/corpora/deepspeech-fulltrain-ptbr  \
      --use_allow_growth \
      --log_level 0
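The model config printed in the error maps back onto this command: `input_size` and `num_units` are both 2048, matching `--n_hidden 2048`, and `batch_size` is 12, matching `--train_batch_size 12`. The sketch below decodes such a message into named fields. It rests on two assumptions not stated in the issue: that the first three integers follow TensorFlow's stream_executor enums (`dnn::RnnMode`, `dnn::RnnInputMode`, `dnn::RnnDirectionMode`), and that features use DeepSpeech's default 32 ms window and 20 ms step (`--feature_win_len`, `--feature_win_step`). `parse_rnn_config`, `readable`, and `min_clip_seconds` are hypothetical helpers, not DeepSpeech or TensorFlow APIs.

```python
import re

# Matches the two "[field names]: values" groups in TensorFlow's cuDNN RNN error.
# The first group's values are bare ("2, 0, 0 ,"), the second's are bracketed.
CONFIG_RE = re.compile(
    r"\[([a-z_, ]+)\]:\s*([\d ,]+?)\s*,\s*"
    r"\[([a-z_, ]+)\]:\s*\[([\d ,]+)\]"
)

# Assumption: integer values follow TensorFlow's stream_executor enums
# dnn::RnnMode, dnn::RnnInputMode and dnn::RnnDirectionMode.
ENUM_NAMES = {
    "rnn_mode": {0: "relu", 1: "tanh", 2: "lstm", 3: "gru"},
    "rnn_input_mode": {0: "linear", 1: "skip", 2: "default"},
    "rnn_direction_mode": {0: "unidirectional", 1: "bidirectional"},
}


def parse_rnn_config(message):
    """Return {field: int} for the model config in a ThenRnnForward error."""
    match = CONFIG_RE.search(message)
    if match is None:
        raise ValueError("no cuDNN RNN model config found in message")
    names = [n.strip() for n in (match.group(1) + "," + match.group(3)).split(",")]
    values = [int(v) for v in re.findall(r"\d+", match.group(2) + " " + match.group(4))]
    if len(names) != len(values):
        raise ValueError(f"{len(names)} field names but {len(values)} values")
    return dict(zip(names, values))


def readable(config):
    """Replace enum integers with names where the (assumed) mapping is known."""
    return {k: ENUM_NAMES.get(k, {}).get(v, v) for k, v in config.items()}


def min_clip_seconds(frames, win_ms=32, step_ms=20):
    """Shortest audio yielding `frames` windows (assumed 32 ms window, 20 ms step)."""
    return (win_ms + (frames - 1) * step_ms) / 1000


if __name__ == "__main__":
    msg = ("Internal: Failed to call ThenRnnForward with model config: "
           "[rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , "
           "[num_layers, input_size, num_units, dir_count, max_seq_length, "
           "batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]")
    config = parse_rnn_config(msg)
    print(readable(config))
    print(f"longest clip in batch: at least {min_clip_seconds(config['max_seq_length']):.3f} s")
```

Under those assumptions, `max_seq_length` 63 corresponds to roughly 1.27 s of audio (32 ms + 62 × 20 ms), so the longest clip in the failing batch would be well under the 6 s and 15 s clip lengths described below. That hints the crash is not simply caused by long sequences, though it does not identify the actual cause.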
    

    I'm getting the following error when training on my Brazilian Portuguese (pt-BR) 8 kHz dataset. I have tried downgrading and upgrading CUDA, cuDNN, the NVIDIA drivers, and Ubuntu (16 and 18), and the error persists. I have also tried two datasets with different clip lengths, 6 s and 15 s; both contain 8 kHz audio.
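To rule out malformed clips in the 8 kHz datasets, a quick pre-flight check over the training CSVs may help. This is not a confirmed fix for the cuDNN error, only a way to exclude one possible cause. The sketch assumes the usual DeepSpeech CSV layout (`wav_filename`, `wav_filesize`, `transcript` columns), 16-bit mono PCM WAV input, relative paths resolved against the CSV's directory, and DeepSpeech's default 32 ms window and 20 ms step; `feature_frames`, `check_clip`, and `scan_csv` are hypothetical helpers, not part of DeepSpeech.

```python
import csv
import os
import sys
import wave

# Assumed DeepSpeech defaults: --feature_win_len 32 and --feature_win_step 20 (ms).
WIN_MS, STEP_MS = 32, 20


def feature_frames(n_samples, sample_rate, win_ms=WIN_MS, step_ms=STEP_MS):
    """Frames from a sliding window with no padding (assumed framing)."""
    win = max(1, sample_rate * win_ms // 1000)
    step = max(1, sample_rate * step_ms // 1000)
    return 0 if n_samples < win else 1 + (n_samples - win) // step


def check_clip(sample_rate, channels, sample_width, n_samples, transcript,
               expected_rate=8000):
    """Return a list of human-readable problems for one clip (empty if none)."""
    problems = []
    if sample_rate != expected_rate:
        problems.append(f"sample rate {sample_rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if sample_width != 2:
        problems.append(f"{8 * sample_width}-bit samples, expected 16-bit")
    frames = feature_frames(n_samples, sample_rate)
    if frames == 0:
        problems.append("shorter than one feature window")
    elif frames < len(transcript):
        # CTC needs at least one frame per label (more when characters repeat).
        problems.append(f"{frames} frames for {len(transcript)} transcript characters")
    return problems


def scan_csv(csv_path, expected_rate=8000):
    """Yield (wav_path, problems) for every CSV row that has at least one problem."""
    base = os.path.dirname(os.path.abspath(csv_path))
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Assumption: relative paths are resolved against the CSV's directory.
            path = os.path.join(base, row["wav_filename"])
            try:
                with wave.open(path, "rb") as w:
                    problems = check_clip(w.getframerate(), w.getnchannels(),
                                          w.getsampwidth(), w.getnframes(),
                                          row["transcript"], expected_rate)
            except (OSError, EOFError, wave.Error) as exc:
                problems = [f"unreadable: {exc}"]
            if problems:
                yield path, problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for csv_file in sys.argv[1:]:
        for path, problems in scan_csv(csv_file):
            print(f"{path}: {'; '.join(problems)}")
```

Saved as, for example, `check_csv.py`, it could be run as `python check_csv.py treino_filtered_alphabet.csv dev_filtered_alphabet.csv teste_filtered_alphabet.csv`; rows it prints are candidates to drop before retrying. Under the assumed framing, a 6 s clip at 8 kHz gives 299 frames and a 15 s clip gives 749.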

    ~/projects/DeepSpeech$ bash .compute_msprompts
    + data=/data
    + fis=/data/LDC/fisher
    + swb=/data/LDC/LDC97S62/swb
    + lbs=/data/OpenSLR/LibriSpeech/librivox
    + cv=/data/mozilla/CommonVoice/en_1087h_2019-06-12/clips
    + npr=/data/NPR/WAMU/sets/v0.3
    + python -u DeepSpeech.py --train_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/treino_filtered_alphabet.csv --dev_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/dev_filtered_alphabet.csv --test_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/teste_filtered_alphabet.csv --train_batch_size 12 --dev_batch_size 24 --test_batch_size 24 --scorer /home/andre/projects/corpora/deepspeech-pretrained-ptbr/kenlm.scorer --alphabet_config_path /home/andre/projects/corpora/deepspeech-pretrained-ptbr/alphabet.txt --train_cudnn --n_hidden 2048 --learning_rate 0.0001 --dropout_rate 0.40 --epochs 150 --noearly_stop --audio_sample_rate 8000 --save_checkpoint_dir /home/andre/projects/corpora/deepspeech-fulltrain-ptbr --use_allow_growth --log_level 0
    2020-06-18 12:30:07.508455: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2020-06-18 12:30:07.531012: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3597670000 Hz
    2020-06-18 12:30:07.531588: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5178d70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-06-18 12:30:07.531608: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2020-06-18 12:30:07.533960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
    2020-06-18 12:30:09.563468: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5416390 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
    2020-06-18 12:30:09.563492: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563497: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563501: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563505: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.570577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:09.571728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:09.572862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:09.573993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:09.574226: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:09.575280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:09.576167: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:09.576401: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:09.577541: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:09.578426: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:09.581112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:09.589736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    2020-06-18 12:30:09.589770: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:09.594742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-06-18 12:30:09.594757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3
    2020-06-18 12:30:09.594763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y
    2020-06-18 12:30:09.594767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y
    2020-06-18 12:30:09.594770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y
    2020-06-18 12:30:09.594774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N
    2020-06-18 12:30:09.600428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 10478 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.602038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:1 with 10481 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.603572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:2 with 10481 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.605112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:3 with 10481 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
    swig/python detected a memory leak of type 'Alphabet *', no destructor found.
    W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
    2020-06-18 12:30:10.102127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:10.103272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:10.104379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:10.105484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:10.105521: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:10.105533: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:10.105562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:10.105574: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:10.105586: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:10.105597: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:10.105610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:10.114060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_types(iterator)`.
    W0618 12:30:10.218584 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_types(iterator)`.
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_shapes(iterator)`.
    W0618 12:30:10.218781 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_shapes(iterator)`.
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_classes(iterator)`.
    W0618 12:30:10.218892 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_classes(iterator)`.
    WARNING:tensorflow:
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    W0618 12:30:10.324707 139639980619584 lazy_loader.py:50]
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0618 12:30:10.326326 139639980619584 deprecation.py:506] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0618 12:30:10.326584 139639980619584 deprecation.py:506] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py:246: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    W0618 12:30:10.401312 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py:246: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
    W0618 12:30:11.297271 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
    2020-06-18 12:30:11.458650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:11.459790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:11.460897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:11.462003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:11.462041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:11.462071: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:11.462085: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:11.462097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:11.462109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:11.462121: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:11.462133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:11.470539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    2020-06-18 12:30:11.470679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-06-18 12:30:11.470694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3
    2020-06-18 12:30:11.470699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y
    2020-06-18 12:30:11.470703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y
    2020-06-18 12:30:11.470707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y
    2020-06-18 12:30:11.470710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N
    2020-06-18 12:30:11.476196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10478 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.477355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10481 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.478490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10481 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.479608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10481 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
    D Session opened.
    I Could not find best validating checkpoint.
    I Could not find most recent checkpoint.
    I Initializing all variables.
    2020-06-18 12:30:12.233482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    I STARTING Optimization
    Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
    2020-06-18 12:30:14.672316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    Epoch 0 |   Training | Elapsed Time: 0:00:16 | Steps: 33 | Loss: 18.239303
    2020-06-18 12:30:30.589204: E tensorflow/stream_executor/dnn.cc:588] CUDNN_STATUS_EXECUTION_FAILED
    in tensorflow/stream_executor/cuda/cuda_dnn.cc(1778): 'cudnnRNNForwardTrainingEx( cudnn.handle(), rnn_desc.handle(), input_desc.data_handle(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.data_handle(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
    2020-06-18 12:30:30.589243: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at cudnn_rnn_ops.cc:1517 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
    Traceback (most recent call last):
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
        return fn(*args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
        target_list, run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
      (0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[{{node tower_0/cudnn_lstm/CudnnRNNV3_1}}]]
      (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[{{node tower_0/cudnn_lstm/CudnnRNNV3_1}}]]
             [[tower_2/CTCLoss/_147]]
    1 successful operations.
    2 derived errors ignored.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 968, in run_script
        absl.app.run(main)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 940, in main
        train()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 608, in train
        train_loss, _ = run_set('train', epoch, train_init_op)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 568, in run_set
        feed_dict=feed_dict)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
        run_metadata_ptr)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
        run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
      (0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[node tower_0/cudnn_lstm/CudnnRNNV3_1 (defined at /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
      (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[node tower_0/cudnn_lstm/CudnnRNNV3_1 (defined at /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
             [[tower_2/CTCLoss/_147]]
    1 successful operations.
    2 derived errors ignored.
    
    Original stack trace for 'tower_0/cudnn_lstm/CudnnRNNV3_1':
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 968, in run_script
        absl.app.run(main)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 940, in main
        train()
    
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 487, in train
        gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 313, in get_tower_results
        avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 240, in calculate_mean_edit_distance_and_loss
        logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 191, in create_model
        output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 129, in rnn_impl_cudnn_rnn
        sequence_lengths=seq_length)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
        outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
        return converted_call(f, options, args, kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
        return _call_unconverted(f, args, kwargs, options)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
        return f(*args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 440, in call
        training)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 518, in _forward
        seed=self._seed)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1132, in _cudnn_rnn
        outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 2051, in cudnn_rnnv3
        time_major=time_major, name=name)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
        op_def=op_def)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
        attrs, op_def, compute_device)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
        op_def=op_def)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
        self._traceback = tf_stack.extract_stack()
    
    upstream-issue 
    opened by andrenatal 154
  • Use this model for Urdu language

    Use this model for Urdu language

    I wanted to use this model for the Urdu language, but I found this in the FAQ: "DeepSpeech's requirements for the data is that the transcripts match the [a-z ]+ regex, and that the audio is stored WAV (PCM) files."

    How can I design a neural network for speech transcription for languages like Urdu?
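    The FAQ requirement quoted above can be checked mechanically before any training. A minimal sketch, assuming transcripts are plain Python strings (the helper name `invalid_transcripts` is hypothetical), lists the transcripts that would violate the [a-z ]+ rule; any transcript in Urdu script fails it:

```python
import re

# The character set quoted from the FAQ: lowercase ASCII letters and spaces only.
TRANSCRIPT_RE = re.compile(r"[a-z ]+")


def invalid_transcripts(transcripts):
    """Return the transcripts that do not fully match [a-z ]+ (hypothetical helper)."""
    return [t for t in transcripts if not TRANSCRIPT_RE.fullmatch(t)]


if __name__ == "__main__":
    samples = ["hello world", "Hello", "سلام دنیا"]
    print(invalid_transcripts(samples))
```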

    enhancement Priority: P4 
    opened by MalikMahnoor 79
  • Electron Windows build (electron-builder) is not finding the deepspeech.node binding

    Electron Windows build (electron-builder) is not finding the deepspeech.node binding

    I'm using electron-builder to package my Electron app into an installer. It works great on Mac and Linux, but the Windows version cannot find the DeepSpeech native binding file.

    I am not sure if this is a bug that would need to be resolved in the DeepSpeech module, or in electron-builder, or in electron itself.

    I could follow up with a small test example to demonstrate the problem.

    Basically, after creating the Windows exe installer (npm run dist from electron-builder), if I find the executable in my file system and run it directly from Git Bash, I can see the error messages in the console, and I receive this:

    electron/js2c/asar.js:140
          if (!isAsar) return old.apply(this, arguments);
                                  ^
    
    Error: The specified module could not be found.
    \\?\C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar.unpacked\node_modules\deepspeech\lib\binding\v0.7.4\win32-x64\electron-v9.0\deepspeech.node
        at process.func [as dlopen] (electron/js2c/asar.js:140:31)
        at Object.Module._extensions..node (internal/modules/cjs/loader.js:1034:18)
        at Object.func [as .node] (electron/js2c/asar.js:149:18)
        at Module.load (internal/modules/cjs/loader.js:815:32)
        at Module._load (internal/modules/cjs/loader.js:727:14)
        at Function.Module._load (electron/js2c/asar.js:769:28)
        at Module.require (internal/modules/cjs/loader.js:852:19)
        at require (internal/modules/cjs/helpers.js:74:18)
        at Object.<anonymous> (C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar\node_modules\deepspeech\index.js:18:17)
        at Module._compile (internal/modules/cjs/loader.js:967:30)
    

    What's weird is this file actually does exist:

    C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar.unpacked\node_modules\deepspeech\lib\binding\v0.7.4\win32-x64\electron-v9.0\deepspeech.node
    

    Maybe it's the junk at the start of the path that causes the problem; I'm not sure:

    \\?\C:\
    

    I kind of suspect electron-builder's app.asar package format is probably where the problem lies, and I may file another bug report there too and reference this one.

    bug 
    opened by dsteinman 69
  • No working download links found for ds_ctcdecoder==training/deepspeech_training/VERSION

    No working download links found for ds_ctcdecoder==training/deepspeech_training/VERSION

    Hello,

    When I try to use setup.py from version 0.7.4, it always raises this error:

    No local packages or working download links found for ds_ctcdecoder==training/deepspeech_training/VERSION error: Could not find suitable distribution for Requirement.parse('ds_ctcdecoder==training/deepspeech_training/VERSION')

    With version 0.7.3 and older, it finds ds_ctcdecoder but always complains that I need numpy 1.16; when I install 1.16, it complains that I need numpy 1.13.3 because of other modules, and so on. That's why I think my only chance to use DeepSpeech is with the newest version.

    I'm on Windows 10 with Python 3.6.

    Thanks in advance!
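    The requirement in the error message contains a file path where a version number is expected, which suggests that whatever supplied the version produced a path instead. A minimal sketch of a defensive check a setup script could apply, assuming the version is read from a plain-text file (the helper names and the version pattern are hypothetical, not DeepSpeech's actual setup.py), fails with an explicit message instead of emitting an unsatisfiable requirement:

```python
import re
from pathlib import Path

# Loose pattern for release strings such as "0.7.4" or "0.10.0-alpha.3"
# (an assumption; a real setup script might validate against PEP 440 instead).
VERSION_RE = re.compile(r"\d+(\.\d+)+([.-]?[0-9A-Za-z.]+)?")


def pinned_requirement(package, version_text):
    """Build 'package==VERSION' from raw file contents, rejecting non-version text."""
    version = version_text.strip()
    if not VERSION_RE.fullmatch(version):
        raise ValueError(
            f"{package}: expected a version number, got {version!r}; "
            "check that the VERSION file contains a version, not a path"
        )
    return f"{package}=={version}"


def pinned_requirement_from_file(package, version_file):
    """Read a VERSION file and delegate to pinned_requirement (hypothetical helper)."""
    return pinned_requirement(package, Path(version_file).read_text(encoding="utf-8"))
```

    The guard only turns a confusing dependency-resolution error into an explicit one; it does not by itself explain why the version came out as a path.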

    bug help wanted good first bug 
    opened by SirZontax 65
  • Error Non-UTF-8 code starting with '\x83' in file deepspeech on line 2 when doing inferences after training a french model

    Error Non-UTF-8 code starting with '\x83' in file deepspeech on line 2 when doing inferences after training a french model

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
    • TensorFlow installed from (our builds, or upstream TensorFlow): mozilla tensorflow
    • TensorFlow version (use command below): tensorflow-gpu 1.13
    • Python version: 3.6
    • Bazel version (if compiling from source): 0.19.2
    • GCC/Compiler version (if compiling from source):
    • CUDA/cuDNN version: 10.0
    • GPU model and memory: NVIDIA K80
    • Exact command to reproduce:

    I trained a French model on a small French dataset, and when I tried to run inference with the exported model like this:

    python3.6 deepspeech --model ~/results/model_export/output_graph.pb --alphabet ~/Deepspeech/data/alphabet.txt --lm ~/DeepSpeech/data/lm/lm.binary --trie ~/DeepSpeech/data/lm/trie --audio test.wav -t

    I got this error:

    SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

    Any suggestions on how to resolve this?
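    Python raises this SyntaxError when the file it is asked to run contains bytes that are not valid UTF-8 and no encoding is declared (PEP 263); this happens, for example, when the interpreter is pointed at a compiled binary instead of a Python script. A minimal sketch for checking what kind of file `deepspeech` is before running it, assuming a Linux system where native executables use the ELF format (the helper names are hypothetical):

```python
import sys

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file (Linux executables, .so files)


def classify_header(head):
    """Classify a file from its first bytes: 'elf', 'script' or 'unknown'."""
    if head.startswith(ELF_MAGIC):
        return "elf"      # native binary: run it directly, not through python3
    if head.startswith(b"#!"):
        return "script"   # shebang script: the first line names its interpreter
    return "unknown"


def classify_file(path):
    """Read just enough of the file to classify it (hypothetical helper)."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, classify_file(name))
```

    If the file turns out to be an ELF binary, invoking it directly (`./deepspeech ...` or `deepspeech ...` on the PATH) rather than as `python3.6 deepspeech ...` keeps it away from the Python parser.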

    opened by testdeepv 62
  • The CTC decoder timesteps now corresponds to the timesteps of the most probable CTC path, instead of the earliest timesteps of all possible paths.

    The CTC decoder timesteps now corresponds to the timesteps of the most probable CTC path, instead of the earliest timesteps of all possible paths.

    This follows up on issue #3180.

    I propose a new way of handling the timesteps produced by the CTC decoder. It uses no ad hoc heuristic, and I think the logic is clear: when two different paths leading to the same prefix are fused, fuse not only their probabilities (by adding them) but also their timestep sequences (for the last letter in the sequence, keep the timestep from the more probable path).

    The places where two different paths leading to the same prefix are fused are exactly the places where log_sum_exp is called, because that function fuses the probabilities. So timesteps would now be fused at the same places.

    The other change is that each PathTrie node would now store the full sequence of timesteps. This is because one prefix can be an ancestor of another, and their timesteps at a given node can differ. With the full sequence of timesteps in each node, there is no need to duplicate a node for different timesteps, which is much simpler. Moreover, it makes sense to store the full sequence of timesteps, because the combined probabilities are stored there too. The total probability is not the sum of the probabilities of the individual output tokens, and, in the same way, the correct sequence of timesteps is not the concatenation of the timesteps of the individual output tokens.

    Since I need to compare the probabilities of different paths (to keep the timesteps of the most probable one), it is important to compare paths of the same length (e.g., paths from the beginning up to the current time). So, exactly as is done for the probabilities, I need to know the timesteps at the previous time and store the timesteps for the current time separately.

    In the end, timesteps are handled in a way very similar to the way probabilities are handled.
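    The fusion rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the decoder's C++ PathTrie code: the `Hypothesis` type and `fuse` function are hypothetical, probabilities are kept in log space, and the separate bookkeeping for the previous and current time is omitted. Two paths that reach the same prefix have their probabilities added with log_sum_exp, and the merged path keeps the full timestep sequence of the more probable of the two:

```python
import math
from dataclasses import dataclass
from typing import Tuple


def log_sum_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))


@dataclass(frozen=True)
class Hypothesis:
    log_prob: float             # log probability of reaching this prefix
    timesteps: Tuple[int, ...]  # full timestep sequence, one entry per output token


def fuse(a: Hypothesis, b: Hypothesis) -> Hypothesis:
    """Merge two paths that lead to the same prefix.

    Probabilities are added (in log space); the timesteps are taken from
    the more probable path instead of keeping the earliest ones."""
    best = a if a.log_prob >= b.log_prob else b
    return Hypothesis(log_sum_exp(a.log_prob, b.log_prob), best.timesteps)


if __name__ == "__main__":
    early = Hypothesis(math.log(0.2), (0, 3, 7))  # less probable path, earlier last timestep
    late = Hypothesis(math.log(0.3), (0, 3, 9))   # more probable path
    merged = fuse(early, late)
    print(merged.timesteps, round(math.exp(merged.log_prob), 6))  # (0, 3, 9) 0.5
```

    Storing the whole tuple per hypothesis mirrors the choice of keeping the full sequence in each node: it costs some memory, but a node never has to be duplicated just because its timesteps differ.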

    Results on an example

    To evaluate the resulting timesteps, I first take the argmax of my logits. In my example, it gives:

    tou_________________________________________________ss  les  _a__mouurreuux  de  se_p_ort__ diiivverrr_  ss'enn__ _r_é___j_uuii__rr__on_t____    aa_v_eecc_   l''aap__p_rroo___chhee_____    de_  ll'hhi__v_e_rr____  et   la   rre__p_rri_ssee  dee  la  c_ouppee ddu  mon_deee      ss__kk_i___      less    ii_mm_a__ggees_   de   _g_ll_i_ss__ssee____________________             ree__t_rrou_vveennt    uunee    __pllaa___cee____      de    _cchhooixx__  d_ans_  lles  _pp_a____ggeess        ssspp_o_r_t_ii_vees_  de  v_o_s_ _jourrnnaauxx  ttéé_l_é___v_ii___ss_é__s__      ddeeu_x_   __é___pprreeuu_vvees___________________          _auu_jjoouurrdd''hhuuii___       _o____nno___rr_o_____d__a___mm________ ___s___a_n______t__a______  _q__a___tt__e___rr_i____n__a_____     __p_r____mmie_r__  s__a___l__o___m___    _g_é____ant__  de   lla  _c_ou_ppee  ddu   m_on___deee____________________________________________________
    

    As the logits are the only input to the decoder, I base my evaluation on them instead of comparing with the audio file directly. It is known that the CTC loss does not guarantee alignment between the audio file and the logits, so the best the decoder can do is fit the logits as well as it can. This is reasonable because, in practice, the logits are quite well aligned with the audio file.

    Then, for each word, I take the part of the logits corresponding to the output timesteps, take the argmax (as said above), and print the corresponding decoded text.

    Finally, I assume that good timesteps should lead to a good match between the word and its corresponding text decoded from the argmax of the logits.
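    The per-word check can be sketched as follows, assuming one argmax character per frame, `_` as the CTC blank (as in the string above), and half-open `(start, end)` frame ranges; the function names are hypothetical. The usual CTC collapse rule applies: merge repeated characters, then drop blanks:

```python
def collapse_ctc(frames: str, blank: str = "_") -> str:
    """Greedy CTC collapse: merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for ch in frames:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch  # a blank between two equal characters keeps both of them
    return "".join(out)


def word_from_range(argmax: str, start: int, end: int, blank: str = "_") -> str:
    """Decode the argmax characters in frames [start, end) and trim spaces."""
    return collapse_ctc(argmax[start:end], blank).strip()


if __name__ == "__main__":
    argmax = "tou____ss  les  "
    print(word_from_range(argmax, 0, 9))   # frames of the first word
    print(word_from_range(argmax, 9, 16))  # frames of the second word
```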

    Before this PR, the result in my example is (text between slashes is from the logits argmax, spaces are trimmed) :

    [WordScoreRange(word=tous /tou/, score=None, ranges=((0, 4),)),   
     WordScoreRange(word=les /les/, score=None, ranges=((55, 59),)),       
     WordScoreRange(word=amoureux /amoureu/, score=None, ranges=((60, 74),)),
     WordScoreRange(word=de /d/, score=None, ranges=((75, 78),)),           
     WordScoreRange(word=sport /seport/, score=None, ranges=((80, 90),)),           
     WordScoreRange(word=divers /diver/, score=None, ranges=((91, 102),)),
     WordScoreRange(word=s'en /s'en/, score=None, ranges=((103, 111),)),  
     WordScoreRange(word=réjouiront /réjuiron/, score=None, ranges=((112, 136),)),
     WordScoreRange(word=avec /avec/, score=None, ranges=((141, 153),)),  
     WordScoreRange(word=l'approche /l'approche/, score=None, ranges=((155, 179),)),
     WordScoreRange(word=de /de/, score=None, ranges=((185, 191),)),  
     WordScoreRange(word=l'hiver /l'hive/, score=None, ranges=((192, 206),)),
     WordScoreRange(word=et /e/, score=None, ranges=((208, 215),)),
     WordScoreRange(word=la /la/, score=None, ranges=((217, 221),)),
     WordScoreRange(word=reprise /reprise/, score=None, ranges=((222, 238),)),
     WordScoreRange(word=de /de/, score=None, ranges=((240, 243),)),
     WordScoreRange(word=la /la/, score=None, ranges=((244, 248),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((249, 257),)),
     WordScoreRange(word=du /d/, score=None, ranges=((258, 261),)),
     WordScoreRange(word=monde /monde/, score=None, ranges=((263, 270),)),
     WordScoreRange(word=de /e/, score=None, ranges=((271, 275),)),
     WordScoreRange(word=ski /ski/, score=None, ranges=((276, 286),)),
     WordScoreRange(word=les /les/, score=None, ranges=((290, 298),)),
     WordScoreRange(word=images /image/, score=None, ranges=((300, 316),)),
     WordScoreRange(word=de /d/, score=None, ranges=((318, 322),)),
     WordScoreRange(word=glisse /glisse/, score=None, ranges=((324, 341),)),
     WordScoreRange(word=retrouvent /retrouvent/, score=None, ranges=((363, 394),)),
     WordScoreRange(word=une /une/, score=None, ranges=((395, 402),)),
     WordScoreRange(word=place /place/, score=None, ranges=((404, 419),)),
     WordScoreRange(word=de /de/, score=None, ranges=((425, 432),)),
     WordScoreRange(word=choix /choix/, score=None, ranges=((433, 445),)),
     WordScoreRange(word=dans /dans/, score=None, ranges=((448, 455),)),
     WordScoreRange(word=les /les/, score=None, ranges=((457, 462),)),
     WordScoreRange(word=pages /pages/, score=None, ranges=((463, 478),)),
     WordScoreRange(word=sportives /sportives/, score=None, ranges=((479, 506),)),
     WordScoreRange(word=de /de/, score=None, ranges=((508, 511),)),
     WordScoreRange(word=vos /vos/, score=None, ranges=((512, 518),)),
     WordScoreRange(word=journaux /journaux/, score=None, ranges=((520, 532),)),
     WordScoreRange(word=télévisés /télévisé/, score=None, ranges=((533, 559),)),
     WordScoreRange(word=deux /deux/, score=None, ranges=((563, 575),)),
     WordScoreRange(word=épreuves /épreuve/, score=None, ranges=((577, 598),)),
     WordScoreRange(word=aujourd'hui /aujourd'hu/, score=None, ranges=((618, 649),)),
     WordScoreRange(word=on /o/, score=None, ranges=((654, 665),)),
     WordScoreRange(word=a /am/, score=None, ranges=((685, 699),)),
     WordScoreRange(word=santa /santa/, score=None, ranges=((703, 726),)),
     WordScoreRange(word=caterina /qaterina/, score=None, ranges=((729, 756),)),
     WordScoreRange(word=premier /prmier/, score=None, ranges=((762, 783),)),
     WordScoreRange(word=salon /salom/, score=None, ranges=((785, 803),)),
     WordScoreRange(word=géant /géant/, score=None, ranges=((805, 818),)),
     WordScoreRange(word=de /d/, score=None, ranges=((819, 823),)),
     WordScoreRange(word=la /l/, score=None, ranges=((825, 829),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((831, 841),)),
     WordScoreRange(word=du /d/, score=None, ranges=((842, 846),)),
     WordScoreRange(word=monde /mon/, score=None, ranges=((848, 857),))]
    

    After this PR, the result in my example is :

    [WordScoreRange(word=tous /tous/, score=None, ranges=((0, 54),)), 
     WordScoreRange(word=les /les/, score=None, ranges=((56, 59),)),        
     WordScoreRange(word=amoureux /amoureux/, score=None, ranges=((62, 75),)),
     WordScoreRange(word=de /de/, score=None, ranges=((77, 79),)),          
     WordScoreRange(word=sport /seport/, score=None, ranges=((81, 91),)),           
     WordScoreRange(word=divers /diver/, score=None, ranges=((92, 103),)),
     WordScoreRange(word=s'en /s'en/, score=None, ranges=((105, 113),)),  
     WordScoreRange(word=réjouiront /réjuiront/, score=None, ranges=((114, 140),)),
     WordScoreRange(word=avec /avec/, score=None, ranges=((145, 155),)),  
     WordScoreRange(word=l'approche /l'approche/, score=None, ranges=((158, 185),)),
     WordScoreRange(word=de /de/, score=None, ranges=((189, 192),)),  
     WordScoreRange(word=l'hiver /l'hiver/, score=None, ranges=((194, 212),)),
     WordScoreRange(word=et /et/, score=None, ranges=((214, 216),)),                                                               
     WordScoreRange(word=la /la/, score=None, ranges=((219, 221),)),
     WordScoreRange(word=reprise /reprise/, score=None, ranges=((224, 239),)),                                                                                                                         
     WordScoreRange(word=de /de/, score=None, ranges=((241, 244),)),            
     WordScoreRange(word=la /la/, score=None, ranges=((246, 248),)),              
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((250, 258),)),
     WordScoreRange(word=du /du/, score=None, ranges=((259, 262),)),            
     WordScoreRange(word=monde /monde/, score=None, ranges=((264, 272),)),
     WordScoreRange(word=de //, score=None, ranges=((273, 275),)),
     WordScoreRange(word=ski /ski/, score=None, ranges=((278, 289),)),
     WordScoreRange(word=les /les/, score=None, ranges=((295, 299),)),
     WordScoreRange(word=images /images/, score=None, ranges=((303, 318),)),
     WordScoreRange(word=de /de/, score=None, ranges=((321, 323),)),
     WordScoreRange(word=glisse /glisse/, score=None, ranges=((327, 362),)),
     WordScoreRange(word=retrouvent /retrouvent/, score=None, ranges=((375, 394),)),
     WordScoreRange(word=une /une/, score=None, ranges=((398, 403),)),
     WordScoreRange(word=place /place/, score=None, ranges=((409, 424),)),
     WordScoreRange(word=de /de/, score=None, ranges=((430, 432),)),
     WordScoreRange(word=choix /choix/, score=None, ranges=((437, 448),)),
     WordScoreRange(word=dans /dans/, score=None, ranges=((450, 456),)),
     WordScoreRange(word=les /les/, score=None, ranges=((458, 462),)),
     WordScoreRange(word=pages /pages/, score=None, ranges=((465, 479),)),
     WordScoreRange(word=sportives /sportives/, score=None, ranges=((487, 507),)),
     WordScoreRange(word=de /de/, score=None, ranges=((509, 511),)),
     WordScoreRange(word=vos /vos/, score=None, ranges=((513, 519),)),
     WordScoreRange(word=journaux /journaux/, score=None, ranges=((521, 533),)),
     WordScoreRange(word=télévisés /télévisés/, score=None, ranges=((535, 562),)),
     WordScoreRange(word=deux /deux/, score=None, ranges=((568, 576),)),
     WordScoreRange(word=épreuves /épreuves/, score=None, ranges=((581, 618),)),
     WordScoreRange(word=aujourd'hui /aujourd'hui/, score=None, ranges=((629, 654),)),
     WordScoreRange(word=on /onoro/, score=None, ranges=((662, 680),)),
     WordScoreRange(word=a /am/, score=None, ranges=((685, 699),)),
     WordScoreRange(word=santa /santa/, score=None, ranges=((703, 726),)),
     WordScoreRange(word=caterina /qaterina/, score=None, ranges=((729, 761),)),
     WordScoreRange(word=premier /prmier/, score=None, ranges=((768, 783),)),
     WordScoreRange(word=salon /salom/, score=None, ranges=((785, 803),)),
     WordScoreRange(word=géant /géant/, score=None, ranges=((808, 820),)),
     WordScoreRange(word=de /de/, score=None, ranges=((822, 824),)),
     WordScoreRange(word=la /l/, score=None, ranges=((826, 829),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((833, 842),)),
     WordScoreRange(word=du /d/, score=None, ranges=((843, 846),)),
     WordScoreRange(word=monde /mond/, score=None, ranges=((850, 858),))]
    

    We can see that before this PR, there are 17 words whose timesteps are too early (by about one letter; this is visible at the end of words but not at the beginning, because I have trimmed spaces). After this PR, the fit is almost perfect. For some reason, there are still 3 remaining errors, all in the last 4 words.

    opened by godefv 55
  • Adapting engine to any Custom Language

    Adapting engine to any Custom Language

    I was wondering what kinds of modifications would be needed to use this engine for languages other than English (beyond a new language model and a new words.txt file). In particular, I was interested in whether it could be used with a Cyrillic script, given this requirement: "data in the transcripts must match the [a-z ]+ regex", and if so, how hard it would be to adapt it. I think I could work around this problem by writing a translator that converts text from a Cyrillic script to the [a-z ]+ format, but it would be preferable if the engine could use a Cyrillic script directly.

    Thanks in advance
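    The transliteration workaround mentioned in the question could look like the following minimal sketch. The mapping is an illustrative, incomplete Russian table chosen for this example (an assumption, not a standard romanization and not anything DeepSpeech ships); other Cyrillic alphabets would need their own letters, and characters outside the table are dropped so the output contains only characters from [a-z ]:

```python
import re

# Illustrative, incomplete Russian-to-Latin table (an assumption, not a standard scheme).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Map Cyrillic text to the [a-z ] character set, dropping anything unmapped."""
    out = []
    for ch in text.lower():
        if ch in CYR_TO_LAT:
            out.append(CYR_TO_LAT[ch])
        elif ch == " " or "a" <= ch <= "z":
            out.append(ch)
        # digits, punctuation and unmapped letters are silently dropped
    return re.sub(r" +", " ", "".join(out)).strip()


if __name__ == "__main__":
    print(transliterate("Привет, мир!"))
```

    The mapping is lossy (е and э both become e, and ь and ъ disappear), so transcripts cannot be converted back to Cyrillic unambiguously; that is one reason to prefer using the Cyrillic letters directly, as the question suggests.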

    question 
    opened by istojan 54
  • Language model incorrectly drops spaces for out-of-vocabulary words

    Language model incorrectly drops spaces for out-of-vocabulary words

    Mozilla DeepSpeech will sometimes create long runs of text with no spaces:

    omiokaarforfthelastquarterwastoget
    

    This happens even with short audio clips (4 seconds) of a native American English speaker recorded with a high-quality microphone on Mac OS X laptops. I've isolated the problem to the interaction with the language model rather than the acoustic model or the length of the audio clips, as the problem goes away when the language model is turned off.

    The problem might be related to encountering out-of-vocabulary terms.

    I’ve put together test files with results that show the issue is related to the language model somehow rather than the length of the audio or the acoustic model.

    I’ve provided 10 chunked WAV files (16 kHz, 16-bit depth), each 4 seconds long, that are a subset of a fuller 15-minute audio file (I have not provided that full 15-minute file, as a few shorter chunks are sufficient to reproduce the problem):

    https://www.dropbox.com/sh/3qy65r6wo8ldtvi/AAAAVinsD_kcCi8Bs6l3zOWFa?dl=0

    The audio segments deliberately include occasional out-of-vocabulary terms, mostly technical, such as “OKR”, “EdgeStore”, “CAPE”, etc.

    Also in that folder are several text files that show the output with the standard language model being used, showing the garbled words together (chunks_with_language_model.txt):

    Running inference for chunk 1
    so were trying again a maybeialstart this time
    
    Running inference for chunk 2
    omiokaarforfthelastquarterwastoget
    
    Running inference for chunk 3
    to car to state deloedmarchinstrumnalha
    
    Running inference for chunk 4
    a tonproductcaseregaugesomd produce sidnelfromthat
    
    Running inference for chunk 5
    i am a to do that you know 
    
    Running inference for chunk 6
    we finish the kepehandlerrwend finished backfileprocessing 
    
    Running inference for chunk 7
    and is he teckdatthatwewould need to do to split the cape 
    
    Running inference for chunk 8
    out from sir handler and i are on new 
    
    Running inference for chunk 9
    he is not monolithic am andthanducotingswrat 
    
    Running inference for chunk 10
    relizationutenpling paws on that until it its a product signal
    

    Then, I’ve provided similar output with the language model turned off (chunks_without_language_model.txt):

    Running inference for chunk 1
    so we're tryng again ah maybe alstart this time
    
    Running inference for chunk 2
    omiokaar forf the last quarter was to get
    
    Running inference for chunk 3
    oto car to state deloed march in strumn alha
    
    Running inference for chunk 4
    um ton product  caser egauges somd produc sidnel from that
    
    Running inference for chunk 5
    am ah to do that ou nowith
    
    Running inference for chunk 6
    we finishd the kepe handlerr wend finished backfile processinga
    
    Running inference for chunk 7
    on es eteckdat that we would need to do to split the kae ha
    
    Running inference for chunk 8
    rout frome sir hanler and ik ar on newh
    
    Running inference for chunk 9
    ch las not monoliic am andthan ducotings wrat 
    
    Running inference for chunk 10
    relization u en pling a pas on that until it its a product signal
    

    I’ve included both these files in the shared Dropbox folder link above.

    Here’s what the correct transcript should be, manually done (chunks_correct_manual_transcription.txt):

    So, we're trying again, maybe I'll start this time.
    
    So my OKR for the last quarter was to get AutoOCR to a state that we could
    launch an external alpha, and product could sort of gauge some product signal
    from that. To do that we finished the CAPE handler, we finished backfill 
    processing, we have some tech debt that we would need to do to split the CAPE 
    handler out from the search handler and make our own new handler so its not
    monolithic, and do some things around CAPE utilization. We are kind of putting
    a pause on that until we get some product signal.
    

    This shows the language model is the source of this problem; I’ve seen anecdotal reports on the official message board and in blog posts that this is a widespread problem. Perhaps when the language model hits an unknown n-gram, it ends up combining all of them together rather than retaining the spaces between them.

    Discussion around this bug started on the standard DeepSpeech discussion forum: https://discourse.mozilla.org/t/text-produced-has-long-strings-of-words-with-no-spaces/24089/13 https://discourse.mozilla.org/t/longer-audio-files-with-deep-speech/22784/3

    • Have I written custom code (as opposed to running examples on an unmodified clone of the repository):

    The standard client.py was slightly modified to segment the longer 15-minute audio clip into 4-second blocks.

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):

    Mac OS X 10.12.6 (16G1036)

    • TensorFlow installed from (our builds, or upstream TensorFlow):

    Both Mozilla DeepSpeech and TensorFlow were installed into a virtualenv setup via the following requirements.txt file:

    tensorflow==1.4.0
    deepspeech==0.1.0
    numpy==1.13.3
    scipy==0.19.1
    webrtcvad==2.0.10
    
    • TensorFlow version (use command below):
    ('v1.4.0-rc1-11-g130a514', '1.4.0')
    
    • Python version:
    Python 2.7.13
    
    • Bazel version (if compiling from source):

    Did not compile from source.

    • GCC/Compiler version (if compiling from source):

    Same

    • CUDA/cuDNN version:

    Used CPU only version

    • GPU model and memory:

    Used CPU only version

    • Exact command to reproduce:

    I haven't provided my full modified client.py that segments longer audio, but to run with a language model using the standard deepspeech command against a known 4-second audio clip included in the Dropbox folder shared above, you can run the following:

    # Set $DEEPSPEECH to where full Deep Speech checkout is; note that my own git checkout
    # for the `deepspeech` runner is at git sha fef25e9ea6b0b6d96dceb610f96a40f2757e05e4
    deepspeech $DEEPSPEECH/models/output_graph.pb chunk_2_length_4.0_s.wav $DEEPSPEECH/models/alphabet.txt $DEEPSPEECH/models/lm.binary $DEEPSPEECH/models/trie
    
    # Similar command to run without language model -- spaces retained for unknown words:
    deepspeech $DEEPSPEECH/models/output_graph.pb chunk_2_length_4.0_s.wav $DEEPSPEECH/models/alphabet.txt 
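    The modified client.py is not included in the report. A minimal sketch of the segmentation step it describes, assuming uncompressed PCM WAV input and using only Python's standard `wave` module (the function name and the output file naming, modeled on chunk_2_length_4.0_s.wav above, are hypothetical), cuts a long recording into consecutive fixed-length chunks:

```python
import sys
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=4.0):
    """Split a PCM WAV file into consecutive fixed-length chunks.

    Returns the list of written paths; the last chunk may be shorter."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 1
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            path = out_dir / f"chunk_{index}_length_{chunk_seconds}_s.wav"
            with wave.open(str(path), "wb") as writer:
                writer.setparams(params)  # same rate, width and channels as the source
                writer.writeframes(frames)  # the header's frame count is patched on write
            written.append(path)
            index += 1
    return written


if __name__ == "__main__":
    for p in split_wav(sys.argv[1], "chunks"):
        print(p)
```

    Fixed cuts can split a word across two chunks; since the report decodes the same chunks with and without the language model, such boundary effects apply equally to both runs.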
    

    This is clearly a bug and not a feature :)

    opened by BradNeuberg 54
  • Support for Windows

    Support for Windows

    I'm still editing the docs, preparing for CUDA, and finishing the C# examples.

    IMPORTANT NOTE: I have not tried training on Windows yet; my initial goal is to enable inference with the clients on Windows.
    Thanks to @reuben and @lissyx, who helped me a lot.

    Fixes #1123

    Epic 
    opened by carlfm01 51
  • Generate trie lm::FormatLoadException

    Generate trie lm::FormatLoadException

    I'm following this tutorial to create a French model: https://discourse.mozilla.org/t/tutorial-how-i-trained-a-specific-french-model-to-control-my-robot/22830

    The problem occurs when generating the trie file with this command:

    ./generate_trie data/cassia/alphabet.txt data/cassia/lm.binary data/cassia/vocabulary.txt data/cassia/trie

    I get this output:

    terminate called after throwing an instance of 'lm::FormatLoadException' what(): native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException. The binary file was built for probing hash tables but the inference code is trying to load trie with quantization and array-compressed pointers Aborted (core dumped)

    I tried several times to generate my lm.binary with kenlm (./build_binary -T -s words.arpa lm.binary), but I still get the same error.

    opened by yoann1995 49
Releases (v0.10.0-alpha.3)
Owner
Mozilla
This technology could fall into the right hands.
A simple voice detection system which can be applied practically for designing a device with capability to detect a baby’s cry and automatically turning on music

Auto-Baby-Cry-Detection-with-Music-Player A simple voice detection system which can be applied practically for designing a device with capability to d

null 2 Dec 15, 2021
Speech recognition module for Python, supporting several engines and APIs, online and offline.

SpeechRecognition Library for performing speech recognition, with support for several engines and APIs, online and offline. Speech recognition engine/

Anthony Zhang 6.1k Feb 20, 2022
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

pyannote 1.4k Feb 13, 2022
Voice to Text using Raspberry Pi

This module will help to convert your voice (speech) into text using Speech Recognition Library. You can control the devices or you can perform the desired tasks by the word recognition

Raspberry_Pi Pakistan 2 Dec 15, 2021
Extract/unpack asset files (from Unreal Engine 4 pak) with the *.uexp extension that contain AWB/ACB (CRI/CPK-like) sound or music resources

Uexp2Awb extract unpack asset file (form unreal engine 4 pak) with extenstion .uexp which contain awb/acb (cri/cpk like) sound or music resource. i ju

max 4 Feb 7, 2022
A Music Player Bot for Discord Servers

A Music Player Bot for Discord Servers

Halil Acar 2 Oct 25, 2021
🎵 A music bot for discord servers!

music bot A music bot for Discord Servers Features Play songs in your discord server Get the lyrics without going on a web explorer Commands Command P

null 1 Jan 18, 2022
Real-time audio visualizations (spectrum, spectrogram, etc.)

Friture Friture is an application to visualize and analyze live audio data in real-time. Friture displays audio data in several widgets, such as a sco

Timothée Lecomte 623 Feb 15, 2022
Real-Time Spherical Microphone Renderer for binaural reproduction in Python

ReTiSAR: implementation of the Real-Time Spherical Microphone Renderer for binaural reproduction in Python [1][2].

Division of Applied Acoustics at Chalmers University of Technology 43 Feb 14, 2022
A voice assistant that lets you interact with your computer and control PC operations

Introduction: a voice assistant that can be used to interact with your computer, similar to the ones seen in the Iron Man movies.

Sujith 56 Feb 20, 2022
Royal Music: play music and video at the same time in a voice chat

Royals-Music: play music and video at the same time in a voice chat.

(unknown author) 2 Nov 23, 2021
Open Source Audio Matching and Mastering

Matching + Mastering = ❤️. Matchering 2.0 is a novel containerized web application and Python library for audio matching and mastering.

Sergey Grishakov 571 Feb 16, 2022
SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats

SU Music Player: the first open-source PyTgCalls-based Pyrogram bot to play music in voice chats.

SU Projects 54 Feb 5, 2022
Spotifyd - An open source Spotify client running as a UNIX daemon.

Spotifyd: an open source Spotify client running as a UNIX daemon. Spotifyd streams music just like the official client, but is more lightweight.

(unknown author) 5.9k Feb 11, 2022
Open-source bot to play songs in your Telegram group voice chat. Powered by @Akki_ThePro

VcPlayer: Telegram voice-chat bot [PyTGCalls]. Requirements: a Telegram account to use as the music bot.

Akki ThePro 2 Dec 25, 2021
Synthesia, but open source, made in Python, and free

PyPiano: Synthesia, but open source, made in Python, and free. Requirements are listed in requirements.txt.

DaCapo 9 Feb 15, 2022
This is an AI that runs in the terminal. It is a voice assistant that can perform common tasks and can also help with your coding doubts.

OneBit 1 Nov 5, 2021
This library provides common speech features for ASR including MFCCs and filterbank energies.

python_speech_features: this library provides common speech features for ASR, including MFCCs and filterbank energies.

James Lyons 2k Feb 14, 2022
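If you are not sure what filterbank energies are: they are the energies measured through a bank of triangular filters whose centers are spaced evenly on the mel scale, and MFCCs are obtained from the logarithms of those energies. The plain-Python sketch below is not python_speech_features' own code. It assumes the widely used conversion mel = 2595·log10(1 + f/700), and the helper names (`hz_to_mel`, `mel_to_hz`, `filterbank_centers`) and the 26-filter, 0 to 8000 Hz setup are hypothetical choices for illustration.

```python
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed 2595*log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filterbank_centers(n_filters: int, low_hz: float, high_hz: float) -> List[float]:
    """Center frequencies (Hz) of n_filters triangular filters spaced evenly in mels.

    n_filters + 2 points are spaced evenly between low_hz and high_hz on the
    mel scale; the two outermost points are only the outer edges of the first
    and last triangles, so just the inner n_filters points are returned.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    points = [mel_to_hz(low_mel + i * step) for i in range(n_filters + 2)]
    return points[1:-1]


if __name__ == "__main__":
    # Hypothetical setup: 26 filters up to 8 kHz (the Nyquist limit of 16 kHz audio).
    centers = filterbank_centers(26, 0.0, 8000.0)
    print([round(c) for c in centers])
```

Spacing the centers evenly in mels rather than in Hz places more, narrower filters at low frequencies, where human pitch perception is finer.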