Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Overview

License: CC BY-NC 4.0

Silero Models

Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.

Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

  • No Kaldi;
  • No compilation;
  • No 20-step instructions;

We have also published TTS models that satisfy the following criteria:

  • One-line usage;
  • A large library of voices;
  • A fully end-to-end pipeline;
  • Natural-sounding speech;
  • No GPU or training required;
  • Minimalism and lack of dependencies;
  • Faster than real-time on one CPU thread (!!!);
  • Support for 16 kHz and 8 kHz out of the box;
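
The "faster than real-time" claim above is commonly quantified as a real-time factor (RTF): processing time divided by the duration of the audio produced. Below is a minimal sketch of how you could measure it yourself. `synthesize` is a hypothetical callable standing in for whatever TTS call you use (assumed to return a sequence of samples); it is not part of our API.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(synthesize, text: str, sample_rate: int) -> float:
    """Time a single call of `synthesize` (hypothetical) and return its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)  # assumed to return a sequence of audio samples
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

For a fair single-thread measurement you would also want to pin the number of threads used by your runtime before timing.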

Speech-To-Text

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

Currently we provide the following checkpoints:

PyTorch ONNX Quantization Quality Colab
English (en_v5) ✔️ ✔️ ✔️ link Open In Colab
German (de_v4) ✔️ ✔️ link Open In Colab
English (en_v3) ✔️ ✔️ ✔️ link Open In Colab
German (de_v3) ✔️ link Open In Colab
German (de_v1) ✔️ ✔️ link Open In Colab
Spanish (es_v1) ✔️ ✔️ link Open In Colab
Ukrainian (ua_v3) ✔️ ✔️ ✔️ N/A Open In Colab

Model flavours:

jit jit jit jit jit_q jit_q onnx onnx onnx onnx
xsmall small large xlarge xsmall small xsmall small large xlarge
English en_v5 ✔️ ✔️ ✔️ ✔️ ✔️
English en_v4_0 ✔️ ✔️
English en_v3 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
German de_v4 ✔️ ✔️
German de_v3 ✔️
German de_v1 ✔️ ✔️
Spanish es_v1 ✔️ ✔️
Ukrainian ua_v3 ✔️ ✔️ ✔️

Dependencies

  • All examples:
    • torch, 1.8+ (torch.hub is also used to fetch the repo in the TensorFlow and ONNX examples); versions older than 1.6 have breaking changes
    • torchaudio, the latest version bound to your PyTorch version should work
    • omegaconf, the latest version should work
  • Additional for ONNX examples:
    • onnx, the latest version should work
    • onnxruntime, the latest version should work
  • Additional for TensorFlow examples:
    • tensorflow, the latest version should work
    • tensorflow_hub, the latest version should work

Please see the provided Colab notebooks for details on each example below. All examples are maintained to work with the latest major packaged versions of the installed libraries.
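
If you want to check an environment against the list above before running the examples, a small standard-library sketch such as the following may help. The `REQUIRED` table and both helper names are illustrative, and the version parser is deliberately simplistic (it ignores pre-release ordering).

```python
from importlib import metadata

# Minimum versions taken from the list above; None means "any recent version".
REQUIRED = {"torch": (1, 8), "torchaudio": None, "omegaconf": None}


def parse_version(text: str) -> tuple:
    """Turn '1.10.2+cu113' into (1, 10, 2), dropping local and pre-release suffixes."""
    parts = []
    for chunk in text.split("+")[0].split("."):
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_dependencies(required=REQUIRED) -> dict:
    """Return a short status string per package: version plus ok/too old, or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = minimum is None or parse_version(installed) >= minimum
        report[name] = f"{installed} ({'ok' if ok else 'too old'})"
    return report
```

Calling `check_dependencies()` returns a dictionary you can print or log before loading any model.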

PyTorch

import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
# named model_input to avoid shadowing the built-in input()
model_input = prepare_model_input(read_batch(batches[0]),
                                  device=device)

output = model(model_input)
for example in output:
    print(decoder(example.cpu()))
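
The batching step in the example is easy to reason about if you picture it as plain list chunking. The helper below is a hypothetical stand-in written for illustration; we assume `split_into_batches` behaves similarly (fixed-size groups, with a shorter last group), but please check its signature in the repo rather than relying on this sketch.

```python
from typing import List, Sequence


def chunk_into_batches(items: Sequence[str], batch_size: int = 10) -> List[List[str]]:
    """Group file paths into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]
```

With a single file and `batch_size=10`, as in the example above, this yields one batch containing one path.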

ONNX

You can run our models anywhere you can import an ONNX model or run ONNX Runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
# index by the chosen language instead of hard-coding 'en'
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]))

# actual onnx inference and decoding
onnx_input = model_input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

TensorFlow

SavedModel example

import os
import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
# index by the chosen language instead of hard-coding 'en'
torch.hub.download_url_to_file(models.stt_models[language].latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(model_input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

Currently we provide the following speakers:

Speaker Auto-stress Language SR Colab
aidar_v2 yes ru (Russian) 8000, 16000 Open In Colab
baya_v2 yes ru (Russian) 8000, 16000 Open In Colab
irina_v2 yes ru (Russian) 8000, 16000 Open In Colab
kseniya_v2 yes ru (Russian) 8000, 16000 Open In Colab
natasha_v2 yes ru (Russian) 8000, 16000 Open In Colab
ruslan_v2 yes ru (Russian) 8000, 16000 Open In Colab
lj_v2 no en (English) 8000, 16000 Open In Colab
thorsten_v2 no de (German) 8000, 16000 Open In Colab
tux_v2 no es (Spanish) 8000, 16000 Open In Colab
gilles_v2 no fr (French) 8000, 16000 Open In Colab
multi_v2 no ru, en, de, es, fr, tt 8000, 16000 Open In Colab
aigul_v2 no ba (Bashkir) 8000, 16000 Open In Colab
erdni_v2 no xal (Kalmyk) 8000, 16000 Open In Colab
dilyara_v2 no tt (Tatar) 8000, 16000 Open In Colab
dilnavoz_v2 no uz (Uzbek) 8000, 16000 Open In Colab

(!!!) In multi_v2 all speakers can speak all of the languages (with various levels of fidelity).
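
The table above can be turned into a small lookup to validate a speaker and sample rate before loading a model. This is only an illustrative sketch built from the rows listed here; the names `SPEAKERS`, `speakers_for` and `validate` are hypothetical and not part of the package.

```python
# Speaker -> languages, transcribed from the table above.
SPEAKERS = {
    "aidar_v2": ("ru",), "baya_v2": ("ru",), "irina_v2": ("ru",),
    "kseniya_v2": ("ru",), "natasha_v2": ("ru",), "ruslan_v2": ("ru",),
    "lj_v2": ("en",), "thorsten_v2": ("de",), "tux_v2": ("es",),
    "gilles_v2": ("fr",),
    "multi_v2": ("ru", "en", "de", "es", "fr", "tt"),
    "aigul_v2": ("ba",), "erdni_v2": ("xal",),
    "dilyara_v2": ("tt",), "dilnavoz_v2": ("uz",),
}
SAMPLE_RATES = (8000, 16000)  # every listed speaker supports both


def speakers_for(language: str) -> list:
    """Return the speakers that list `language`, sorted by name."""
    return sorted(name for name, langs in SPEAKERS.items() if language in langs)


def validate(speaker: str, sample_rate: int) -> None:
    """Raise ValueError if the combination is not in the table above."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
```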

Dependencies

Basic dependencies for colab examples:

  • torch, 1.9+;
  • torchaudio, the latest version bound to your PyTorch version should work (needed only because the TTS models are hosted together with STT; TTS itself does not require it);
  • omegaconf, latest (can also be dropped if you do not load all of the configs);

PyTorch

import torch

language = 'ru'
speaker = 'kseniya_v2'
sample_rate = 16000
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=speaker)
model.to(device)  # gpu or cpu

audio = model.apply_tts(texts=[example_text],
                        sample_rate=sample_rate)
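
If you want to write the result to disk without extra dependencies, a sketch along these lines may help. It assumes (this is not stated above) that each item in `audio` can be turned into a flat sequence of floats in [-1, 1], for example with `.tolist()`. `write_wav` is an illustrative helper, not part of the model's API.

```python
import struct
import wave


def write_wav(path: str, samples, sample_rate: int = 16000) -> str:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return path
```

Under that assumption, usage would look like `write_wav('out.wav', audio[0].tolist(), sample_rate)`.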

Standalone Use

  • Standalone usage requires only PyTorch 1.9+ and the Python standard library;
  • Please see the detailed examples in Colab;

import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v2_kseniya.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_batch = ['В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.',
                 'Котики - это жидкость!',
                 'М+ама М+илу м+ыла с м+ылом.']
sample_rate = 16000

audio_paths = model.save_wav(texts=example_batch,
                             sample_rate=sample_rate)

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to the relevant wiki sections.

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, read our news.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.

Citations

@misc{silero_models,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Further reading

English

  • STT:

    • Towards an Imagenet Moment For Speech-To-Text - link
    • A Speech-To-Text Practitioners Criticisms of Industry and Academia - link
    • Modern Google-level STT Models Released - link
  • TTS:

    • High-Quality Text-to-Speech Made Accessible, Simple and Fast - link
  • VAD:

    • Modern Portable Voice Activity Detector Released - link

Chinese

  • STT:
    • 迈向语音识别领域的 ImageNet 时刻 - link
    • 语音领域学术界和工业界的七宗罪 - link

Russian

  • STT:

    • Мы опубликовали современные STT модели сравнимые по качеству с Google - link
    • Понижаем барьеры на вход в распознавание речи - link
    • Огромный открытый датасет русской речи версия 1.0 - link
    • Насколько Быстрой Можно Сделать Систему STT? - link
    • Наша система Speech-To-Text - link
    • Speech To Text - link
  • TTS:

    • Мы Опубликовали Качественный, Простой, Доступный и Быстрый Синтез Речи - link
  • VAD:

    • Мы опубликовали современный Voice Activity Detector и не только - link

Donations

Please use the "sponsor" button.

Comments
  • Feature request - Adding Proper TF 2.0 Checkpoints (not onnx-tensorflow) + Batching + TF JS

    Hello, guys! Your models are brilliant and I want to use them in my project via TensorFlow Serving, but that can't work without batching. Could you please save the models with batching support? Thank you!

    enhancement help wanted 
    opened by aleks73337 28
  • README's Standalone Use misses to mention NumPy

    Currently, https://github.com/snakers4/silero-models#standalone-use states that:

    • Standalone usage just requires PyTorch 1.10+ and python standard library;

    but I had to install NumPy as well to make the example work.

    bug 
    opened by ghost 23
  • Bug report - RuntimeError: Unknown qengine

    Hello. Great project! I would like to test a standard example, but at the line:

    model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")

    I get an error:

    \lib\site-packages\torch\jit\_script.py", line 351, in unpackage_script_module
        cpp_module = torch._C._import_ir_module_from_package(
    RuntimeError: Unknown qengine

    Python 10.4 , Torch 11.0 , device='cpu', Windows 10 Model: torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/ru_v3.pt', local_file)
    Tell me, please, how to fix it?

    bug 
    opened by lik2129 14
  • Bug report -problem loading STT model on Windows

    Hi, I decided to try silero-models. I did everything as described in the docs, but I get an error. How can I fix it?

    code:

    import torch
    import zipfile
    import torchaudio
    from glob import glob
    
    device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
    model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                           model='silero_stt',
                                           language='en', # also available 'de', 'es'
                                           device=device)
    

    Error:

    RuntimeError                              Traceback (most recent call last)
    C:\Users\E786~1\AppData\Local\Temp/ipykernel_9444/3004546653.py in <module>
          1 device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
    ----> 2 model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
          3                                        model='silero_stt',
          4                                        language='en', # also available 'de', 'es'
          5                                        device=device)

    c:\PY\asistent.venv\lib\site-packages\torch\hub.py in load(repo_or_dir, model, source, force_reload, verbose, skip_validation, *args, **kwargs)
        397     repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
        398
    --> 399     model = _load_local(repo_or_dir, model, *args, **kwargs)
        400     return model
        401

    c:\PY\asistent.venv\lib\site-packages\torch\hub.py in _load_local(hubconf_dir, model, *args, **kwargs)
        426
        427     entry = _load_entry_from_hubconf(hub_module, model)
    --> 428     model = entry(*args, **kwargs)
        429
        430     sys.path.remove(hubconf_dir)

    ~/.cache\torch\hub\snakers4_silero-models_master\hubconf.py in silero_stt(language, version, jit_model, **kwargs)
         32     assert language in available_languages
         33
    ---> 34     model, decoder = init_jit_model(model_url=models.stt_models.get(language).get(version).get(jit_model),
         35                                     **kwargs)
         36     utils = (read_batch,

    ~/.cache\torch\hub\snakers4_silero-models_master\utils.py in init_jit_model(model_url, device)
        128                                    progress=True)
        129
    --> 130     model = torch.jit.load(model_path, map_location=device)
        131     model.eval()
        132     return model, Decoder(model.labels)

    c:\PY\asistent.venv\lib\site-packages\torch\jit\_serialization.py in load(f, map_location, _extra_files)
        159     cu = torch._C.CompilationUnit()
        160     if isinstance(f, str) or isinstance(f, pathlib.Path):
    --> 161         cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
        162     else:
        163         cpp_module = torch._C.import_ir_module_from_buffer(

    RuntimeError: open file failed because of errno 2 on fopen: No such file or directory, file path: C:\Users\Дом/.cache\torch\hub\snakers4_silero-models_master\model\en_v5.jit

    bug 
    opened by lev007-ops 13
  • Feature request - SAPI5

    SAPI5 compatibility

    🚀 Feature

    Motivation

    Mostly enough for screen readers (Windows). But this interface is for integration by its nature. Ready to help!

    enhancement 
    opened by studennikov-serg 11
  • How to obtain an intermediate layer output?

    How do we obtain the output of an intermediate layer of the pre-trained model? For example, the output at the end of the convolution encoder, or the output just after the transformer encoder layers.

    help wanted 
    opened by prajwalkr 11
  • Feature request - Expressiveness

    🚀 Feature

    Right now, in French TTS, there is no decay at the end of a sentence. So if you have 2 sentences, the prosody is wrong and painful to hear. Each sentence by itself is almost perfect, but at the end of a sentence the pitch should decrease, the rate should also decrease, and a short pause is required before starting a new sentence.

    Motivation

    This is useful as soon as you have two or more sentences to synthesize. Otherwise, the current, excellent quality of the TTS engine is wasted, since no human speaks continuously across sentences.
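
As a rough workaround sketch for the pause part of this request (it does not address pitch or rate decay), one could split the text into sentences, synthesize each separately and join the results with silence. `synthesize` below is a hypothetical callable returning a list of samples; the sentence splitter is a naive regular expression and the pause length is an arbitrary assumption.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def synthesize_with_pauses(text: str,
                           synthesize: Callable[[str], List[float]],
                           sample_rate: int = 16000,
                           pause_ms: int = 300) -> List[float]:
    """Synthesize sentence by sentence, inserting silence between sentences."""
    silence = [0.0] * (sample_rate * pause_ms // 1000)
    sentences = split_sentences(text)
    audio: List[float] = []
    for index, sentence in enumerate(sentences):
        audio.extend(synthesize(sentence))
        if index < len(sentences) - 1:  # no trailing pause after the last sentence
            audio.extend(silence)
    return audio
```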

    enhancement 
    opened by X-Ryl669 9
  • Errors running example.ipynb locally or in Colab (PyTorch 1.10 issues)

    Hi,

    I am unable to run example.ipynb notebook locally (on CPU machine) or any of the Google Colab notebooks (either on CPU or GPU runtime).

    Following error occurs for example.ipynb notebook:

    model_url = model_conf.get('package')
    
    model_dir = "downloaded_model"
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, os.path.basename(model_url))
    
    if not os.path.isfile(model_path):
        torch.hub.download_url_to_file(model_url,
                                       model_path,
                                       progress=True)
    
    imp = package.PackageImporter(model_path)
    model = imp.load_pickle("te_model", "model")
    example_texts = model.examples
    
    def apply_te(text, lan='en'):
        return model.enhance_text(text, lan)
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    /tmp/ipykernel_2498123/2005539933.py in <module>
         10                                    progress=True)
         11 
    ---> 12 imp = package.PackageImporter(model_path)
         13 model = imp.load_pickle("te_model", "model")
         14 example_texts = model.examples
    
    ~/miniconda3/lib/python3.8/site-packages/torch/package/importer.py in __init__(self, file_or_buffer, module_allowed)
         59             self.filename = str(file_or_buffer)
         60             if not os.path.isdir(self.filename):
    ---> 61                 self.zip_reader = torch._C.PyTorchFileReader(self.filename)
         62             else:
         63                 self.zip_reader = MockZipReader(self.filename)
    
    RuntimeError: [enforce fail at inline_container.cc:222] . file not found: v1_4lang_q/version
    

    For any of the Google Colab notebooks, I get the following error when executing the very first cell:

         |████████████████████████████████| 74 kB 2.2 MB/s 
         |████████████████████████████████| 2.9 MB 11.8 MB/s 
         |████████████████████████████████| 112 kB 35.0 MB/s 
         |████████████████████████████████| 596 kB 46.5 MB/s 
      Building wheel for antlr4-python3-runtime (setup.py) ... done
    /content/silero-models
    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    <ipython-input-1-5d873de0231f> in <module>()
         16 from glob import glob
         17 from omegaconf import OmegaConf
    ---> 18 from utils import (init_jit_model, 
         19                    split_into_batches,
         20                    read_audio,
    
    5 frames
    /usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
        362 
        363         if handle is None:
    --> 364             self._handle = _dlopen(self._name, mode)
        365         else:
        366             self._handle = handle
    
    OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory
    

    Thus, as a result, I am unable to run any examples - either locally or in Google Colab.

    Thanks!

    bug 
    opened by abhinavkulkarni 9
  • Bug report - running on ARM / RPI

    🐛 Bug

    I tried to use the model on a Raspberry Pi 3B and I get the following error: fft: ATen not compiled with MKL support. So I tried to modify the stft function in torch/functional.py to use the librosa stft instead, but it seems that the model uses a different torch stft than the one in my package.

    The function used instead of torch stft

    def stft(input: Tensor, n_fft: int, hop_length: Optional[int] = None,
             win_length: Optional[int] = None, window: Optional[Tensor] = None,
             center: bool = True, pad_mode: str = 'reflect', normalized: bool = False,
             onesided: Optional[bool] = None,
             return_complex: Optional[bool] = None):
        S = librosa.stft(np.array(input),n_fft,hop_length,win_length,window,center,pad_mode)
        s_real = np.real(S)
        s_real_shape = np.shape(s_real)
        s_real = np.reshape(s_real,(s_real_shape[0],s_real_shape[1],1))
        s_imag = np.imag(S)
        s_imag_shape = np.shape(s_imag)
        s_imag = np.reshape(s_imag,(s_imag_shape[0],s_imag_shape[1],1))
        S = np.concatenate((s_real,s_imag),axis=2)
        return torch.tensor(S)

    stack traces

    File "/home/Salim/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
    RuntimeError: The following operation failed in the TorchScript interpreter.
    Traceback of TorchScript, serialized code (most recent call last):
      File "code/torch/stt_pretrained/models/model.py", line 27, in forward
        _2 = self.win_length
        _3 = torch.hann_window(self.n_fft, dtype=ops.prim.dtype(x), layout=None, device=ops.prim.device(x), pin_memory=None)
        x0 = torch.torch.functional.stft(x, _0, _1, _2, _3, True, "reflect", False, True, )
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        _4 = torch.slice(x0, 0, 0, 9223372036854775807, 1)
        _5 = torch.slice(_4, 1, 0, 9223372036854775807, 1)
      File "code/torch/torch/functional.py", line 21, in stft
        input0 = input
        print("test ok")
        _2 = torch.stft(input0, n_fft, hop_length, win_length, window, normalized, onesided)
             ~~~~~~~~~~ <--- HERE
        return _2

    Traceback of TorchScript, original code (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 465, in stft
        input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
        input = input.view(input.shape[-signal_dim:])
        return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided)
               ~~~~~~~~ <--- HERE
    RuntimeError: fft: ATen not compiled with MKL support

    Expected behavior

    Is it possible to modify the forward function so that it uses the librosa stft for Raspberry Pi users?

    Environment

    PyTorch version: 1.7.0a0+e85d494
    Is debug build: True
    CUDA used to build PyTorch: None
    ROCM used to build PyTorch: N/A

    OS: Raspbian GNU/Linux 10 (buster) (armv7l)
    GCC version: (Raspbian 8.3.0-6+rpi1) 8.3.0
    Clang version: Could not collect
    CMake version: version 3.13.4

    Python version: 3.7 (32-bit runtime)
    Is CUDA available: False
    CUDA runtime version: No CUDA
    GPU models and configuration: No CUDA
    Nvidia driver version: No CUDA
    cuDNN version: No CUDA
    HIP runtime version: N/A
    MIOpen runtime version: N/A

    Versions of relevant libraries:
    [pip3] numpy==1.20.2
    [pip3] numpydoc==0.7.0
    [pip3] torch==1.7.0a0
    [pip3] torchaudio==0.7.0a0+ac17b64
    [pip3] torchvision==0.8.0a0+291f7e2
    [conda] Could not collect

    bug 
    opened by Salim-alileche 9
  • Feature request - Offline use of model

    At the moment it is nearly impossible to create a docker container that works offline (without internet access). Even if you include this line during docker build:

    RUN python -c "import torch; torch.backends.quantized.engine='qnnpack'; torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_te', force_reload=True)"

    During execution of the docker container (without internet) you load it locally:

    torch.hub.load(repo_or_dir='/root/.cache/torch/hub/snakers4_silero-models_master', model='silero_te', source='local', force_reload=False)

    Then you have the problem that the hubconf.py is called again (and fails due to no internet access) and it tries to download the files in hubconf.py Lines 21, 49, 101, even though they already exist.

    So my suggestion would be to also include checks in Lines 21, 49 and 101 that test whether the file already exists locally and, if it does, skip the download (as is done in Line 114).
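
The kind of check meant here could look roughly like the sketch below. `ensure_local` is a hypothetical helper name, not the actual hubconf.py code, and the `download` parameter exists only so the function can be exercised without network access.

```python
import os
import urllib.request


def ensure_local(url: str, path: str, download=urllib.request.urlretrieve) -> bool:
    """Download `url` to `path` only if `path` does not exist yet.

    Returns True if a download happened, False if the cached file was reused.
    """
    if os.path.isfile(path):
        return False  # already cached, so no network access is needed
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    download(url, path)
    return True
```

With such a guard in place, a container that already holds the files would never reach the download call at run time.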

    Any reasons against that?

    enhancement 
    opened by Phil1108 7
  • Issue getting from silero model tried for text enhancement

    Issue

    File "<torch_package_104>.release_module.py", line 122, in enhance_text
    File "<torch_package_104>.release_module.py", line 101, in enhance_long_textblock
    File "<torch_package_104>.release_module.py", line 72, in enhance_textblock
    File "<torch_package_104>.release_module.py", line 165, in enhance_tokens
    IndexError: string index out of range

    Details

    I added punctuation to the text using Silero models over the PyTorch hub, and everything was going smoothly until the attached text example appeared. I have no idea why this is occurring. I'm using this model to add punctuation to transcripts that I collect from YouTube; some of them have a few missing punctuation marks (supplied by the video author), while others have no punctuation at all (auto-generated by youtube).

    Transcript throwing Error

    transcript 1: ""Hey there. How's it going everybody in this video? We'll be learning about python Data types and specifically We'll be learning about how to work with textual data and textual data in python are represented with strings So we currently have [opened] our intro pi file that we were working with in the last video Where we just printed out hello world and I'll go ahead and run this so that we can see that down here It does print out hello [world] [now] This line here is using the print function and we're passing this text value into that print function now if we wanted to create a Variable that holds that text value then we could say now I'll just get rid of this comment for now So if I wanted a variable to hold that value then I can just create a variable and we'll call that"

    transcript 2: "you're now ready to see how to go one layer of a convolution on your network let's go through the example you've seen in the previous video how to take a 3d volume and convolve it with say two different filters in order to get in this example two different 4x4 outputs so let's say convolving with the first filter gives this first 4x4 output and convolving with this second filter gives a different 4x4 output the final thing to turn this into a convolutional neural net layer is that for each of these we're going to add it bias so this is going to be a real number and what - broadcasting you kind of had the same number - every you know one of these sixteen elements and then apply a non-linearity which for illustration that says there a luna mini arity and this gives you a 4x4 output after applying the bias and the non-linearity and then for this thing at the bottom as well you had some different buyers again this is a real number so you had the same row number - all 16 numbers and then applies some non-linearity that fairly non-linearity and this gives you a different 4x4 output then same as we did before if you take this and stack it up as follows so they end up with a 4 by 4 by 2 output then this computation where you've gone from 6 by 6 by 3 to a 4 by 4 by 4 this is one layer of a convolutional neural network Center mapped is back to one layer of for propagation in the standard neural network when a non convolutional neural network remember that one step afford prot was something like this right z1 equals w1 times a0 a0 was also equal to X right and then plus b1 and he applied the non-linearity to get a 1 so that's G of Z 1" Please review the above transcript that is and let us know what the problem is.

    opened by Kishan-Sahu 6
  • Model getting stuck on some texts.

    There hasn't been a debugging message to explain why the model keeps getting stuck for a very long period. Please assist us in adding a debugging message to the model so we can identify the cause of the problem.

    The text for which the model stuck is given below:

    Text: "we're going to set this by saying export Python path all uppercase and then equals and now we want to set that location so I'm just going to come over here and grab that location and paste that in those quotes and we want it to look just like that no space in between the equals and the path so to save that we can just hit ctrl X and then Y to save and then enter to keep the same file name and now we can either restart our terminal or run a source command on that file but I'll just restart the terminal here and pull this up and now if we run Python then let's see if we can import that module so import my module and we can see that that worked and the reason that worked is that if we import sis and look at our sis then we can see that after our current directory that we have the directory that was added there and the reason that it's added is that we added it to our Python path environment variable so now let's take a look at how to set"

    I manually tested by eliminating strange letters and words and discovered that removing "ctrl" from the text resolved the issue.
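
The manual elimination described above can be automated with a small leave-one-out search. This is only a sketch: `fails` is a hypothetical predicate you would have to supply yourself (for example, one that runs the model call with a timeout and returns True when it hangs or raises).

```python
from typing import Callable, List


def find_offending_words(text: str, fails: Callable[[str], bool]) -> List[str]:
    """Return every word whose removal alone makes `fails` return False."""
    words = text.split()
    culprits = []
    for index, word in enumerate(words):
        # Rebuild the text without the word at `index` and re-test it.
        reduced = " ".join(words[:index] + words[index + 1:])
        if not fails(reduced):
            culprits.append(word)
    return culprits
```

Note that this finds single-word culprits only; if the same problematic word occurs twice, removing one occurrence will not help and nothing is reported.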

    opened by Kishan-Sahu 2
  • Feature request - `<phoneme>` support for SSML

    🚀 Feature

    Allow phonetic pronunciation for necessary words

    Motivation

    Sometimes it's necessary to customize pronunciation of words with non-standard spelling or word borrowed from other languages. In that case having transcription in IPA or X-SAMPA would be nice (see e.g. Polly for explanation of the syntax)

    Pitch

    Wrapping IPA or X-SAMPA transcription into a <phoneme> tag makes the engine pronounce the word according to its specification.

    Alternatives

    Not sure if there are any within the project. Using other projects supporting <phoneme> is possible.

    Additional context

    enhancement 
    opened by lagleki 1
  • Packaging and PyPI releases

    Hello,

    Thank you for your hard work.

    Is there any chance of getting installable Python package from PyPI for the project?

    For example, it might look like this for installing STT models with PyTorch:

    pip install silero-models-stt[torch]
    

    This would be very handy for using the models in the production projects and environments.

    help wanted 
    opened by espdev 9
  • Feature request - [Wake Word Detection]

    🚀 Feature

    It would be helpful if we could easily use wake word detection to complement the STT functionality. At present I'm using a third-party tool for wake word detection which then records audio for 4 seconds which is processed through silero for home automation purposes.

    Motivation & Pitch

    Adding a simple method for custom wake word detection would allow seamless integration for the purposes of home automation where an always listening device waits for a given wake word or phrase and then listens for a sentence for STT purposes, the text of which is then passed on to a different step in the chain.

    Additionally, while waiting a fixed amount of time for the follow-up sentence is straight-forward, it would be a helpful addition to also use the length of silence in a sentence to determine its termination.
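
To illustrate the silence-based termination idea, here is a minimal energy-threshold sketch in pure Python. All parameter values (frame length, required silence, threshold) are arbitrary assumptions for illustration, and a real voice activity detector would be considerably more robust than this.

```python
from typing import Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(sample * sample for sample in frame) / len(frame)


def find_end_of_speech(samples: Sequence[float], sample_rate: int,
                       frame_ms: int = 30, silence_ms: int = 600,
                       threshold: float = 1e-4) -> Optional[int]:
    """Return the sample index where trailing silence starts, or None.

    Speech is considered finished once `silence_ms` of consecutive
    low-energy frames follow at least one frame above `threshold`.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    needed = max(1, silence_ms // frame_ms)
    silent_run = 0
    heard_speech = False
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if frame_energy(samples[start:start + frame_len]) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                # Index of the first sample of the silent run.
                return start + frame_len - silent_run * frame_len
    return None
```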

    Alternatives

    These things can be done at present, but only by using multiple tools. Being able to do this in one place would make this use case seamless and easier to process.

    I do understand if this is too far outside of your scope for this project.

    enhancement 
    opened by waytotheweb 1
Releases(v0.4.1)
  • v0.4.1(Jun 12, 2022)

    What's Changed

    • Fix models.yml loading by @rominf in https://github.com/snakers4/silero-models/pull/162

    New Contributors

    • @rominf made their first contribution in https://github.com/snakers4/silero-models/pull/162

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.4...v0.4.1

  • v0.4(Jun 6, 2022)

    What's Changed

    • Add version 3.1 by @Islanna in https://github.com/snakers4/silero-models/pull/157
    • Fx by @Islanna in https://github.com/snakers4/silero-models/pull/158
    • Fx by @Islanna in https://github.com/snakers4/silero-models/pull/159

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.3...v0.4

  • v0.3(May 23, 2022)

    What's Changed

    • Testing the auto-build functionality
    • Update examples by @snakers4 in https://github.com/snakers4/silero-models/pull/137
    • Fx ssml and model loading by @Islanna in https://github.com/snakers4/silero-models/pull/140
    • Update README.md by @Islanna in https://github.com/snakers4/silero-models/pull/138
    • Tts v3 by @Islanna in https://github.com/snakers4/silero-models/pull/141

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.1...v0.2

  • v0.1(Feb 28, 2022)

  • v1(Sep 16, 2020)

    We publish the following models in this release:

    • English V1
    • German V1
    • Spanish V1

    |                 | PyTorch | ONNX | TensorFlow | Quantization | Quality | Colab         |
    |-----------------|---------|------|------------|--------------|---------|---------------|
    | English (en_v1) | ✔️      | ✔️   | ✔️         | ⌛           | link    | Open In Colab |
    | German (de_v1)  | ✔️      | ✔️   | ✔️         | ⌛           | link    | Open In Colab |
    | Spanish (es_v1) | ✔️      | ✔️   | ✔️         | ⌛           | link    | Open In Colab |

Owner
Alexander Veysov