Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Last update: Dec 31, 2022

Overview

Silero Models

Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.

Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

No Kaldi;
No compilation;
No 20-step instructions;

Also we have published TTS models that satisfy the following criteria:

One-line usage;
A large library of voices;
A fully end-to-end pipeline;
Natural-sounding speech;
No GPU or training required;
Minimalism and lack of dependencies;
Faster than real-time on one CPU thread (!!!);
Support for 16kHz and 8kHz out of the box;
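
One way to check a "faster than real-time" claim on your own hardware is to measure the real-time factor (RTF): processing time divided by the duration of the produced audio, where a value below 1.0 means synthesis outpaces playback. The following is a minimal sketch; `synthesize` and `fake_tts` are hypothetical stand-ins for any TTS call and are not part of the Silero API.

```python
import time

def real_time_factor(synthesize, text, sample_rate):
    """Return processing_time / audio_duration for one synthesis call.

    `synthesize` is a hypothetical callable: text -> sequence of samples.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize() returned no audio")
    return elapsed / audio_seconds

def fake_tts(text):
    # Stand-in that "produces" one second of silence at 16 kHz.
    return [0.0] * 16000

rtf = real_time_factor(fake_tts, "hello", sample_rate=16000)
print(f"RTF = {rtf:.4f} (below 1.0 means faster than real time)")
```

To compare against the single-thread claim, it probably makes sense to call `torch.set_num_threads(1)` before timing a real model.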

Speech-To-Text

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.

Currently we provide the following checkpoints:

	PyTorch	ONNX	Quantization	Quality
English (`en_v5`)	✔️	✔️	✔️	link
German (`de_v4`)	✔️	✔️	⌛	link
English (`en_v3`)	✔️	✔️	✔️	link
German (`de_v3`)	✔️	⌛	⌛	link
German (`de_v1`)	✔️	✔️	⌛	link
Spanish (`es_v1`)	✔️	✔️	⌛	link
Ukrainian (`ua_v3`)	✔️	✔️	✔️	N/A

Model flavours:

	jit	jit	jit	jit	jit_q	jit_q	onnx	onnx	onnx	onnx
	xsmall	small	large	xlarge	xsmall	small	xsmall	small	large	xlarge
English `en_v5`		✔️		✔️		✔️		✔️		✔️
English `en_v4_0`			✔️						✔️
English `en_v3`	✔️	✔️	✔️		✔️	✔️	✔️	✔️	✔️
German `de_v4`			✔️						✔️
German `de_v3`			✔️
German `de_v1`		✔️					✔️
Spanish `es_v1`		✔️					✔️
Ukrainian `ua_v3`		✔️			✔️		✔️
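
Choosing a checkpoint amounts to a nested lookup by language, version and flavour. The sketch below assumes that layout, as the lookups in the ONNX and TensorFlow examples suggest; the dictionary contents and URLs are placeholders made up for illustration and are not real entries from models.yml.

```python
# Hypothetical excerpt with a language -> version -> flavour nesting.
# The URLs are placeholders, not real download links.
STT_MODELS = {
    "en": {
        "latest": {
            "jit": "https://example.invalid/en_v5.jit",
            "onnx": "https://example.invalid/en_v5.onnx",
        },
    },
    "de": {
        "latest": {
            "jit": "https://example.invalid/de_v4.jit",
        },
    },
}

def resolve_model_url(models, language, version="latest", flavour="jit"):
    """Look up a checkpoint URL, failing with a readable error if missing."""
    try:
        return models[language][version][flavour]
    except KeyError as exc:
        raise KeyError(
            f"no '{flavour}' checkpoint for language={language!r}, "
            f"version={version!r}"
        ) from exc

print(resolve_model_url(STT_MODELS, "en", flavour="onnx"))
```

Failing early with the language and flavour in the message is easier to debug than a bare `None` reaching the download step.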

Dependencies

All examples:
- torch, 1.8+ (used to clone the repo in the TF and ONNX examples); versions older than 1.6 have breaking changes
- torchaudio, the latest version bound to your PyTorch version should work
- omegaconf, the latest version should just work
Additional for ONNX examples:
- onnx, the latest version should just work
- onnxruntime, the latest version should just work
Additional for TensorFlow examples:
- tensorflow, the latest version should just work
- tensorflow_hub, the latest version should just work

Please see the provided Colab for details for each example below. All examples are maintained to work with the latest major packaged versions of the installed libraries.

PyTorch

import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst ='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]),
                            device=device)

output = model(input)
for example in output:
    print(decoder(example.cpu()))
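
The example above decodes only `batches[0]`. With more than `batch_size` files you would loop over every batch. The chunking idea can be sketched as follows; `chunk` is a hypothetical helper written for illustration, not the library's own `split_into_batches`.

```python
def chunk(items, batch_size=10):
    """Split a list into consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

files = [f"clip_{i}.wav" for i in range(23)]
batches = chunk(files, batch_size=10)
print([len(b) for b in batches])  # [10, 10, 3]

# With the real utilities, the loop would presumably look like:
# for batch in batches:
#     output = model(prepare_model_input(read_batch(batch), device=device))
```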

ONNX

You can run our models anywhere you can import an ONNX model or run ONNX Runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)  # follow the chosen language
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# actual onnx inference and decoding
onnx_input = input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

TensorFlow

SavedModel example

import os
import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models[language].latest.tf, 'tf_model.tar.gz')  # follow the chosen language
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))
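
The `rm -rf ... && mkdir ... && tar ...` shell call in the example above assumes a Unix-like shell. A portable alternative using only the standard library could look like the sketch below; `extract_archive` is a hypothetical helper, and the demo builds a tiny stand-in archive so that the snippet runs without downloading anything.

```python
import os
import shutil
import tarfile
import tempfile

def extract_archive(archive_path, target_dir):
    """Recreate `target_dir` and unpack a .tar.gz archive into it."""
    shutil.rmtree(target_dir, ignore_errors=True)  # like `rm -rf tf_model`
    os.makedirs(target_dir)                        # like `mkdir tf_model`
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only unpack archives that come from a source you trust.
        archive.extractall(target_dir)             # like `tar xzf ... -C tf_model`
    return sorted(os.listdir(target_dir))

# Demo with a throwaway archive containing one placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "saved_model.pb")
    with open(src, "wb") as fh:
        fh.write(b"demo")
    archive_path = os.path.join(tmp, "tf_model.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(src, arcname="saved_model.pb")
    extracted = extract_archive(archive_path, os.path.join(tmp, "tf_model"))

print(extracted)  # ['saved_model.pb']
```

In the TensorFlow example this would be called as `extract_archive('tf_model.tar.gz', 'tf_model')` before `tf.saved_model.load('tf_model')`.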

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.

Currently we provide the following speakers:

Speaker	Auto-stress	Language	SR
`aidar_v2`	yes	`ru` (Russian)	`8000`, `16000`
`baya_v2`	yes	`ru` (Russian)	`8000`, `16000`
`irina_v2`	yes	`ru` (Russian)	`8000`, `16000`
`kseniya_v2`	yes	`ru` (Russian)	`8000`, `16000`
`natasha_v2`	yes	`ru` (Russian)	`8000`, `16000`
`ruslan_v2`	yes	`ru` (Russian)	`8000`, `16000`
`lj_v2`	no	`en` (English)	`8000`, `16000`
`thorsten_v2`	no	`de` (German)	`8000`, `16000`
`tux_v2`	no	`es` (Spanish)	`8000`, `16000`
`gilles_v2`	no	`fr` (French)	`8000`, `16000`
`multi_v2`	no	`ru`, `en`, `de`, `es`, `fr`, `tt`	`8000`, `16000`
`aigul_v2`	no	`ba` (Bashkir)	`8000`, `16000`
`erdni_v2`	no	`xal` (Kalmyk)	`8000`, `16000`
`dilyara_v2`	no	`tt` (Tatar)	`8000`, `16000`
`dilnavoz_v2`	no	`uz` (Uzbek)	`8000`, `16000`

(!!!) In multi_v2 all speakers can speak all of the languages (with various levels of fidelity).
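
Before loading a model it can help to validate the requested speaker, language and sample rate against the table above. This is only a sketch: `SPEAKERS` copies a few rows of the table by hand, and `check_tts_request` is a hypothetical helper, not part of the library.

```python
# A few rows copied from the speakers table: speaker -> (languages, sample rates).
SPEAKERS = {
    "kseniya_v2": ({"ru"}, {8000, 16000}),
    "lj_v2": ({"en"}, {8000, 16000}),
    "thorsten_v2": ({"de"}, {8000, 16000}),
    "multi_v2": ({"ru", "en", "de", "es", "fr", "tt"}, {8000, 16000}),
}

def check_tts_request(speaker, language, sample_rate):
    """Raise ValueError if the combination is not listed in SPEAKERS."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    languages, sample_rates = SPEAKERS[speaker]
    if language not in languages:
        raise ValueError(f"{speaker} does not list language {language!r}")
    if sample_rate not in sample_rates:
        raise ValueError(f"{speaker} does not list sample rate {sample_rate}")
    return True

print(check_tts_request("kseniya_v2", "ru", 16000))  # True
```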

Dependencies

Basic dependencies for colab examples:

torch, 1.9+;
torchaudio, latest version bound to PyTorch should work (required only because models are hosted together with STT, not required for work);
omegaconf, latest (can be removed as well, if you do not load all of the configs);

PyTorch

import torch

language = 'ru'
speaker = 'kseniya_v2'
sample_rate = 16000
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=speaker)
model.to(device)  # gpu or cpu

audio = model.apply_tts(texts=[example_text],
                        sample_rate=sample_rate)
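
The example above stops once `audio` is returned. Below is a minimal sketch of writing such audio to a WAV file with the standard library only. It assumes each utterance is a 1-D sequence of float samples in the range [-1.0, 1.0], which is an assumption about the output rather than something documented here; `write_wav` is a hypothetical helper and the sine tone is a stand-in for real synthesized speech.

```python
import math
import os
import struct
import tempfile
import wave

def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))

# Stand-in for one synthesized utterance: 0.5 s of a 440 Hz tone.
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 2)]
out_path = os.path.join(tempfile.gettempdir(), "tts_demo.wav")
write_wav(out_path, tone, sample_rate)
```

If the assumption holds, something like `write_wav('out.wav', audio[0].tolist(), sample_rate)` should work with the real model output.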

Standalone Use

Standalone usage requires only PyTorch 1.9+ and the Python standard library;
Please see the detailed examples in Colab;

import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v2_kseniya.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_batch = ['В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.',
                 'Котики - это жидкость!',
                 'М+ама М+илу м+ыла с м+ылом.']
sample_rate = 16000

audio_paths = model.save_wav(texts=example_batch,
                             sample_rate=sample_rate)

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to these wiki sections:

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, read our news.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.

Citations

@misc{SileroModels,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Donations

Please use the "sponsor" button.

Comments

Feature request - Adding Proper TF 2.0 Checkpoints (not onnx-tensorflow) + Batching + TF JS

Hello, guys! Your models are brilliant and I want to use them in my project via TensorFlow Serving. But it can't work without batching. Can you please save the models with batching? Thank you!
enhancement help wanted

opened by aleks73337 28
README's Standalone Use section does not mention NumPy
Currently, https://github.com/snakers4/silero-models#standalone-use states that:

Standalone usage just requires PyTorch 1.10+ and python standard library;

but I had to install NumPy as well to make the example work.
bug
opened by ghost 23
Bug report - RuntimeError: Unknown qengine

Hello. Great project! I would like to test a standard example, but at the line:

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")

I get an error:

\lib\site-packages\torch\jit_script.py", line 351, in unpackage_script_module
    cpp_module = torch._C._import_ir_module_from_package(
RuntimeError: Unknown qengine

Python 10.4, Torch 11.0, device='cpu', Windows 10
Model: torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/ru_v3.pt', local_file)

Tell me, please, how to fix it?
bug

opened by lik2129 14
Bug report -problem loading STT model on Windows
Hi, I decided to try silero_models. I did everything as in the docs, but I get an error. How do I fix it?

code:

import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)

Error:

RuntimeError                              Traceback (most recent call last)
C:\Users\E786~1\AppData\Local\Temp/ipykernel_9444/3004546653.py in
      1 device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
----> 2 model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
      3                                        model='silero_stt',
      4                                        language='en', # also available 'de', 'es'
      5                                        device=device)

c:\PY\asistent.venv\lib\site-packages\torch\hub.py in load(repo_or_dir, model, source, force_reload, verbose, skip_validation, *args, **kwargs)
    397     repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
    398
--> 399     model = _load_local(repo_or_dir, model, *args, **kwargs)
    400     return model
    401

c:\PY\asistent.venv\lib\site-packages\torch\hub.py in _load_local(hubconf_dir, model, *args, **kwargs)
    426
    427     entry = _load_entry_from_hubconf(hub_module, model)
--> 428     model = entry(*args, **kwargs)
    429
    430     sys.path.remove(hubconf_dir)

~/.cache\torch\hub\snakers4_silero-models_master\hubconf.py in silero_stt(language, version, jit_model, **kwargs)
     32     assert language in available_languages
     33
---> 34     model, decoder = init_jit_model(model_url=models.stt_models.get(language).get(version).get(jit_model),
     35                                     **kwargs)
     36     utils = (read_batch,

~/.cache\torch\hub\snakers4_silero-models_master\utils.py in init_jit_model(model_url, device)
    128                                    progress=True)
    129
--> 130     model = torch.jit.load(model_path, map_location=device)
    131     model.eval()
    132     return model, Decoder(model.labels)

c:\PY\asistent.venv\lib\site-packages\torch\jit_serialization.py in load(f, map_location, _extra_files)
    159     cu = torch._C.CompilationUnit()
    160     if isinstance(f, str) or isinstance(f, pathlib.Path):
--> 161         cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
    162     else:
    163         cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: open file failed because of errno 2 on fopen: No such file or directory, file path: C:\Users\Дом/.cache\torch\hub\snakers4_silero-models_master\model\en_v5.jit
bug
opened by lev007-ops 13
Feature request - SAPI5

SAPI5 compatibility

🚀 Feature

Motivation

Mostly enough for screen readers (Windows). But this interface is for integration by its nature. Ready to help!
enhancement

opened by studennikov-serg 11
How to obtain an intermediate layer output?

How do we obtain the output of an intermediate layer of the pre-trained model? For example, the output at the end of the convolution encoder, or the output just after the transformer encoder layers.
help wanted

opened by prajwalkr 11
Feature request - Expressiveness

🚀 Feature

Right now, in French TTS, there is no decay at the end of a sentence. So if you have 2 sentences, the prosody is wrong and painful to hear. Each sentence by itself is almost perfect, but at the end of a sentence the pitch should decrease, the rate should also decrease, and a short pause is required before starting a new sentence.

Motivation

This is useful as soon as you have more than 2 sentences to synthesize. Otherwise, the current, excellent quality of the TTS engine is useless, since no human speaks continuously across sentences.
enhancement

opened by X-Ryl669 9

Errors running example.ipynb locally or in Colab (PyTorch 1.10 issues)

Hi,

I am unable to run example.ipynb notebook locally (on CPU machine) or any of the Google Colab notebooks (either on CPU or GPU runtime).

Following error occurs for example.ipynb notebook:

model_url = model_conf.get('package')

model_dir = "downloaded_model"
os.makedirs(model_dir, exist_ok=True)
model_path = os.path.join(model_dir, os.path.basename(model_url))

if not os.path.isfile(model_path):
    torch.hub.download_url_to_file(model_url,
                                   model_path,
                                   progress=True)

imp = package.PackageImporter(model_path)
model = imp.load_pickle("te_model", "model")
example_texts = model.examples

def apply_te(text, lan='en'):
    return model.enhance_text(text, lan)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_2498123/2005539933.py in <module>
     10                                    progress=True)
     11 
---> 12 imp = package.PackageImporter(model_path)
     13 model = imp.load_pickle("te_model", "model")
     14 example_texts = model.examples

~/miniconda3/lib/python3.8/site-packages/torch/package/importer.py in __init__(self, file_or_buffer, module_allowed)
     59             self.filename = str(file_or_buffer)
     60             if not os.path.isdir(self.filename):
---> 61                 self.zip_reader = torch._C.PyTorchFileReader(self.filename)
     62             else:
     63                 self.zip_reader = MockZipReader(self.filename)

RuntimeError: [enforce fail at inline_container.cc:222] . file not found: v1_4lang_q/version

For any of the Google Colab notebooks, I get the following error when executing the very first cell:

     |████████████████████████████████| 74 kB 2.2 MB/s 
     |████████████████████████████████| 2.9 MB 11.8 MB/s 
     |████████████████████████████████| 112 kB 35.0 MB/s 
     |████████████████████████████████| 596 kB 46.5 MB/s 
  Building wheel for antlr4-python3-runtime (setup.py) ... done
/content/silero-models
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-1-5d873de0231f> in <module>()
     16 from glob import glob
     17 from omegaconf import OmegaConf
---> 18 from utils import (init_jit_model, 
     19                    split_into_batches,
     20                    read_audio,

5 frames
/usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    362 
    363         if handle is None:
--> 364             self._handle = _dlopen(self._name, mode)
    365         else:
    366             self._handle = handle

OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory

Thus, as a result, I am unable to run any examples - either locally or in Google Colab.

Thanks!

bug

opened by abhinavkulkarni 9

Bug report - running on ARM / RPI

🐛 Bug

I tried to use the model on a Raspberry Pi 3B and I get the following error: fft: ATen not compiled with MKL support. So I tried to modify the stft function in torch/functional.py to use the librosa stft instead, but it seems that the model uses a different torch stft rather than the one in my package.

The function used instead of torch stft

def stft(input: Tensor, n_fft: int, hop_length: Optional[int] = None,
         win_length: Optional[int] = None, window: Optional[Tensor] = None,
         center: bool = True, pad_mode: str = 'reflect', normalized: bool = False,
         onesided: Optional[bool] = None,
         return_complex: Optional[bool] = None):
    S = librosa.stft(np.array(input),n_fft,hop_length,win_length,window,center,pad_mode)
    s_real = np.real(S)
    s_real_shape = np.shape(s_real)
    s_real = np.reshape(s_real,(s_real_shape[0],s_real_shape[1],1))
    s_imag = np.imag(S)
    s_imag_shape = np.shape(s_imag)
    s_imag = np.reshape(s_imag,(s_imag_shape[0],s_imag_shape[1],1))
    S = np.concatenate((s_real,s_imag),axis=2)
    return torch.tensor(S)

stack traces

File "/home/Salim/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/torch/stt_pretrained/models/model.py", line 27, in forward
    _2 = self.win_length
    _3 = torch.hann_window(self.n_fft, dtype=ops.prim.dtype(x), layout=None, device=ops.prim.device(x), pin_memory=None)
    x0 = torch.torch.functional.stft(x, _0, _1, _2, _3, True, "reflect", False, True, )
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _4 = torch.slice(x0, 0, 0, 9223372036854775807, 1)
    _5 = torch.slice(_4, 1, 0, 9223372036854775807, 1)
  File "code/torch/torch/functional.py", line 21, in stft
    input0 = input
    print("test ok")
    _2 = torch.stft(input0, n_fft, hop_length, win_length, window, normalized, onesided)
         ~~~~~~~~~~ <--- HERE
    return _2

Traceback of TorchScript, original code (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 465, in stft
    input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
    input = input.view(input.shape[-signal_dim:])
    return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided)
           ~~~~~~~~ <--- HERE
RuntimeError: fft: ATen not compiled with MKL support

Expected behavior

Is it possible to modify the forward function so that it uses the librosa stft for Raspberry Pi users?

Environment

PyTorch version: 1.7.0a0+e85d494
Is debug build: True
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Raspbian GNU/Linux 10 (buster) (armv7l)
GCC version: (Raspbian 8.3.0-6+rpi1) 8.3.0
Clang version: Could not collect
CMake version: version 3.13.4

Python version: 3.7 (32-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] numpydoc==0.7.0
[pip3] torch==1.7.0a0
[pip3] torchaudio==0.7.0a0+ac17b64
[pip3] torchvision==0.8.0a0+291f7e2
[conda] Could not collect
bug

opened by Salim-alileche 9
Feature request - Offline use of model

At the moment it is nearly impossible to create a docker container that works offline (without internet access). Even if you include this line during docker build:

RUN python -c "import torch; torch.backends.quantized.engine='qnnpack'; torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_te', force_reload=True)"

During execution of the docker container (without internet) you load it locally:

torch.hub.load(repo_or_dir='/root/.cache/torch/hub/snakers4_silero-models_master', model='silero_te', source='local', force_reload=False)

Then you have the problem that the hubconf.py is called again (and fails due to no internet access) and it tries to download the files in hubconf.py Lines 21, 49, 101, even though they already exist.

So my suggestion would be to also include checks at lines 21, 49 and 101 that test whether the file already exists locally and, if so, skip the download (as is done at line 114).

Any reasons against that?
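
The check suggested above can be sketched roughly as follows. `fetch_if_missing` is a hypothetical helper, not code from hubconf.py; the downloader is passed in as a parameter so that the sketch runs without network access, and `fake_download` stands in for a real downloader such as `torch.hub.download_url_to_file`.

```python
import os
import tempfile

def fetch_if_missing(url, local_path, download):
    """Call `download(url, local_path)` only when the file is not cached."""
    if os.path.isfile(local_path):
        return False  # cached copy found, no network access needed
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    download(url, local_path)
    return True

calls = []

def fake_download(url, dst):
    # Stand-in for a real downloader; records the call and writes a file.
    calls.append(url)
    with open(dst, "wb") as fh:
        fh.write(b"model bytes")

with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "models", "model.pt")
    first = fetch_if_missing("https://example.invalid/model.pt", path, fake_download)
    second = fetch_if_missing("https://example.invalid/model.pt", path, fake_download)

print(first, second, len(calls))  # True False 1
```

The second call returns without touching the downloader, which is the behaviour an offline container would need.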
enhancement

opened by Phil1108 7
Issue when using the Silero model for text enhancement

Issue

File "<torch_package_104>.release_module.py", line 122, in enhance_text
File "<torch_package_104>.release_module.py", line 101, in enhance_long_textblock
File "<torch_package_104>.release_module.py", line 72, in enhance_textblock
File "<torch_package_104>.release_module.py", line 165, in enhance_tokens
IndexError: string index out of range

Details

I added punctuation to the text using Silero models over the PyTorch hub, and everything was going smoothly until the attached text example appeared. I have no idea why this is occurring. I'm using this model to add punctuation to transcripts that I collect from YouTube; some of them have a few missing punctuation marks (supplied by the video author), while others have no punctuation at all (auto-generated by youtube).

Transcript throwing Error

transcript 1: ""Hey there. How's it going everybody in this video? We'll be learning about python Data types and specifically We'll be learning about how to work with textual data and textual data in python are represented with strings So we currently have [opened] our intro pi file that we were working with in the last video Where we just printed out hello world and I'll go ahead and run this so that we can see that down here It does print out hello [world] [now] This line here is using the print function and we're passing this text value into that print function now if we wanted to create a Variable that holds that text value then we could say now I'll just get rid of this comment for now So if I wanted a variable to hold that value then I can just create a variable and we'll call that"

transcript 2: "you're now ready to see how to go one layer of a convolution on your network let's go through the example you've seen in the previous video how to take a 3d volume and convolve it with say two different filters in order to get in this example two different 4x4 outputs so let's say convolving with the first filter gives this first 4x4 output and convolving with this second filter gives a different 4x4 output the final thing to turn this into a convolutional neural net layer is that for each of these we're going to add it bias so this is going to be a real number and what - broadcasting you kind of had the same number - every you know one of these sixteen elements and then apply a non-linearity which for illustration that says there a luna mini arity and this gives you a 4x4 output after applying the bias and the non-linearity and then for this thing at the bottom as well you had some different buyers again this is a real number so you had the same row number - all 16 numbers and then applies some non-linearity that fairly non-linearity and this gives you a different 4x4 output then same as we did before if you take this and stack it up as follows so they end up with a 4 by 4 by 2 output then this computation where you've gone from 6 by 6 by 3 to a 4 by 4 by 4 this is one layer of a convolutional neural network Center mapped is back to one layer of for propagation in the standard neural network when a non convolutional neural network remember that one step afford prot was something like this right z1 equals w1 times a0 a0 was also equal to X right and then plus b1 and he applied the non-linearity to get a 1 so that's G of Z 1" Please review the above transcript that is and let us know what the problem is.

opened by Kishan-Sahu 6
Model getting stuck on some texts.

There hasn't been a debugging message to explain why the model keeps getting stuck for a very long period. Please assist us in adding a debugging message to the model so we can identify the cause of the problem.

The text for which the model stuck is given below:

Text: "we're going to set this by saying export Python path all uppercase and then equals and now we want to set that location so I'm just going to come over here and grab that location and paste that in those quotes and we want it to look just like that no space in between the equals and the path so to save that we can just hit ctrl X and then Y to save and then enter to keep the same file name and now we can either restart our terminal or run a source command on that file but I'll just restart the terminal here and pull this up and now if we run Python then let's see if we can import that module so import my module and we can see that that worked and the reason that worked is that if we import sis and look at our sis then we can see that after our current directory that we have the directory that was added there and the reason that it's added is that we added it to our Python path environment variable so now let's take a look at how to set"

I manually tested by eliminating strange letters and words and discovered that removing "ctrl" from text, worked effectively.

opened by Kishan-Sahu 2
Feature request - `<phoneme>` support for SSML

🚀 Feature

Allow phonetic pronunciation for necessary words

Motivation

Sometimes it's necessary to customize pronunciation of words with non-standard spelling or word borrowed from other languages. In that case having transcription in IPA or X-SAMPA would be nice (see e.g. Polly for explanation of the syntax)

Pitch

Wrapping IPA or X-SAMPA transcription into a <phoneme> tag makes the engine pronounce the word according to its specification.

Alternatives

Not sure if there are any within the project. Using other projects supporting <phoneme> is possible.

Additional context
enhancement

opened by lagleki 1
Packaging and PyPI releases
Hello,

Thank you for your hard work.

Is there any chance of getting an installable Python package on PyPI for the project?

For example, it might look like this for installing STT models with PyTorch:

pip install silero-models-stt[torch]

This would be very handy for using the models in the production projects and environments.
help wanted
opened by espdev 9
Feature request - [Wake Word Detection]

🚀 Feature

It would be helpful if we could easily use wake word detection to complement the STT functionality. At present I'm using a third-party tool for wake word detection which then records audio for 4 seconds which is processed through silero for home automation purposes.

Motivation & Pitch

Adding a simple method for custom wake word detection would allow seamless integration for the purposes of home automation where an always listening device waits for a given wake word or phrase and then listens for a sentence for STT purposes, the text of which is then passed on to a different step in the chain.

Additionally, while waiting a fixed amount of time for the follow-up sentence is straight-forward, it would be a helpful addition to also use the length of silence in a sentence to determine its termination.
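
The silence-based termination described above could be approximated with a simple energy threshold over fixed-size audio frames. This is only an illustrative sketch of the idea and not a Silero feature: `find_utterance_end`, the threshold and the frame energies are all made up for the example.

```python
def find_utterance_end(frame_energies, threshold, min_silent_frames):
    """Return the index of the first frame of the silence run that ends the
    utterance, or None if speech never pauses for long enough."""
    silent_run = 0
    heard_speech = False
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            heard_speech = True
            silent_run = 0           # speech resets the silence counter
        elif heard_speech:
            silent_run += 1          # count silence only after speech began
            if silent_run >= min_silent_frames:
                return i - min_silent_frames + 1
    return None

# Leading silence, speech with a short dip at index 4, then trailing silence.
energies = [0.0, 0.1, 0.9, 0.8, 0.05, 0.7, 0.02, 0.01, 0.03, 0.02]
end = find_utterance_end(energies, threshold=0.2, min_silent_frames=3)
print(end)  # 6
```

The short dip at index 4 does not end the utterance because it lasts fewer than `min_silent_frames` frames.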

Alternatives

These things can be done at present, but only by using multiple tools. Being able to do this in one place would make this use case seamless and easier to process.

I do understand if this is too far outside of your scope for this project.
enhancement

opened by waytotheweb 1

Releases(v0.4.1)

v0.4.1(Jun 12, 2022)
What's Changed

Fix models.yml loading by @rominf in https://github.com/snakers4/silero-models/pull/162

New Contributors

@rominf made their first contribution in https://github.com/snakers4/silero-models/pull/162

Full Changelog: https://github.com/snakers4/silero-models/compare/v0.4...v0.4.1
v0.4(Jun 6, 2022)
What's Changed

Add version 3.1 by @Islanna in https://github.com/snakers4/silero-models/pull/157

Fx by @Islanna in https://github.com/snakers4/silero-models/pull/158

Fx by @Islanna in https://github.com/snakers4/silero-models/pull/159

Full Changelog: https://github.com/snakers4/silero-models/compare/v0.3...v0.4
v0.3(May 23, 2022)
What's Changed

Testing the auto-build functionality

Update examples by @snakers4 in https://github.com/snakers4/silero-models/pull/137

Fx ssml and model loading by @Islanna in https://github.com/snakers4/silero-models/pull/140

Update README.md by @Islanna in https://github.com/snakers4/silero-models/pull/138

Tts v3 by @Islanna in https://github.com/snakers4/silero-models/pull/141

Full Changelog: https://github.com/snakers4/silero-models/compare/v0.1...v0.2
v0.1(Feb 28, 2022)

This is a test release to test Github Action based PyPI publishing. Some proper semantic version will be created later.
v1(Sep 16, 2020)
We publish the following models in this release:

English V1

German V1

Spanish V1

	PyTorch	ONNX	TensorFlow	Quantization	Quality	Colab
English (`en_v1`)	✔️	✔️	✔️	⌛	link	
German (`de_v1`)	✔️	✔️	✔️	⌛	link	
Spanish (`es_v1`)	✔️	✔️	✔️	⌛	link	

Owner

Alexander Veysov

