Ukrainian TTS (text-to-speech) using Coqui TTS

Overview
title: Ukrainian TTS
emoji: 🐸
colorFrom: green
colorTo: green
sdk: gradio
app_file: app.py
pinned: false

Ukrainian TTS 📢 🤖

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on the M-AILABS Ukrainian dataset using the sumska voice.

Link to online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please support -> SUPPORT LINK

Example

test.mp4

How to use:

  1. Install the dependencies: pip install -r requirements.txt.
  2. Download the model checkpoint, config.json, and speakers.pth from the "Releases" tab.
  3. Launch as one-time command:
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path folder/to/save/output.wav

or alternatively launch a web server using:

tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json
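
Recent releases ship a multi-speaker checkpoint (they include speakers.pth), so the one-time command also needs a speaker selected, as reported in the "doc: fix examples in README" issue below. A minimal sketch, assuming the --speaker_id flag mentioned in that issue (newer Coqui TTS versions call it --speaker_idx; run tts --help to confirm), the --speakers_file_path option, and a placeholder speaker name:

# pass the downloaded speakers.pth and pick one of the release voices
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --speakers_file_path path/to/speakers.pth \
    --speaker_id "<one of the speakers listed in speakers.pth>" \
    --out_path folder/to/save/output.wav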

How to train:

  1. Refer to the "Nervous beginner guide" in the Coqui TTS docs.
  2. Instead of the provided config.json, use the one from this repo (a training sketch is shown below).
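
For orientation only, a minimal training sketch, assuming Coqui TTS's TTS/bin/train_tts.py entry point, its --restore_path flag, and the config.json from this repo; the exact invocation, dataset paths, and fine-tuning options are described in the Coqui guide referenced above:

# train with this repo's config instead of the guide's default one
CUDA_VISIBLE_DEVICES=0 python TTS/bin/train_tts.py \
    --config_path path/to/this/repo/config.json

# optionally fine-tune from an existing checkpoint (see the improvement checklist below)
CUDA_VISIBLE_DEVICES=0 python TTS/bin/train_tts.py \
    --config_path path/to/this/repo/config.json \
    --restore_path path/to/existing/checkpoint.pth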

Attribution

Code for app.py taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    Problem

    The one-time snippet does not work as-is and complains that the speaker is not defined:

     > initialization of speaker-embedding layers.
     > Text: Перевірка мікрофона
     > Text splitted to sentences.
    ['Перевірка мікрофона']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    

    Also, speakers.pth should be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One-vowel words at the end of a sentence aren't stressed

    Input:

    
    Бобер на березі з бобренятами бублики пік.
    
    Боронила борона по боронованому полю.
    
    Ішов Прокіп, кипів окріп, прийшов Прокіп - кипить окріп, як при Прокопі, так і при Прокопі і при Прокопенятах.
    
    Сидить Прокоп — кипить окроп, Пішов Прокоп — кипить окроп. Як при Прокопові кипів окроп, Так і без Прокопа кипить окроп.
    

    Result:

    
    Боб+ер н+а березі з бобрен+ятами б+ублики пік.
    
    Борон+ила борон+а п+о борон+ованому п+олю.
    
    Іш+ов Пр+окіп, кип+ів окр+іп, прийш+ов Пр+окіп - кип+ить окр+іп, +як пр+и Пр+окопі, т+ак +і пр+и Пр+окопі +і пр+и Прокопенятах.
    
    Сид+ить Прок+оп — кип+ить окроп, Піш+ов Прок+оп — кип+ить окроп. +Як пр+и Пр+окопові кип+ів окроп, Т+ак +і б+ез Пр+окопа кип+ить окроп.
    opened by robinhad 0
  • Error import StressOption

    Traceback (most recent call last):
      File "/home/user/Soft/Python/mamba1/test.py", line 1, in <module>
        from ukrainian_tts.tts import TTS, Voices, StressOption
    ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

    opened by akirsoft 0
  • Vits improvements

    vitsArgs = VitsArgs(
        # hifi V3
        resblock_type_decoder = '2',
        upsample_rates_decoder = [8,8,4],
        upsample_kernel_sizes_decoder = [16,16,8],
        upsample_initial_channel_decoder = 256,
        resblock_kernel_sizes_decoder = [3,5,7],
        resblock_dilation_sizes_decoder = [[1,2], [2,6], [3,12]],
    )
    
    opened by robinhad 0
  • Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from existing checkpoint (e.g. VITS Ljspeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
Releases(v4.0.0)
  • v4.0.0(Dec 10, 2022)

  • v3.0.0(Sep 14, 2022)

    This is a release of the Ukrainian TTS model and checkpoint. The license for this model is GNU GPL v3. This release has stress support using the + sign before vowels. The model was trained for 280,000 steps by @robinhad. Kudos to @egorsmkv for providing the dataset for this model. Kudos to @proger for providing alignment scripts. Kudos to @dchaplinsky for the Dmytro voice.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(12.07 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.97 MB)
    speakers.pth(495 bytes)
  • v2.0.0(Jul 10, 2022)

    This is a release of the Ukrainian TTS model and checkpoint using a voice (7 hours) from the Mykyta dataset. The license for this model is GNU GPL v3. This release has stress support using the + sign before vowels. The model was trained for 140,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta and Olena datasets.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(9.97 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.72 MB)
    optimized.pth(329.95 MB)
    speakers.pth(431 bytes)
  • v2.0.0-beta(May 8, 2022)

    This is a beta release of the Ukrainian TTS model and checkpoint using a voice (7 hours) from the Mykyta dataset. The license for this model is GNU GPL v3. This release has stress support using the + sign before vowels. The model was trained for 150,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4


    Source code(tar.gz)
    Source code(zip)
    config.json(8.85 KB)
    LICENSE(34.32 KB)
    model-inference.pth(317.15 MB)
    model.pth(951.32 MB)
    tts_output.wav(1.11 MB)
  • v1.0.0(Jan 14, 2022)

  • v0.0.1(Oct 14, 2021)
