Ukrainian TTS (text-to-speech) using Coqui TTS

Overview
title: Ukrainian TTS
emoji: 🐸
colorFrom: green
colorTo: green
sdk: gradio
app_file: app.py
pinned: false

Ukrainian TTS 📢 🤖

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on the M-AILABS Ukrainian dataset using the sumska voice.

Link to online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please support -> SUPPORT LINK

Example

test.mp4

How to use:

  1. pip install -r requirements.txt.
  2. Download the model checkpoint, config.json, and speakers.pth from the "Releases" tab.
  3. Launch as a one-time command (for the multi-speaker checkpoints, see the note after these commands):
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path folder/to/save/output.wav

or alternatively launch a web server using:

tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json
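
Note: the released checkpoints are multi-speaker, so the one-time command may stop with the error that you "need to define either a `speaker_name` or a `speaker_wav`" unless a speaker is selected (see the issues below). A minimal sketch, assuming the standard Coqui TTS CLI flags --speakers_file_path and --speaker_idx and a hypothetical speaker name mykyta taken from the downloaded speakers.pth:

tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --speakers_file_path path/to/speakers.pth \
    --speaker_idx mykyta \
    --out_path folder/to/save/output.wav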

How to train:

  1. Refer to the "Nervous beginner guide" in the Coqui TTS docs.
  2. Instead of the provided config.json, use the one from this repo (a rough command sketch follows below).
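
A rough sketch of the training invocation, assuming the TTS.bin.train_tts entry point that ships with Coqui TTS (the guide itself may use a recipe script instead); all paths are placeholders:

python -m TTS.bin.train_tts --config_path config.json

# optional: fine-tune from an existing checkpoint instead of training from scratch
python -m TTS.bin.train_tts --config_path config.json \
    --restore_path path/to/existing/checkpoint.pth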

Attribution

Code for app.py taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    Problem

    The one-time snippet does not work as-is and complains that the speaker is not defined:

     > initialization of speaker-embedding layers.
     > Text: Перевірка мікрофона
     > Text splitted to sentences.
    ['Перевірка мікрофона']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    

    Also, speakers.pth should be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One-vowel words at the end of a sentence aren't stressed

    Input:

    
    Бобер на березі з бобренятами бублики пік.

    Боронила борона по боронованому полю.

    Ішов Прокіп, кипів окріп, прийшов Прокіп - кипить окріп, як при Прокопі, так і при Прокопі і при Прокопенятах.

    Сидить Прокоп — кипить окроп, Пішов Прокоп — кипить окроп. Як при Прокопові кипів окроп, Так і без Прокопа кипить окроп.
    

    Result:

    
    Боб+ер н+а березі з бобрен+ятами б+ублики пік.

    Борон+ила борон+а п+о борон+ованому п+олю.

    Іш+ов Пр+окіп, кип+ів окр+іп, прийш+ов Пр+окіп - кип+ить окр+іп, +як пр+и Пр+окопі, т+ак +і пр+и Пр+окопі +і пр+и Прокопенятах.

    Сид+ить Прок+оп — кип+ить окроп, Піш+ов Прок+оп — кип+ить окроп. +Як пр+и Пр+окопові кип+ів окроп, Т+ак +і б+ез Пр+окопа кип+ить окроп.
    opened by robinhad 0
  • Error import StressOption

    Traceback (most recent call last):
      File "/home/user/Soft/Python/mamba1/test.py", line 1, in <module>
        from ukrainian_tts.tts import TTS, Voices, StressOption
    ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

    opened by akirsoft 0
  • Vits improvements

    # decoder settings matching HiFi-GAN V3 (smaller, faster vocoder head)
    from TTS.tts.models.vits import VitsArgs

    vitsArgs = VitsArgs(
        resblock_type_decoder="2",
        upsample_rates_decoder=[8, 8, 4],
        upsample_kernel_sizes_decoder=[16, 16, 8],
        upsample_initial_channel_decoder=256,
        resblock_kernel_sizes_decoder=[3, 5, 7],
        resblock_dilation_sizes_decoder=[[1, 2], [2, 6], [3, 12]],
    )
    
    opened by robinhad 0
  • Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from existing checkpoint (e.g. VITS Ljspeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
Releases(v4.0.0)
  • v4.0.0(Dec 10, 2022)

  • v3.0.0(Sep 14, 2022)

    This is a release of the Ukrainian TTS model and checkpoint. The license for this model is the GNU GPL v3. This release supports stress marking using the + sign before vowels. The model was trained for 280,000 steps by @robinhad. Kudos to @egorsmkv for providing the dataset for this model. Kudos to @proger for providing alignment scripts. Kudos to @dchaplinsky for the Dmytro voice.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(12.07 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.97 MB)
    speakers.pth(495 bytes)
  • v2.0.0(Jul 10, 2022)

    This is a release of the Ukrainian TTS model and checkpoint using a voice (7 hours) from the Mykyta dataset. The license for this model is the GNU GPL v3. This release supports stress marking using the + sign before vowels. The model was trained for 140,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta and Olena datasets.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(9.97 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.72 MB)
    optimized.pth(329.95 MB)
    speakers.pth(431 bytes)
  • v2.0.0-beta(May 8, 2022)

    This is a beta release of the Ukrainian TTS model and checkpoint using a voice (7 hours) from the Mykyta dataset. The license for this model is the GNU GPL v3. This release supports stress marking using the + sign before vowels. The model was trained for 150,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(8.85 KB)
    LICENSE(34.32 KB)
    model-inference.pth(317.15 MB)
    model.pth(951.32 MB)
    tts_output.wav(1.11 MB)
  • v1.0.0(Jan 14, 2022)

  • v0.0.1(Oct 14, 2021)
