Ukrainian TTS (text-to-speech) using Coqui TTS

Overview
title emoji colorFrom colorTo sdk app_file pinned
Ukrainian TTS
🐸
green
green
gradio
app.py
false

Ukrainian TTS 📢 🤖

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on M-AILABS Ukrainian dataset using sumska voice.

Link to online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please support -> SUPPORT LINK

Example

test.mp4

How to use :

  1. pip install -r requirements.txt.
  2. Download model from "Releases" tab.
  3. Launch as one-time command:
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path folder/to/save/output.wav

or alternatively launch web server using:

tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json

How to train:

  1. Refer to "Nervous beginner guide" in Coqui TTS docs.
  2. Instead of provided config.json use one from this repo.

Attribution

Code for app.py taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    doc: fix examples in README

    Problem

    The one-time snippet does not work as is and complains that the speaker is not defined

     > initialization of speaker-embedding layers.
     > Text: Перевірка мікрофона
     > Text splitted to sentences.
    ['Перевірка мікрофона']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    

    Also it speakers.pth should be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One vowel words in the end of the sentence aren't stressed

    One vowel words in the end of the sentence aren't stressed

    Input:

    
    Бобер на березі з бобренятами бублики пік.
    
    Боронила борона по боронованому полю.
    
    Ішов Прокіп, кипів окріп, прийшов Прокіп - кипить окріп, як при Прокопі, так і при Прокопі і при Прокопенятах.
    
    Сидить Прокоп — кипить окроп, Пішов Прокоп — кипить окроп. Як при Прокопові кипів окроп, Так і без Прокопа кипить окроп.
    

    Result:

    
    Боб+ер н+а березі з бобрен+ятами б+ублики пік.
    
    Борон+ила борон+а п+о борон+ованому п+олю.
    
    Іш+ов Пр+окіп, кип+ів окр+іп, прийш+ов Пр+окіп - кип+ить окр+іп, +як пр+и Пр+окопі, т+ак +і пр+и Пр+окопі +і пр+и Прокопенятах.
    
    Сид+ить Прок+оп — кип+ить окроп, Піш+ов Прок+оп — кип+ить окроп. +Як пр+и Пр+окопові кип+ів окроп, Т+ак +і б+ез Пр+окопа кип+ить окроп.```
    opened by robinhad 0
  • Error import StressOption

    Error import StressOption

    Traceback (most recent call last): File "/home/user/Soft/Python/mamba1/test.py", line 1, in from ukrainian_tts.tts import TTS, Voices, StressOption ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

    opened by akirsoft 0
  • Vits improvements

    Vits improvements

    vitsArgs = VitsArgs(
        # hifi V3
        resblock_type_decoder = '2',
        upsample_rates_decoder = [8,8,4],
        upsample_kernel_sizes_decoder = [16,16,8],
        upsample_initial_channel_decoder = 256,
        resblock_kernel_sizes_decoder = [3,5,7],
        resblock_dilation_sizes_decoder = [[1,2], [2,6], [3,12]],
    )
    
    opened by robinhad 0
  • Model improvement checklist

    Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from existing checkpoint (e.g. VITS Ljspeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
Releases(v4.0.0)
  • v4.0.0(Dec 10, 2022)

  • v3.0.0(Sep 14, 2022)

    This is a release of Ukrainian TTS model and checkpoint. License for this model is GNU GPL v3 License. This release has a stress support using + sign before vowels. Model was trained for 280 000 steps by @robinhad . Kudos to @egorsmkv for providing dataset for this model. Kudos to @proger for providing alignment scripts. Kudos to @dchaplinsky for Dmytro voice.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(12.07 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.97 MB)
    speakers.pth(495 bytes)
  • v2.0.0(Jul 10, 2022)

    This is a release of Ukrainian TTS model and checkpoint using voice (7 hours) from Mykyta dataset. License for this model is GNU GPL v3 License. This release has a stress support using + sign before vowels. Model was trained for 140 000 steps by @robinhad . Kudos to @egorsmkv for providing Mykyta and Olena dataset.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(9.97 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.72 MB)
    optimized.pth(329.95 MB)
    speakers.pth(431 bytes)
  • v2.0.0-beta(May 8, 2022)

    This is a beta release of Ukrainian TTS model and checkpoint using voice (7 hours) from Mykyta dataset. License for this model is GNU GPL v3 License. This release has a stress support using + sign before vowels. Model was trained for 150 000 steps by @robinhad . Kudos to @egorsmkv for providing Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4

    :

    Source code(tar.gz)
    Source code(zip)
    config.json(8.85 KB)
    LICENSE(34.32 KB)
    model-inference.pth(317.15 MB)
    model.pth(951.32 MB)
    tts_output.wav(1.11 MB)
  • v1.0.0(Jan 14, 2022)

  • v0.0.1(Oct 14, 2021)

Conversational text Analysis using various NLP techniques

Conversational text Analysis using various NLP techniques

Rita Anjana 159 Jan 06, 2023
An open-source NLP library: fast text cleaning and preprocessing.

An open-source NLP library: fast text cleaning and preprocessing

Iaroslav 21 Mar 18, 2022
TruthfulQA: Measuring How Models Imitate Human Falsehoods

TruthfulQA: Measuring How Models Imitate Human Falsehoods

69 Dec 25, 2022
SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning We propose a SASE mode

Tower 1 Nov 20, 2021
Convolutional 2D Knowledge Graph Embeddings resources

ConvE Convolutional 2D Knowledge Graph Embeddings resources. Paper: Convolutional 2D Knowledge Graph Embeddings Used in the paper, but do not use thes

Tim Dettmers 586 Dec 24, 2022
To be a next-generation DL-based phenotype prediction from genome mutations.

Sequence -----------+-- 3D_structure -- 3D_module --+ +-- ? | |

Eric Alcaide 18 Jan 11, 2022
基于Transformer的单模型、多尺度的VAE模型

UniVAE 基于Transformer的单模型、多尺度的VAE模型 介绍 https://kexue.fm/archives/8475 依赖 需要大于0.10.6版本的bert4keras(当前还没有推到pypi上,可以直接从GitHub上clone最新版)。 引用 @misc{univae,

苏剑林(Jianlin Su) 49 Aug 24, 2022
使用Mask LM预训练任务来预训练Bert模型。训练垂直领域语料的模型表征,提升下游任务的表现。

Pretrain_Bert_with_MaskLM Info 使用Mask LM预训练任务来预训练Bert模型。 基于pytorch框架,训练关于垂直领域语料的预训练语言模型,目的是提升下游任务的表现。 Pretraining Task Mask Language Model,简称Mask LM,即

Desmond Ng 24 Dec 10, 2022
ACL'22: Structured Pruning Learns Compact and Accurate Models

☕ CoFiPruning: Structured Pruning Learns Compact and Accurate Models This repository contains the code and pruned models for our ACL'22 paper Structur

Princeton Natural Language Processing 130 Jan 04, 2023
Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS)

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS) Yoonhyung Lee, Joongbo Shin, Kyomin Jung Abstract: Although early

LEE YOON HYUNG 147 Dec 05, 2022
translate using your voice

speech-to-text-translator Usage translate using your voice description this project makes translating a word easy, all you have to do is speak and...

1 Oct 18, 2021
A python package for deep multilingual punctuation prediction.

This python library predicts the punctuation of English, Italian, French and German texts. We developed it to restore the punctuation of transcribed spoken language.

Oliver Guhr 27 Dec 22, 2022
It analyze the sentiment of the user, whether it is postive or negative.

Sentiment-Analyzer-Tool It analyze the sentiment of the user, whether it is postive or negative. It uses streamlit library for creating this sentiment

Paras Patidar 18 Dec 17, 2022
Signature remover is a NLP based solution which removes email signatures from the rest of the text.

Signature Remover Signature remover is a NLP based solution which removes email signatures from the rest of the text. It helps to enchance data conten

Forges Alterway 8 Jan 06, 2023
Speech Recognition for Uyghur using Speech transformer

Speech Recognition for Uyghur using Speech transformer Training: this model using CTC loss and Cross Entropy loss for training. Download pretrained mo

Uyghur 11 Nov 17, 2022
scikit-learn wrappers for Python fastText.

skift scikit-learn wrappers for Python fastText. from skift import FirstColFtClassifier df = pandas.DataFrame([['woof', 0], ['meow', 1]], colu

Shay Palachy 233 Sep 09, 2022
Almost State-of-the-art Text Generation library

Ps: we are adding transformer model soon Text Gen 🐐 Almost State-of-the-art Text Generation library Text gen is a python library that allow you build

Emeka boris ama 63 Jun 24, 2022
Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba

David Bamman 48 Dec 06, 2022
Uncomplete archive of files from the European Nopsled Team

European Nopsled CTF Archive This is an archive of collected material from various Capture the Flag competitions that the European Nopsled team played

European Nopsled 4 Nov 24, 2021
This code is the implementation of Text Emotion Recognition (TER) with linguistic features

APSIPA-TER This code is the implementation of Text Emotion Recognition (TER) with linguistic features. The network model is BERT with a pretrained mod

kenro515 1 Feb 08, 2022