German Text-To-Speech Engine using Tacotron and Griffin-Lim

Last update: Aug 28, 2022

Related tags

Overview

jotts

JoTTS is a German text-to-speech engine using tacotron and griffin-lim. The synthesizer model has been trained on my voice using Tacotron1. Due to real time usage I decided not to include a vocoder and use griffin-lim instead which results in a more robotic voice but is much faster.

API

First create an instance of JoTTS. The initializer takes force_model_download as an optional parameter in case that the last download of the synthesizer failed and the model cannot be applied.
Call speak with a text parameter that contains the text to speak out loud. The second parameter can be set to True, to wait until speaking is done.
Use text2wav to create a wav file instead of speaking the text.

Example usage

from jotts import JoTTS
jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.text2wav("Es war aber auch schon mal besser!")

Todo

Add an option to change the default audio device to speak the text
Add a parameter to select other models but the default model
Add threading or multi processing to allow speaking without blocking
Add a vocoder instead of griffin-lim to improve audio output.

Training a model for your own voice

Training a synthesizer model is easy - if you know how to do it. I created a course on udemy to show you how it is done. Don't buy the tutorial for the full price, there is a discout every month :-)

https://www.udemy.com/course/voice-cloning/

If you neither have the backgroud or the resources or if you are just lazy or too rich, contact me for contract work. Cloning a voice normally needs ~15 Minutes of clean audio from the voice you want to clone.

Disclaimer

I hope that my (and any other person's) voice will be used only for legal and ethical purposes. Please do not get into mischief with it.

Comments

SSL: CERTIFICATE_VERIFY_FAILED

my code is

from jotts import JoTTS
jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.textToWav("Es war aber auch schon mal besser!")

and I receive this :

2022-11-01 09:39:57.536 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2022-11-01 09:39:57.537 | DEBUG    | jotts.jotts:__prepare_model__:50 - There is no tts model yet, downloading...
2022-11-01 09:39:57.537 | DEBUG    | jotts.jotts:__prepare_model__:60 - Download file: https://github.com/padmalcom/jotts/releases/download/v0.1/v0.1.pt
v0.1.pt: 0.00B [00:00, ?B/s]

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1392, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 412, in wrap_socket
    session=session
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 853, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1117, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 2, in <module>
    jotts = JoTTS()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jotts/jotts.py", line 68, in __init__
    MODEL_FILE = self.__prepare_model__(force_model_download);
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jotts/jotts.py", line 62, in __prepare_model__
    urllib.request.urlretrieve(DOWNLOAD_URL, filename=MODEL_FILE, reporthook=t.update_to)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

what am I doing wrong. ? Thanks !

opened by deladriere 3

Samples of jotts in combination with a modern vocoder like (MB)Melgan, HifiGAN

I tried to drop a spectrogram sanmple as npy and feed HifiGAN but it gave me a lot of noise. I am wondering how good your results are, do you have samples with vocoders like above?

opened by eqikkwkp25-cyber 2

jotts.text2wav not existing / needs jotts.textToWav

running this example on MacOS 11.6

from jotts import JoTTS

jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.speak("Wir sind Die Roboter.", True)
jotts.text2wav("Es war aber auch schon mal besser!")

give an error trying to generate the wav file (The speak function works really well !)

2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:83 - Using CPU for inference.
2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:85 - Loading the synthesizer...
Synthesizer using device: cpu
Trainable Parameters: 30.874M
Loaded synthesizer "v0.1.pt" trained to step 79000

| Generating 1/1
[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.


Done.

| Generating 1/1


Done.

Traceback (most recent call last):
  File "test_jotts.py", line 6, in <module>
    jotts.text2wav("Es war aber auch schon mal besser!")
AttributeError: 'JoTTS' object has no attribute 'text2wav'

using jotts.textToWav works well but there is still this [W NNPACK.cpp:79] message here is the output

2021-12-14 17:45:31.699 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2021-12-14 17:45:31.700 | DEBUG    | jotts.jotts:__init__:83 - Using CPU for inference.
2021-12-14 17:45:31.700 | DEBUG    | jotts.jotts:__init__:85 - Loading the synthesizer...
Synthesizer using device: cpu
Trainable Parameters: 30.874M
Loaded synthesizer "v0.1.pt" trained to step 79000

| Generating 1/1
[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.


Done.


| Generating 1/1


Done.


| Generating 1/1


Done.

opened by deladriere 2

can this run on a Rapsberry Pi Zero ?

Sorry not an issue but I would like to have a Raspberry Pi Zero speak German without the need for an Internet connection (Amazon Polly and IBM Watson have great German voices but are paid service quite complex to install - not to mention the need for a connect and its delays) I just subscribed to your course (I understand only a bit of German) ;-) Maybe some of the heavy work can be done on a fast computer but I need the text to speech to be done on the Raspberry Pi ?

opened by deladriere 2
Missing additional information in README

Typo somewhere: The readme says "The synthesizer model has been trained on my voice using Tacotron1." while the releases say "v0.1 Latest Pre-trained German synthesizer model based on tacotron2."

Can you add more hints how you trained your model(s), i.e. which base repository, data structure and how many hours of your voice you need for the current results?

opened by eqikkwkp25-cyber 1

Releases(generic_v0.4)

generic_v0.4(Dec 30, 2022)

Trained for 98k steps on german common voice dataset.
Source code(tar.gz)
Source code(zip)
generic_v0.4.pt(353.51 MB)
vocoder_v0.1(Nov 8, 2022)

WaveRNN vocoder trained for 142.000 steps. Can be used instead of griffin-lim algorithm, might deliver better results but requires more ressources to apply.
Source code(tar.gz)
Source code(zip)
vocoder_v0.1.pt(51.40 MB)
jonas_v0.1(Nov 22, 2021)

Pre-trained German synthesizer model based on tacotron.
Source code(tar.gz)
Source code(zip)
jonas_v0.1.pt(353.49 MB)
generic_v0.3(Oct 27, 2022)

Trained for 75k steps on high quality voice.
Source code(tar.gz)
Source code(zip)
generic_v0.3.pt(353.49 MB)

Owner

padmalcom

PhD in Computer Science, interested in machine learning, game programming and robotics. Hope my projects help somewhere.

GitHub Repository

Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks

NERDA Not only is NERDA a mesmerizing muppet-like character. NERDA is also a python package, that offers a slick easy-to-use interface for fine-tuning

141 Dec 30, 2022

A telegram bot to translate 100+ Languages

🔥 GOOGLE TRANSLATER 🔥 The owner would not be responsible for any kind of bans due to the bot. • ⚡ INSTALLING ⚡ • • 🔰 Deploy To Railway 🔰 • • ✅ OFF

5 Dec 20, 2021

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

138 Dec 30, 2022

Türkçe küfürlü içerikleri bulan bir yapay zeka kütüphanesi / An ML library for profanity detection in Turkish sentences

"Kötü söz sahibine aittir." -Anonim Nedir? sinkaf uygunsuz yorumların bulunmasını sağlayan bir python kütüphanesidir. Farkı nedir? Diğer algoritmalard

4 Feb 18, 2022

Code for EMNLP20 paper: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training"

ProphetNet-X This repo provides the code for reproducing the experiments in ProphetNet. In the paper, we propose a new pre-trained language model call

394 Dec 17, 2022

Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

478 Dec 25, 2022

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for t

922 Dec 31, 2022

Associated Repository for "Translation between Molecules and Natural Language"

MolT5: Translation between Molecules and Natural Language Associated repository for "Translation between Molecules and Natural Language". Table of Con

67 Dec 15, 2022

NLP Core Library and Model Zoo based on PaddlePaddle 2.0

PaddleNLP 2.0拥有丰富的模型库、简洁易用的API与高性能的分布式训练的能力，旨在为飞桨开发者提升文本建模效率，并提供基于PaddlePaddle 2.0的NLP领域最佳实践。

6.9k Jan 01, 2023

Just a basic Telegram AI chat bot written in Python using Pyrogram.

Nikko ChatBot Just a basic Telegram AI chat bot written in Python using Pyrogram. Requirements Python 3.7 or higher. A bot token. Installation $ https

2 Oct 21, 2022

SimBERT升级版（SimBERTv2）！

RoFormer-Sim RoFormer-Sim，又称SimBERTv2，是我们之前发布的SimBERT模型的升级版。介绍 https://kexue.fm/archives/8454 训练 tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.6 下载

317 Dec 23, 2022

Official code for "Parser-Free Virtual Try-on via Distilling Appearance Flows", CVPR 2021

Parser-Free Virtual Try-on via Distilling Appearance Flows, CVPR 2021 Official code for CVPR 2021 paper 'Parser-Free Virtual Try-on via Distilling App

395 Jan 03, 2023

NeMo: a toolkit for conversational AI

NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with

5.3k Jan 04, 2023

DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time

DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time. While it efficiently searches the answers out of 60 billion phrases in Wikipedia, it is also v

543 Jan 08, 2023

skweak: A software toolkit for weak supervision applied to NLP tasks

Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels wi

850 Dec 28, 2022

Tool to add main subject to items on Wikidata using a WMFs CirrusSearch for named entity recognition or a manually supplied list of QIDs

ItemSubjector Tool made to add main subject statements to items based on the title using a home-brewed CirrusSearch-based Named Entity Recognition alg

9 Nov 17, 2022