glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Overview

Glow-Speak

glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Installation

git clone https://github.com/rhasspy/glow-speak.git
cd glow-speak/

python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade setuptools wheel
pip3 install -f 'https://synesthesiam.github.io/prebuilt-apps/' -r requirements.txt

python3 setup.py develop
glow-speak --version

Voices

The following languages/voices are supported:

  • German
    • de_thorsten
  • Chinese
    • cmn_jing_li
  • Greek
    • el_rapunzelina
  • English
    • en-us_ljspeech
    • en-us_mary_ann
  • Spanish
    • es_tux
  • Finnish
    • fi_harri_tapani_ylilammi
  • French
    • fr_siwis
  • Hungarian
    • hu_diana_majlinger
  • Italian
    • it_riccardo_fasol
  • Korean
    • ko_kss
  • Dutch
    • nl_rdh
  • Russian
    • ru_nikolaev
  • Swedish
    • sv_talesyntese
  • Swahili
    • sw_biblia_takatifu
  • Vietnamese
    • vi_vais1000

Usage

Download Voices

glow-speak-download de_thorsten

Command-Line Synthesis

glow-speak -v en-us_mary_ann 'This is a test.' --output-file test.wav

HTTP Server

glow-speak-http-server --debug

Visit http://localhost:5002

Socket Server

Start the server:

glow-speak-socket-server --voice en-us_mary_ann --socket /tmp/glow-speak.sock

From a separate terminal:

echo 'This is a test.' | bin/glow-speak-socket-client --socket /tmp/glow-speak.sock | xargs aplay

Lines from client to server are synthesized, and the path to the WAV file is returned (usually in /tmp).

You might also like...
End-to-End Speech Processing Toolkit
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Athena is an open-source implementation of end-to-end speech processing engine.

Athena is an open-source implementation of end-to-end speech processing engine. Our vision is to empower both industrial application and academic research on end-to-end models for speech processing. To make speech processing available to everyone, we're also releasing example implementation and recipe on some opensource dataset for various tasks (Automatic Speech Recognition, Speech Synthesis, Voice Conversion, Speaker Recognition, etc).

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

🤗 Contributing to OpenSpeech 🤗 OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform ta

 SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation In this repo you can find the code of the Supervised Hybrid Audio Segmentatio

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

CRNN paper:An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition 1. create your ow

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipBERT is designed based on 2D CNNs and transformers, and uses a sparse sampling strategy to enable efficient end-to-end video-and-language learning.

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

Comments
  • AssertionError on web interface (only) - and Raspberry Pi Bullseye test

    AssertionError on web interface (only) - and Raspberry Pi Bullseye test

    Hi Micheal,

    great work again! :smiley:

    I just saw this repository and thought I'd give it a try on my freshly installed Raspberry Pi 4 with 32bit Raspberry Pi OS Bullseye (Debian 11). Installation almost finished without errors! :partying_face: ... I just had to fix one thing: sudo apt-get install libatlas-base-dev After 15min I was already generating audio :grin: :+1:

    When I tested en mary_ann and thorsten_de via the web interface I got this error as soon as my test sentence ended with a question mark:

    DEBUG:glow-speak:ɪ_z ð_ɪ_s ɐ_n_ˈʌ_ð_ɚ t_ˈɛ_s_t? .
    ERROR:glow_speak.http_server:
    Traceback (most recent call last):
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
        result = await self.dispatch_request(request_context)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
        return await self.ensure_async(handler)(**request_.view_args)
      File "/home/pi/glow-speak/glow_speak/http_server.py", line 484, in app_say
        wav_bytes = await text_to_wav(text, voice, **tts_args)
      File "/home/pi/glow-speak/glow_speak/http_server.py", line 323, in text_to_wav
        text_ids = text_to_ids(
      File "/home/pi/glow-speak/glow_speak/__init__.py", line 110, in text_to_ids
        text_ids = phonemes2ids(
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 190, in phonemes2ids
        maybe_extend_ids(sub_phoneme, word_ids, append_list=False)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 108, in maybe_extend_ids
        maybe_ids = missing_func(phoneme)
      File "/home/pi/glow-speak/glow_speak/__init__.py", line 59, in guess_ids
        typing.List[Phoneme], guess_phonemes(phoneme, self.to_phonemes)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/gruut_ipa/accent.py", line 159, in guess_phonemes
        assert dist_split is not None
    AssertionError
    

    Maybe some encoding error when reading the web input?

    Speed seems pretty good, comparable to Larynx I'd say :+1: and I noticed the pronunciations have been improved for German :clap: :sunglasses:

    opened by fquirin 0
Owner
Rhasspy
Offline voice assistant
Rhasspy
Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

PART 2: CHAIN LINKING AUDIO-TO-TEXT NLP TASKS 2A: TRANSCRIBE-TRANSLATE-SENTIMENT-ANALYSIS In notebook3.0, I demo a simple workflow to: transcribe a lo

Chua Chin Hon 30 Jul 13, 2022
Jarvis is a simple Chatbot with a GUI capable of chatting and retrieving information and daily news from the internet for it's user.

J.A.R.V.I.S Kindly consider starring this repository if you like the program :-) What/Who is J.A.R.V.I.S? J.A.R.V.I.S is an chatbot written that is bu

Epicalable 50 Dec 31, 2022
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

Gagan Bhatia 364 Jan 03, 2023
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost LOVE is accpeted by ACL22 main conference as a long pape

Lihu Chen 32 Jan 03, 2023
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

Hugging Face 6.2k Dec 31, 2022
Words_And_Phrases - Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours

Words_And_Phrases Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours Abbreviations Abbreviation

Subhadeep Mandal 1 Feb 01, 2022
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
Materials (slides, code, assignments) for the NYU class I teach on NLP and ML Systems (Master of Engineering).

FREE_7773 Repo containing material for the NYU class (Master of Engineering) I teach on NLP, ML Sys etc. For context on what the class is trying to ac

Jacopo Tagliabue 90 Dec 19, 2022
ConvBERT: Improving BERT with Span-based Dynamic Convolution

ConvBERT Introduction In this repo, we introduce a new architecture ConvBERT for pre-training based language model. The code is tested on a V100 GPU.

YITUTech 237 Dec 10, 2022
translate using your voice

speech-to-text-translator Usage translate using your voice description this project makes translating a word easy, all you have to do is speak and...

1 Oct 18, 2021
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing pororo performs Natural Language Processing and Speech-related tasks. It is easy to

Kakao Brain 1.2k Dec 21, 2022
Model parallel transformers in JAX and Haiku

Table of contents Mesh Transformer JAX Updates Pretrained Models GPT-J-6B Links Acknowledgments License Model Details Zero-Shot Evaluations Architectu

Ben Wang 4.9k Jan 04, 2023
A workshop with several modules to help learn Feast, an open-source feature store

Workshop: Learning Feast This workshop aims to teach users about Feast, an open-source feature store. We explain concepts & best practices by example,

Feast 52 Jan 05, 2023
OceanScript is an Esoteric language used to encode and decode text into a formulation of characters

OceanScript is an Esoteric language used to encode and decode text into a formulation of characters - where the final result looks like waves in the ocean.

Graph Coloring - Weighted Vertex Coloring Problem

Graph Coloring - Weighted Vertex Coloring Problem This project proposes several local searches and an MCTS algorithm for the weighted vertex coloring

Cyril 1 Jul 08, 2022
TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Alexa 98 Dec 09, 2022
Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

Allegro Tech 17 Oct 18, 2022
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (S

InstaDeep Ltd 72 Dec 09, 2022
使用Mask LM预训练任务来预训练Bert模型。训练垂直领域语料的模型表征,提升下游任务的表现。

Pretrain_Bert_with_MaskLM Info 使用Mask LM预训练任务来预训练Bert模型。 基于pytorch框架,训练关于垂直领域语料的预训练语言模型,目的是提升下游任务的表现。 Pretraining Task Mask Language Model,简称Mask LM,即

Desmond Ng 24 Dec 10, 2022
Official PyTorch implementation of SegFormer

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers Figure 1: Performance of SegFormer-B0 to SegFormer-B5. Project page

NVIDIA Research Projects 1.4k Dec 29, 2022