Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Overview

Google Text-To-Speech Batch Prompt File Maker

forthebadge forthebadge

Are you in the need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pro! This repository contains a tool for generating Google Text-To-Speech audio files in batch. It is ideal for offline prompts creation with Google voices for application in IVRs

In order to use this repository, clone the contents in your local environment with the following console command:

git clone https://github.com/ponchotitlan/google_text-to-speech_prompt_maker.git

Once cloned, follow the next steps for environment setup:

1) GCP account setup

Before adjusting up the contents of this project, it is neccesary to setup the Cloud Text-to-Speech API in your Google Cloud project:

  1. Follow the official documentation for activating this API and creating a Service Account
  2. Generate a JSON key associated to this Service Account
  3. Save this JSON key file in the same location as the contents of this repository

2) CSV and YAML files

Prepare a CSV document with the texts that you want to convert into prompt audio files. The CSV must have the following structure:

    <FILE NAME WITHOUT THE EXTENSION> , <PROMPT TEXT OR COMPLIANT SSML GRAMMAR>

An Excel export to CSV format should be enough for rendering a compatible structure, ever since the text within a cell is dumped between quotes if it contains spaces. An example of a compliant file with SSML prompts would look like the following:

    sample_prompt_01,"<speak>Welcome to ACME. How can I help you today?</speak>"
    sample_prompt_02,"<speak>Press 1 for sales. <break time=200ms/>Press 2 for Tech Support. <break time=200ms/>Or stay in the line for agent support</speak>"
    ...

Additionally, prepare a YAML document with the structure mentioned in the setup.yaml file included in this repository. The fields are the following:

# CSV format is: FILE_NAME , PROMPT_CONTENT
csv_prompts_file: <my_csv_file.csv>

google_settings:
    # ROUTE TO THE JSON KEY ASSOCIATED TO GCP. IF THE ROUTE HAS SPACES, ADD QUOTES TO THE VALUE
    JSON_key: <my_key.json>

    # PROMPT TYPE. ALLOWED VALUES ARE:
    # normal | SSML
    prompt_type: SSML

    # FILE FORMAT. ALLOWED VALUES ARE:
    # wav | mp3
    output_audio_format: wav

    # COMPLIANT LANGUAGE CODE. SEE https://cloud.google.com/text-to-speech/docs/voices FOR COMPATIBLE CODES
    language_code: es-US

    # COMPLIANT VOICE NAME. SEE https://cloud.google.com/text-to-speech/docs/voices FOR COMPATIBLE NAMES
    voice_name: es-US-Wavenet-C

    # COMPLIANT VOICE GENDER. SEE https://cloud.google.com/text-to-speech/docs/voices FOR COMPATIBLE GENDERS WITH THE SELECTED VOICE ABOVE
    voice_gender: MALE

    # COMPLIANT AUDIO ENCODING. SUPPORTED TYPES ARE:
    # AUDIO_ENCODING_UNSPECIFIED | LINEAR16 | MP3 | OGG_OPUS
    audio_encoding: LINEAR16

3) Dependencies installation

Install the requirements in a virtual environment with the following command:

pip install -r requirements.txt

4) Inline calling

The usage of the script requires the following inline elements:

usage: init.py [-h] [-b BATCH] configurationYAML

Batch prompt generation with Google TTS services

positional arguments:
  configurationYAML     YAML file with operation settings

optional arguments:
  -h, --help            show this help message and exit
  -b BATCH, --batch BATCH
                        Amount of rows in the CSV file to process at the same
                        time. Suggested max value is 100. Default is 10

An example is:

py init.py setup.yaml

The command prompt will show logs based on the status of each row:

✅ Prompt sample_prompt_04.WAV created successfully!
✅ Prompt sample_prompt_01.WAV created successfully!
✅ Prompt sample_prompt_03.WAV created successfully!
✅ Prompt sample_prompt_02.WAV created successfully!

The corresponding audio files will be saved in the same location where this script is executed.

5) Encoding for Cisco CVP Audio Elements

Unfortunately, Google Text-To-Speech service does not support the compulsory 8-bit μ-law encoding as per the Python SDK documentation (I am currently working on a Java version which does support this encoding. This option might be released in the Python SDK in the future). However, there are many online services such as this one for achieving the aforementioned. Audacity can also be used for the purpose. Follow this tutorial for compatible file conversion steps. There's a more straightforward tool which has been proven useful for me in order to process batch files with the CVP compatible settings.

The resulting files can later be uploaded into the Tomcat server for usage within a design in Cisco CallStudio. The route within the CVP Windows Server VM is the following:

    C:\Cisco\CVP\VXMLServer\Tomcat\webapps\CVP\audio

Please refer to the Official Cisco Documentation for more information.

Crafted with ❤️ by Alfonso Sandoval - Cisco

You might also like...
Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)
Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)

TOPSIS implementation in Python Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) CHING-LAI Hwang and Yoon introduced TOPSIS

voice2json is a collection of command-line tools for offline speech/intent recognition on Linux
voice2json is a collection of command-line tools for offline speech/intent recognition on Linux

Command-line tools for speech and intent recognition on Linux

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

A Python module made to simplify the usage of Text To Speech and Speech Recognition.
A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

Command Line Text-To-Speech using Google TTS
Command Line Text-To-Speech using Google TTS

cli-tts Thanks to gTTS by @pndurette! This is an interactive command line text-to-speech tool using Google TTS. Just type text and the voice will be p

Releases(v1.2.0)
Owner
Ponchotitlán
💻 ☕ 🥃 Let's talk about networks coding, automation and orchestration autour a cup of coffee, and a sip of tequila;
Ponchotitlán
Simple, Fast, Powerful and Easily extensible python package for extracting patterns from text, with over than 60 predefined Regular Expressions.

patterns-finder Simple, Fast, Powerful and Easily extensible python package for extracting patterns from text, with over than 60 predefined Regular Ex

22 Dec 19, 2022
Generate text line images for training deep learning OCR model (e.g. CRNN)

Generate text line images for training deep learning OCR model (e.g. CRNN)

532 Jan 06, 2023
leaking paid token generator that was a shit lmao for 100$ haha

Discord-Token-Generator-Leaked leaking paid token generator that was a shit lmao for 100$ he selling it for 100$ wth here the code enjoy don't forget

Keevo 5 Apr 15, 2022
Official PyTorch Implementation of paper "NeLF: Neural Light-transport Field for Single Portrait View Synthesis and Relighting", EGSR 2021.

NeLF: Neural Light-transport Field for Single Portrait View Synthesis and Relighting Official PyTorch Implementation of paper "NeLF: Neural Light-tran

Ken Lin 38 Dec 26, 2022
👄 The most accurate natural language detection library for Python, suitable for long and short text alike

1. What does this library do? Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a prepr

Peter M. Stahl 334 Dec 30, 2022
TalkNet: Audio-visual active speaker detection Model

Is someone talking? TalkNet: Audio-visual active speaker detection Model This repository contains the code for our ACM MM 2021 paper, TalkNet, an acti

142 Dec 14, 2022
Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS)

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS) Yoonhyung Lee, Joongbo Shin, Kyomin Jung Abstract: Although early

LEE YOON HYUNG 147 Dec 05, 2022
Code for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned Language Models in the wild .

🌳 Fingerprinting Fine-tuned Language Models in the wild This is the code and dataset for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned La

LCS2-IIITDelhi 5 Sep 13, 2022
Repository for Graph2Pix: A Graph-Based Image to Image Translation Framework

Graph2Pix: A Graph-Based Image to Image Translation Framework Installation Install the dependencies in env.yml $ conda env create -f env.yml $ conda a

18 Nov 17, 2022
Task-based datasets, preprocessing, and evaluation for sequence models.

SeqIO: Task-based datasets, preprocessing, and evaluation for sequence models. SeqIO is a library for processing sequential data to be fed into downst

Google 290 Dec 26, 2022
Datasets of Automatic Keyphrase Extraction

This repository contains 20 annotated datasets of Automatic Keyphrase Extraction made available by the research community. Following are the datasets and the original papers that proposed them. If yo

LIAAD - Laboratory of Artificial Intelligence and Decision Support 163 Dec 23, 2022
🏆 • 5050 most frequent words in 109 languages

🏆 Most Common Words Multilingual 5000 most frequent words in 109 languages. Uses wordfrequency.info as a source. 🔗 License source code license data

14 Nov 24, 2022
Turn clang-tidy warnings and fixes to comments in your pull request

clang-tidy pull request comments A GitHub Action to post clang-tidy warnings and suggestions as review comments on your pull request. What platisd/cla

Dimitris Platis 30 Dec 13, 2022
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context

LSTC: Boosting Atomic Action Detection with Long-Short-Term Context This Repository contains the code on AVA of our ACM MM 2021 paper: LSTC: Boosting

Tencent YouTu Research 9 Oct 11, 2022
小布助手对话短文本语义匹配的一个baseline

oppo-text-match 小布助手对话短文本语义匹配的一个baseline 模型 参考:https://kexue.fm/archives/8213 base版本线下大概0.952,线上0.866(单模型,没做K-flod融合)。 训练 测试环境:tensorflow 1.15 + keras

苏剑林(Jianlin Su) 132 Dec 14, 2022
GPT-3: Language Models are Few-Shot Learners

GPT-3: Language Models are Few-Shot Learners arXiv link Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-trainin

OpenAI 12.5k Jan 05, 2023
An extension for asreview implements a version of the tf-idf feature extractor that saves the matrix and the vocabulary.

Extension - matrix and vocabulary extractor for TF-IDF and Doc2Vec An extension for ASReview that adds a tf-idf extractor that saves the matrix and th

ASReview 4 Jun 17, 2022
Ongoing research training transformer language models at scale, including: BERT & GPT-2

What is this fork of Megatron-LM and Megatron-DeepSpeed This is a detached fork of https://github.com/microsoft/Megatron-DeepSpeed, which in itself is

BigScience Workshop 316 Jan 03, 2023
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 26 Dec 14, 2022
KoBERTopic은 BERTopic을 한국어 데이터에 적용할 수 있도록 토크나이저와 BERT를 수정한 코드입니다.

KoBERTopic 모델 소개 KoBERTopic은 BERTopic을 한국어 데이터에 적용할 수 있도록 토크나이저와 BERT를 수정했습니다. 기존 BERTopic : https://github.com/MaartenGr/BERTopic/tree/05a6790b21009d

Won Joon Yoo 26 Jan 03, 2023