Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

Last update: Nov 08, 2022

Overview

zhrtvc

Chinese Real Time Voice Cloning

tips: 中文或汉语的语言缩写简称是zh。

关注【啊啦嘻哈】微信公众号，回复一个字【听】，小萝莉有话对你说哦^v^

语音合成工具箱

该项目的语音合成体验和模型推理建议使用ttskit语音合成工具箱。
便携使用的ttskit语音合成工具箱，专注于语音合成应用和部署。
项目ttskit基于本项目zhrtvc构建，侧重于语音合成的使用。
安装：

pip install -U ttskit

快速使用：

import ttskit

# 合成语音
ttskit.tts('这是个样例', audio='24')

网页界面：

# 执行python函数部署
from ttskit import http_server

http_server.start_sever()
# 打开网页：http://localhost:9000/ttskit

# 安装ttskit后，命令行部署网页界面

tkhttp

usage: tkhttp [-h] [--device DEVICE] [--host HOST] [--port PORT]

optional arguments:
  -h, --help       show this help message and exit
  --device DEVICE  设置预测时使用的显卡,使用CPU设置成-1即可
  --host HOST      IP地址
  --port PORT      端口号

命令行：

tkcli

usage: tkcli [-h] [-i INTERACTION] [-t TEXT] [-s SPEAKER] [-a AUDIO]
             [-o OUTPUT] [-m MELLOTRON_PATH] [-w WAVEGLOW_PATH] [-g GE2E_PATH]
             [--mellotron_hparams_path MELLOTRON_HPARAMS_PATH]
             [--waveglow_kwargs_json WAVEGLOW_KWARGS_JSON]

版本

v1.5.6

使用说明和注意事项详见README

注意事项:
- 这个说明是新版GMW版本的语音克隆框架的说明，使用ge2e(encoder)-mellotron-waveglow的模块（简称GMW），运行更简单，效果更稳定和合成语音更加优质。
- 基于项目Real-Time-Voice-Cloning改造为中文支持的版本ESV版本的说明见README-ESV，该版本用encoder-synthesizer-vocoder的模块（简称ESV），运行比较复杂。
- 需要进入zhrtvc项目的代码子目录【zhrtvc】运行代码。
- zhrtvc项目默认参数设置是适用于data目录中的样本数据，仅用于跑通整个流程。
- 推荐使用mellotron的语音合成器和waveglow的声码器，mellotron设置多种模式适应多种任务使用。
中文语料

中文语音语料zhvoice，语音更加清晰自然，包含8个开源数据集，3200个说话人，900小时语音，1300万字。

中文模型

扫描上面的二维码，关注**【啊啦嘻哈】微信公众号**，回复：中文语音克隆模型走起，获取百度网盘的模型文件。

合成样例

合成语音样例的目录

目录介绍

zhrtvc

代码相关的说明详见zhrtvc目录下的readme文件。

models

预训练的模型在百度网盘下载，下载后解压，替换models文件夹即可。

data

语料样例，包括语音和文本对齐语料。

注意：

该语料样例用于测试跑通模型，数据量太少，不可能使得模型收敛，即不会训练出可用模型。

在测试跑通模型情况下，处理自己的数据为语料样例的格式，用自己的数据训练模型即可。

学习交流

【AI解决方案交流群】QQ群：925294583

点击链接加入群聊：https://jq.qq.com/?_wv=1027&k=wlQzvT0N

Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented yet (don't hesitate to make an issue for that too). Mostly I would recommend giving a quick look to the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that allows to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.

Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

Related tags

Overview

zhrtvc

语音合成工具箱

版本

目录介绍

zhrtvc

models

data

学习交流

Real-Time Voice Cloning

Owner

Kuang Dada

Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics.

Fastseq 基于ONNXRUNTIME的文本生成加速框架

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

A repo for materials relating to the tutorial of CS-332 NLP

State of the Art Natural Language Processing

Quick insights from Zoom meeting transcripts using Graph + NLP

AI-powered literature discovery and review engine for medical/scientific papers

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Interpretable Models for NLP using PyTorch

Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

STT for TorchScript is a port of Coqui STT based on DeepSpeech to PyTorch.

Indonesia spellchecker with python

Model for recasing and repunctuating ASR transcripts

State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Unsupervised text tokenizer focused on computational efficiency

A Python/Pytorch app for easily synthesising human voices

Unsupervised Language Modeling at scale for robust sentiment classification

NLP and Text Generation Experiments in TensorFlow 2.x / 1.x

Code for using and evaluating SpanBERT.