vits chinese, tts chinese, tts mandarin

Last update: Dec 14, 2022

Related tags

Text Data & NLP tts

Overview

Chinese TTS implemented with VITS

This is a copy of https://github.com/jaywalnut310/vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

ESPnet link: github.com/espnet/espnet/tree/master/espnet2/gan_tts/vits

coqui-ai/TTS link: github.com/coqui-ai/TTS/tree/main/recipes/ljspeech/vits_tts

If there is any infringement, please contact me and I will delete the project.

Notes on the workflow for building a 16 kHz Baker TTS with VITS

apt-get install espeak

pip install -r requirements.txt

cd monotonic_align

python setup.py build_ext --inplace

Copy the 16 kHz Baker (Biaobei) audio into ./baker_waves/ and start training:
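Before starting training, it can help to confirm that every file copied into ./baker_waves/ really is sampled at 16 kHz. The following is a minimal sketch and not part of the repository: the helper name `find_wrong_rate` is hypothetical, and it assumes the files are plain PCM WAV files that Python's standard `wave` module can read.

```python
import wave
from pathlib import Path


def find_wrong_rate(wav_dir, expected_rate=16000):
    """Return (filename, rate) pairs for WAV files whose sample rate differs."""
    mismatched = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # wave.open only handles uncompressed PCM WAV files (assumption).
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatched.append((path.name, rate))
    return mismatched


if __name__ == "__main__":
    # ./baker_waves is the directory named in the step above.
    for name, rate in find_wrong_rate("./baker_waves"):
        print(f"{name}: {rate} Hz")
```

If the script prints nothing, no file in the directory deviates from the expected rate.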

python train.py -c configs/baker_base.json -m baker_base

With two 1080 GPUs, the model is basically usable after two days of training.

Testing

python vits_strings.py

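The model trained with the steps above has a noticeable pausing problem.

Causes:

1. Boundary markers are already forcibly inserted after the phonemes, and VITS then forcibly inserts boundaries again. The relevant config parameter is "add_blank": true

2. The stochastic duration predictor may also contribute. The relevant config parameter is use_sdp=True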
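To illustrate the first cause, here is a small sketch of what blank insertion does to a sequence that already contains boundary markers. The `intersperse` function mirrors the behaviour of the helper of the same name in the upstream VITS code. The phoneme IDs, the `without_extra_pauses` helper, and the assumption that `add_blank` lives under "data" and `use_sdp` under "model" (as in the upstream configs) are illustrative and may not match configs/baker_base.json. Whether turning these options off actually removes the pauses is not confirmed here.

```python
import copy


def intersperse(sequence, item):
    """Insert `item` before, between, and after every element of `sequence`."""
    result = [item] * (len(sequence) * 2 + 1)
    result[1::2] = sequence
    return result


def without_extra_pauses(config):
    """Return a copy of a config dict with add_blank and use_sdp turned off.

    The key layout ("data" / "model") is an assumption based on upstream VITS.
    """
    updated = copy.deepcopy(config)
    updated.setdefault("data", {})["add_blank"] = False
    updated.setdefault("model", {})["use_sdp"] = False
    return updated


# Hypothetical IDs: 9 stands for a boundary marker the front end already added,
# 0 stands for the blank token that add_blank inserts.
phoneme_ids = [3, 9, 5, 9]
print(intersperse(phoneme_ids, 0))  # [0, 3, 0, 9, 0, 5, 0, 9, 0]
```

In the printed sequence each existing boundary marker (9) ends up surrounded by blanks (0), which is the doubling described in cause 1.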

Owner

AmorTX
