Official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Last update: Sep 07, 2022

Related tags

Overview

Multi-speaker DGP

This repository provides official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Our paper: Deep Gaussian Process Based Multi-speaker Speech Synthesis with Latent Speaker Representation

Test environment

This repository is tested in the following environment.

Ubuntu 18.04
NVIDIA GeForce RTX 2080 Ti
Python 3.7.3
CUDA 11.1
cuDNN 8.1.1

Setup

You can complete setup by simply executing setup.sh.

$ . ./setup.sh

*Please make sure that installed PyTorch is compatible with CUDA (see https://pytorch.org/ for more info). Otherwise, CUDA error will occur during training.

How to use

This repository is designed according to Kaldi-style recipe. To run the scripts, please follow the below instruction. JVS corpus [Takamichi et al., 2020] can be downloaded from here.

# Move to the recipe directory
$ cd egs/jvs

# Download the corpus to be used. The directory structure will be as follows:

├── conf/     # directory containing YAML format configuration files
├── jvs_ver1/ # downloaded data
├── local/    # directory containing corpus-dependent scripts
└── run.sh    # main scripts

# Run the recipe from scratch
$ ./run.sh

# Or you can run the recipe step by step
$ ./run.sh --stage 0 --stop-stage 0  # train/dev/eval split
$ ./run.sh --stage 1 --stop-stage 1  # preprocessing
$ ./run.sh --stage 2 --stop-stage 2  # train phoneme duration model
$ ./run.sh --stage 3 --stop-stage 3  # train acoustic model
$ ./run.sh --stage 4 --stop-stage 4  # decoding

# During stage 2 & 3, you can monitor logs using Tensorboard
# for example:
$ tensorboard --logdir exp/dgp

How to customize

conf/*.yaml include all settings for data preparation, preprocessing, training, and decoding. We have prepared two configuration files, dgp.yaml and dgplvm.yaml. You can change experimental conditions by editing these files.

Official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Related tags

Overview

Multi-speaker DGP

Test environment

Setup

How to use

How to customize

Owner

sarulab-speech

StyleGAN2-ada for practice

[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Pytorch Lightning Implementation of SC-Depth Methods.

Pytorch implementation of "M-LSD: Towards Light-weight and Real-time Line Segment Detection"

An excellent hash algorithm combining classical sponge structure and RNN.

GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks

MetaTTE: a Meta-Learning Based Travel Time Estimation Model for Multi-city Scenarios

Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset (CVPR'19)

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA results for single-image motion deblurring, image deraining, image denoising (synthetic and real data), and dual-pixel defocus deblurring.

ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

The FIRST GANs-based omics-to-omics translation framework

⚾🤖⚾ Automatic baseball pitching overlay in realtime

Using a Seq2Seq RNN architecture via TensorFlow to predict future Bitcoin prices

Release of the ConditionalQA dataset

Implementation of the "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos" paper.

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Pytorch Implementation of rpautrat/SuperPoint

SwinIR: Image Restoration Using Swin Transformer

MARE - Multi-Attribute Relation Extraction