Official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Last update: Sep 07, 2022

Multi-speaker DGP

This repository provides the official PyTorch implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis.

Our paper: Deep Gaussian Process Based Multi-speaker Speech Synthesis with Latent Speaker Representation

Test environment

This repository has been tested in the following environment:

Ubuntu 18.04
NVIDIA GeForce RTX 2080 Ti
Python 3.7.3
CUDA 11.1
cuDNN 8.1.1

Setup

To complete setup, source setup.sh in your current shell:

$ . ./setup.sh

*Please make sure that the installed PyTorch build is compatible with your CUDA version (see https://pytorch.org/ for more info). Otherwise, CUDA errors will occur during training.
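One way to check compatibility before training is to ask PyTorch which CUDA version it was built against and whether it can see a GPU. The sketch below is not part of this repository; the function name `describe_torch_cuda` is hypothetical, and it only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_cuda():
    """Return a short report on the installed PyTorch build and CUDA visibility."""
    # Avoid an ImportError if setup has not been run in this environment yet.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in describe_torch_cuda().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` is `None` or `cuda_available` is `False`, the PyTorch build probably does not match the local CUDA setup and may need to be reinstalled following the instructions on https://pytorch.org/.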

How to use

This repository follows the Kaldi-style recipe structure. To run the scripts, please follow the instructions below. The JVS corpus [Takamichi et al., 2020] can be downloaded from here.

# Move to the recipe directory
$ cd egs/jvs

# Download the corpus to be used. The directory structure will be as follows:

├── conf/     # directory containing YAML format configuration files
├── jvs_ver1/ # downloaded data
├── local/    # directory containing corpus-dependent scripts
└── run.sh    # main script

# Run the recipe from scratch
$ ./run.sh

# Or you can run the recipe step by step
$ ./run.sh --stage 0 --stop-stage 0  # train/dev/eval split
$ ./run.sh --stage 1 --stop-stage 1  # preprocessing
$ ./run.sh --stage 2 --stop-stage 2  # train phoneme duration model
$ ./run.sh --stage 3 --stop-stage 3  # train acoustic model
$ ./run.sh --stage 4 --stop-stage 4  # decoding

# During stages 2 and 3, you can monitor logs using TensorBoard
# for example:
$ tensorboard --logdir exp/dgp
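If you prefer to drive the step-by-step commands above from Python, for example to rerun only a few stages, a small wrapper can build the same `./run.sh --stage N --stop-stage N` invocations. This is a minimal sketch and not part of the repository: `build_command` and `run_stages` are hypothetical helpers, the stage descriptions are copied from the comments above, and the script is assumed to be launched from `egs/jvs`.

```python
import subprocess

# Stage numbers and descriptions as listed in the step-by-step commands.
STAGES = {
    0: "train/dev/eval split",
    1: "preprocessing",
    2: "train phoneme duration model",
    3: "train acoustic model",
    4: "decoding",
}


def build_command(stage):
    """Build the run.sh command line that executes exactly one stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return ["./run.sh", "--stage", str(stage), "--stop-stage", str(stage)]


def run_stages(first, last, dry_run=True):
    """Run stages first..last one at a time; only print them when dry_run is True."""
    commands = []
    for stage in range(first, last + 1):
        command = build_command(stage)
        commands.append(command)
        print(f"[stage {stage}: {STAGES[stage]}] {' '.join(command)}")
        if not dry_run:
            # check=True stops the loop as soon as a stage fails.
            subprocess.run(command, check=True)
    return commands


# Dry run: print the commands for preprocessing and both training stages.
run_stages(1, 3)
```

Calling `run_stages(1, 3, dry_run=False)` from `egs/jvs` would execute the three stages in order and stop at the first failure.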

How to customize

The conf/*.yaml files contain all settings for data preparation, preprocessing, training, and decoding. We have prepared two configuration files, dgp.yaml and dgplvm.yaml. You can change the experimental conditions by editing these files.
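Before editing, it can help to see which settings differ between the two provided configurations. The following sketch compares the files line by line with Python's standard `difflib`; it does not parse YAML and makes no assumption about the keys inside the files. The helper name `diff_configs` is hypothetical, and the paths in the usage comment assume you are in `egs/jvs`.

```python
import difflib
from pathlib import Path


def diff_configs(path_a, path_b):
    """Return unified-diff lines describing how two config files differ."""
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            lines_a,
            lines_b,
            fromfile=str(path_a),
            tofile=str(path_b),
            lineterm="",
        )
    )


# Usage (run from egs/jvs):
#   for line in diff_configs("conf/dgp.yaml", "conf/dgplvm.yaml"):
#       print(line)
```

Lines starting with `-` appear only in the first file and lines starting with `+` only in the second, which gives a quick list of the settings worth reviewing when switching between the two configurations.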

Owner

sarulab-speech
