Tacotron 2 - PyTorch implementation with faster-than-realtime inference

Overview

Tacotron 2 (without WaveNet)

PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.

This implementation includes distributed and automatic mixed precision support and uses the LJSpeech dataset.

Distributed and Automatic Mixed Precision support relies on NVIDIA's Apex and AMP.

Visit our website for audio samples using our published Tacotron 2 and WaveGlow models.

Alignment, Predicted Mel Spectrogram, Target Mel Spectrogram

Pre-requisites

  1. NVIDIA GPU + CUDA cuDNN

Setup

  1. Download and extract the LJ Speech dataset
  2. Clone this repo: git clone https://github.com/NVIDIA/tacotron2.git
  3. CD into this repo: cd tacotron2
  4. Initialize submodule: git submodule init; git submodule update
  5. Update .wav paths: sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' filelists/*.txt (a path sanity-check sketch follows this list)
    • Alternatively, set load_mel_from_disk=True in hparams.py and update mel-spectrogram paths
  6. Install PyTorch 1.0
  7. Install Apex
  8. Install python requirements or build docker image
    • Install python requirements: pip install -r requirements.txt
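
If the paths were updated correctly, every wav referenced by the filelists should exist on disk. A minimal sanity check, assuming the default path|text filelist format:

    import glob
    import os

    # verify that every wav referenced in the filelists exists on disk
    for filelist in glob.glob("filelists/*.txt"):
        with open(filelist, encoding="utf-8") as f:
            for line in f:
                wav_path = line.split("|")[0]
                if not os.path.isfile(wav_path):
                    print(f"missing: {wav_path} (referenced in {filelist})")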

Training

  1. python train.py --output_directory=outdir --log_directory=logdir
  2. (OPTIONAL) tensorboard --logdir=outdir/logdir

Training using a pre-trained model

Training using a pre-trained model can lead to faster convergence. By default, the dataset-dependent text embedding layers are ignored (a sketch of this warm-start behavior follows the steps below).

  1. Download our published Tacotron 2 model
  2. python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start
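
For reference, warm-starting amounts to loading the checkpoint's weights while dropping the ignored (dataset-dependent) layers; a minimal sketch, assuming the repo's checkpoint layout (a dict with a 'state_dict' key):

    import torch

    def warm_start_sketch(checkpoint_path, model, ignore_layers):
        # load pretrained weights, drop the ignored layers,
        # and fill the gaps from the freshly initialized model
        state = torch.load(checkpoint_path, map_location="cpu")["state_dict"]
        state = {k: v for k, v in state.items() if k not in ignore_layers}
        merged = model.state_dict()
        merged.update(state)
        model.load_state_dict(merged)
        return model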

Multi-GPU (distributed) and Automatic Mixed Precision Training

  1. python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True

Inference demo

  1. Download our published Tacotron 2 model
  2. Download our published WaveGlow model
  3. jupyter notebook --ip=127.0.0.1 --port=31337
  4. Load inference.ipynb

N.b. When performing Mel-Spectrogram to Audio synthesis, make sure Tacotron 2 and the Mel decoder were trained on the same mel-spectrogram representation.
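
Both models derive their mel representation from the audio hyperparameters, so compatibility can be checked mechanically; a sketch, assuming both models expose the audio parameter names used in this repo's hparams.py:

    AUDIO_KEYS = ("sampling_rate", "filter_length", "hop_length", "win_length",
                  "n_mel_channels", "mel_fmin", "mel_fmax")

    def check_mel_compat(taco_hparams, vocoder_hparams):
        # both arguments are assumed to expose this repo's audio hyperparameter names
        for key in AUDIO_KEYS:
            assert getattr(taco_hparams, key) == getattr(vocoder_hparams, key), key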

Related repos

WaveGlow: Faster than real time Flow-based Generative Network for Speech Synthesis.

nv-wavenet: Faster than real time WaveNet.

Acknowledgements

This implementation uses code from the following repos: Keith Ito's and Prem Seetharaman's, as described in our code.

We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.

We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang and Zongheng Yang.

Issues
  • Optimize model for inference speed

    https://github.com/NVIDIA/waveglow/issues/54 In this issue, they were talking about lowering some parameters to maximize inference speed. But I don't know how to do it properly, what can be reduced and what needs to remain. Has anyone done this before? Please send me your hparams configuration.

    If I trained my model using fp32, can it run inference in fp16, and vice versa? In that case, will it improve inference speed? I am using an RTX 2080 Ti; my model runs 7 times faster than real time, and I am pretty sure it can be improved.

    And one more thing: is there any benefit to running inference on multiple GPUs?
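
    On the fp16 question: weights trained in fp32 can usually be cast to half precision for inference only, though whether that helps depends on the GPU. A sketch (load_model is the helper in this repo's train.py; the checkpoint path is a placeholder):

    import torch
    from hparams import create_hparams
    from train import load_model

    hparams = create_hparams()
    model = load_model(hparams)
    model.load_state_dict(torch.load("tacotron2_statedict.pt")["state_dict"])
    model.cuda().eval().half()  # cast fp32 weights to fp16 for inference only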

    opened by EuphoriaCelestial 65
  • How many iterations do I need?

    Training on a Russian dataset; the output says I have less than 0.2 loss, and the default is 500 epochs. Now I'm on epoch 1333 and still get Warning! Reached max decoder steps. Should I keep going, or is it screwed up? http://puu.sh/CgXpt/41886048cd.jpg

    opened by hadaev8 61
  • Little gap between words on the alignment plot

    Hi, my alignment plot has a little gap between words.

    (alignment plot at 63k steps attached)

    batch size: 64, filter_length=2048, hop_length=275, win_length=1100

    And it sounds a bit like reading separated words. Has anyone had the same issue before?

    Thanks.

    opened by dnnnew 50
  • How to train a new model with a dataset of a different language?

    I would like to know if it is possible to train a Tacotron 2 model for another language, using another dataset which has the same structure as the LJ Speech dataset? And if it is possible, is there any tutorial for doing so?

    opened by EuphoriaCelestial 44
  • Audio examples?

    Very cool work, this! However, it would be ideal to also provide examples of input text + output audio from a trained system, alongside held-out examples from the database. This would give an impression of what kind of results the code is capable of generating with the LJSpeech data, and is standard practice in the text-to-speech field.

    Aside from synthesising held-out sentences from LJSpeech, Google's speech examples for Tacotron 2 provide another set of challenging text prompts to generate.

    Are there any plans to do this? Or are synthesised speech examples already available somewhere?

    opened by ghenter 36
  • Model can not converge

    Hello, I have a question.

    I'm using a dataset with > 17k sentences (about 30 hours of audio), 90% for training and 10% for validation. It's been training for 3 days (using batch_size 8) and has reached epoch 56. Please see the training info below (grad norm, training loss, and validation loss plots attached). I thought it looked good. But when I tested it, the output audio was wrong and the attention looks awful (image attached).

    And the loss seems unable to decrease any more. Do I have to train for more epochs, or was there something wrong with my dataset, or something else? Please help; thank you so much.

    opened by HiiamCong 31
  • Hoarseness in synthesised voice

    Hi, we have been training both Tacotron 2 and WaveGlow models on clean male speech (Hindi language) of about 10+ hours at a 16 kHz sampling rate, using phonemic text. We keep the window parameters and mel-band parameters unchanged from the 22.05 kHz setup in the original repo. Both models were trained from scratch in a distributed manner on 8 V100s for over 2 days. The resulting voice synthesis is hoarse. We tried synthesizing using models from different iterations; the hoarseness remains the same regardless. From visualizing the spectrograms of the original (left side of the image) and synthesized audio (right side of the image) for the same text, we observe that the spectrogram of the synthesized audio is smudged in the middle frequencies, in comparison to the crisp harmonics of the original audio.

    Any suggestions on how to train better?

    Left: Original (Audio) , Right: Synthesized (Audio)

    Spectrogram - left: Original, right: Synthesized

    opened by sharathadavanne 28
  • Gate_output is being mis-predicted

    I have trained the model for only a few steps (8000) for testing.

    At this checkpoint, the synthesized mel spectrogram's frame count is always bigger than the max frames (in the hparams config).

    As I have checked, the problem is that gate_output is always less than the configured gate_threshold (it's around 0.0022 or 0.0023), so frames get generated forever.

    Is this because I haven't trained the model long enough, or what else should I check?

    Please help, thank you so much.
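
    For context, the decoder's stopping rule in model.py is roughly the paraphrase below (defaults are this repo's), so a gate sigmoid that never exceeds gate_threshold means generation only ends at the max_decoder_steps cap:

    import torch

    # paraphrase of the inference decode loop's stopping rule in model.py;
    # gate_output is the decoder's stop-token logit for the current frame
    def should_stop(gate_output, n_frames, gate_threshold=0.5, max_decoder_steps=1000):
        if torch.sigmoid(gate_output) > gate_threshold:  # model predicts "stop"
            return True
        if n_frames >= max_decoder_steps:                # hard safety cap
            print("Warning! Reached max decoder steps")
            return True
        return False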

    opened by Thien223 26
  • How to combine these models with gst-tacotron?

    Hello, @rafaelvalle Really nice work.

    I would like to combine this work with global style tokens. Please let me know.

    opened by aishweta 25
  • It's working now---using Chinese corpus

    The alignment plot looks like the attached screenshot. Training is very slow.

    opened by maozhiqiang 24
  • Warning! Reached max decoder steps

    I had no such problem before, but today I used the pretrained models, and sometimes the spectrogram is bugged.

    Here is a notebook to reproduce it: https://colab.research.google.com/drive/1jR12cEKdkg0hlDUHGhf2fPb0RwqPwEYj?#scrollTo=CyBu2F7eisFM

    opened by hadaev8 22
  • Overfitting: validation loss stuck.

    Hello, I am experimenting with a 3.5-hour dataset (16-bit, 22 kHz, 9.5 s average duration, 1300 audio files). I am using the pretrained models provided by this repo.

    My hparams: p_attention_dropout=0.5, p_decoder_dropout=0.5, learning_rate=4e-5, batch_size=16.

    After 15k iterations: train loss 0.25. It is producing almost original-like outputs (alignment, predicted mel, and target mel plots attached).

    The only thing I couldn't improve is the validation loss; it has been stuck since iteration 4000 (plot attached).

    All suggestions are welcome! Thanks beforehand.

    opened by ksaidin 21
  • scaling Mel Spectrogram output for Wavenet Vocoder

    Hello,

    First of all thanks for the nice Tacotron 2 implementation.

    I'm trying to use the trained Tacotron 2 outputs as inputs to r9y9's WaveNet vocoder. However, his pre-trained WaveNet works on mel spectrograms scaled to [0, 1].

    What is the range for this Tacotron 2 implementation? I'm having a hard time finding this out in order to do the scaling.

    For reference, this is r9y9's normalization function, applied to the mel spectrogram before training, which scales it to between 0 and 1:

    def _normalize(S):
        return np.clip((S - hparams.min_level_db) / -hparams.min_level_db, 0, 1)
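
    For comparison, this implementation log-compresses mels with torch.log(torch.clamp(x, min=1e-5)) (see audio_processing.py), so the floor is ln(1e-5) ≈ -11.52; under that assumption, an analogous rescale to [0, 1] would be:

    import numpy as np

    MEL_FLOOR = float(np.log(1e-5))  # ≈ -11.52, assuming log(clamp(x, min=1e-5)) compression

    def to_unit_range(mel):
        # map [MEL_FLOOR, ~0] roughly onto [0, 1], clipping anything outside
        return np.clip((mel - MEL_FLOOR) / -MEL_FLOOR, 0, 1)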

    opened by G-Wang 20
  • Bad alignment in distributed train

    I'm training the model on a single V100 and on an 8×V100 instance. This is the alignment from the single-GPU run: https://i.imgur.com/byKRmiN.png This is from the multi-GPU run: https://i.imgur.com/AsYjTbV.png Step 11648 (multi-GPU) is the equivalent of step 93184 (single-GPU).

    Any advice?

    opened by hadaev8 18
  • CUDA Runtime Error: Out of Memory

    I finally got all the errors resolved, but then this new one came up: RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

    Here is the full log:

    ~/tacotron2$ python3 train.py --output_directory=outdir --log_directory=logdir
    FP16 Run: False
    Dynamic Loss Scaling: True
    Distributed Run: False
    cuDNN Enabled: True
    cuDNN Benchmark: False
    /home/mrbreadwater/tacotron2/layers.py:35: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
      self.conv.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
    /home/mrbreadwater/tacotron2/layers.py:15: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
      gain=torch.nn.init.calculate_gain(w_init_gain))
    Epoch: 0
    THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
    Traceback (most recent call last):
      File "train.py", line 291, in <module>
        args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
      File "train.py", line 216, in train
        y_pred = model(x)
      File "/home/mrbreadwater/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/mrbreadwater/tacotron2/model.py", line 510, in forward
        encoder_outputs, targets, memory_lengths=input_lengths)
      File "/home/mrbreadwater/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/mrbreadwater/tacotron2/model.py", line 403, in forward
        decoder_input)
      File "/home/mrbreadwater/tacotron2/model.py", line 363, in decode
        attention_weights_cat, self.mask)
      File "/home/mrbreadwater/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/mrbreadwater/tacotron2/model.py", line 77, in forward
        attention_hidden_state, processed_memory, attention_weights_cat)
      File "/home/mrbreadwater/tacotron2/model.py", line 60, in get_alignment_energies
        processed_query + processed_attention_weights + processed_memory))
    RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

    I'm using a GTX 1050 Ti. Anything I can do to fix it?

    EDIT: I'm running Ubuntu 18.04

    opened by MrBreadWater 18
  • Why is the 'training' parameter of dropout in Prenet set to True?

    In the code of Prenet

    def forward(self, x):
        for linear in self.layers:
            x = F.dropout(F.relu(linear(x)), p=0.5, training=True)
        return x
    

    Why is 'training=True'? Shouldn't it be 'training=self.training'? Does that mean we apply dropout at inference? I changed this to 'training=self.training' and the pre-trained model was unable to generate correct audio.

    opened by jjl1994 18
  • which parameter defines the max char length in the input text?

    I was trying to look in the hparams.py file, but couldn't find it there. I'm new to text-to-speech modeling.

    How do I set the max char length, and are all shorter sentences zero-padded during both training and inference?

    The encoder_lstm_units in each direction are 256, so does that mean the max char length in the text must be 256? If so, how would one synthesize sentences or paragraphs that are much longer, as continuous speech?
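
    For what it's worth, the encoder LSTM width does not cap the input length; the data loader pads each batch to its longest sequence rather than to a fixed maximum. The idea, sketched below (not the repo's exact collate code):

    import torch

    def pad_text_batch(sequences):
        # pad a list of 1-D LongTensors to the longest sequence in the batch
        max_len = max(len(s) for s in sequences)
        padded = torch.zeros(len(sequences), max_len, dtype=torch.long)
        for i, s in enumerate(sequences):
            padded[i, :len(s)] = s
        return padded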

    opened by jjoe1 18
  • train.py fails to train

    Hi @rafaelvalle, thanks for the great implementation!

    But I can't start the training as stated in your README (with pytorch-0.4.0):

    $ python train.py -o outdir -l logdir
    FP16 Run: False
    Dynamic Loss Scaling True
    Distributed Run: False
    cuDNN Enabled: True
    cuDNN Benchmark: False
    /root/tacotron2/layers.py:35: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
      self.conv.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
    /root/tacotron2/layers.py:15: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
      gain=torch.nn.init.calculate_gain(w_init_gain))
    Epoch: 0
    Traceback (most recent call last):
      File "train.py", line 272, in <module>
        args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
      File "train.py", line 197, in train
        y_pred = model(x)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 110, in forward
        inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 121, in scatter
        return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 36, in scatter_kwargs
        inputs = scatter(inputs, target_gpus, dim) if inputs else []
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 29, in scatter
        return scatter_map(inputs)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 16, in scatter_map
        return list(zip(*map(scatter_map, obj)))
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 16, in scatter_map
        return list(zip(*map(scatter_map, obj)))
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 14, in scatter_map
        return Scatter.apply(target_gpus, None, dim, obj)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/_functions.py", line 74, in forward
        outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
      File "/usr/local/lib/python3.5/dist-packages/torch/cuda/comm.py", line 143, in scatter
        chunks = tensor.chunk(len(devices), dim)
    RuntimeError: chunk expects at least a 1-dimensional tensor
    

    Thanks!

    opened by nikitos9000 18
  • Hard to train on one small GPU

    I wrote this issue in response to this one.

    The issue is that I am using an 8 GB GPU (NVIDIA GeForce GTX 980M), so I've brought the batch size down to 24 with the LJ Speech dataset. And after 4 days (I am now at 31k iterations), I do not see any improvement in the attention alignment. The loss has been stuck between 0.7 and 0.5 almost since the start of training.

    At this point, do I have to wait longer, or do you consider it a failure?

    Here are my curves and alignment plot at this point (logs and alignment images attached).

    opened by yliess86 18
  • Inference

    I trained Tacotron 2 with my own data. When performing inference with the Jupyter notebook, in the section "Load model from checkpoint", there is a "tacotron2_statedict.pt" file. How do I create my own "*.pt" file from the "checkpoint" files? Can someone point me in the right direction?

    Thank you so much!! @rafaelvalle
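
    For what it's worth, the checkpoints saved during training appear to use the same layout as tacotron2_statedict.pt (a torch-serialized dict with a 'state_dict' key), so one can be loaded directly; a sketch with a placeholder path:

    import torch

    def load_for_inference(model, checkpoint_path="outdir/checkpoint_10000"):
        # the default path is a placeholder; checkpoints are dicts with a 'state_dict' key
        checkpoint = torch.load(checkpoint_path, map_location="cpu")
        model.load_state_dict(checkpoint["state_dict"])
        return model.eval()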

    opened by jferrer21 16
  • LJ Dataset Silence

    I downloaded the LJ dataset and trained with it; it seems to have about 1 second of silence at the end of each file, which is causing a flat line in my alignment.

    Did anyone else run into this problem? Did you clean the LJ dataset before training?

    opened by generaltso518 16
  • Plan to integrate with nv-wavenet?

    I think integrating taco2 with nv-wavenet is still a long way off; first, I think nv-wavenet should support MoL, not just 8-bit output.

    opened by neverjoe 16
  • Question about training from scratch

    Should training from scratch produce a checkpoint (.pt) file? If so, is there a guide someone can point me to that explains how to do this? Using the following command leads me to believe the file should be saved in the folder "outdir", but this is not the case. The training runs through each epoch without any errors and ends.

    python train.py --output_directory=outdir --log_directory=logdir

    opened by harry6two 15
  • You can use the pretrained LJSpeech waveglow for any language. It will even work for a male voice.

    You can use the pretrained LJSpeech waveglow for any language. It will even work for a male voice.

    Originally posted by @tugstugi in https://github.com/NVIDIA/tacotron2/issues/158#issuecomment-469281376

    @tugstugi How do I configure a male voice for the pretrained Tacotron 2 and WaveGlow models? Or do I need to train another model from the pre-trained one with a different (male) dataset altogether?

    I got it working for the default female voice using the pretrained models. I am pretty new to this and really need to understand what needs to be done next. I'm just looking for a male voice for the English language.

    Thanks in advance.

    opened by khushbu-mulani 15
  • RuntimeError: code is too big

    Hi! I'm having a problem with a RuntimeError saying that the code is too big. I'm using the LJSpeech dataset and trying to train my own model. The machine has a GTX 1080 Ti GPU. I installed PyTorch 1.0 with CUDA 10.0. Here's the traceback:

    (tacotron2-py3) [email protected]:~/tacotron2$ python train.py --output_directory=data/lj_speech --log_directory=logs/lj_speech
    FP16 Run: False
    Dynamic Loss Scaling: True
    Distributed Run: False
    cuDNN Enabled: True
    cuDNN Benchmark: False
    Input dir: None
    Output dir: data/lj_speech
    Batch size: 64
    Epoch: 0
    Traceback (most recent call last):
      File "train.py", line 289, in <module>
        train(args.input_directory, args.output_directory, args.log_directory, args.checkpoint_path, args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
      File "train.py", line 205, in train
        for i, batch in enumerate(train_loader):
      File "/home/annika/.conda/envs/tacotron2-py3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 468, in __next__
        return self._process_next_batch(batch)
      File "/home/annika/.conda/envs/tacotron2-py3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 489, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    RuntimeError: Traceback (most recent call last):
      File "/home/annika/.conda/envs/tacotron2-py3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/annika/.conda/envs/tacotron2-py3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/annika/tacotron2/data_utils.py", line 63, in __getitem__
        return self.get_mel_text_pair(self.audiopaths_and_text[index])
      File "/home/annika/tacotron2/data_utils.py", line 34, in get_mel_text_pair
        mel = self.get_mel(audiopath)
      File "/home/annika/tacotron2/data_utils.py", line 48, in get_mel
        melspec = self.stft.mel_spectrogram(audio_norm)
      File "/home/annika/tacotron2/layers.py", line 76, in mel_spectrogram
        magnitudes, phases = self.stft_fn.transform(y)
      File "/home/annika/tacotron2/stft.py", line 95, in transform
        padding=0)
    RuntimeError: code is too big

    opened by annlaumets 15
  • Pre-trained model to resume/fine-tune train from?

    I notice that the pre-trained model does not include the optimiser state. @rafaelvalle mentioned in another thread that it is not published as a checkpoint to resume from.

    It would be great to have a model that can be resumed, for fine-tuning on smaller datasets.

    Can anybody share an LJSpeech or other heavily trained model checkpoint?

    Thanks in advance, Duvte.

    opened by duvtedudug 15
  • Silence after inference

    Hello, I have an issue when performing inference. After training, every time I try to use the model from different checkpoints, it gives a silent result.

    Below you can see my spectrograms (image attached).

    Does anyone have suggestions on how to solve it?

    opened by OOps717 14
  • RuntimeError: torch/csrc/autograd/variable.cpp:138: get_grad_fn: Assertion `output_nr_ == 0` failed.

    Hello,

    CentOS 7, Python 3.6, CUDA 9

    Used conda to install PyTorch for this configuration.

    Had to remove the version specifications for numpy and torch in your requirements.txt, as I got the error message logged in issue #5.

    Now: numpy = 1.14.3, torch = 0.4.0, tensorflow = 1.6.0.

    After the requirements installation, when I run train.py, I get the following error.

    Please help:


    $ python train.py --output_directory=./outdir --log_directory=./logdir
    /home/tacotron2/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
      from ._conv import register_converters as _register_converters
    FP16 Run: False
    Dynamic Loss Scaling True
    Distributed Run: False
    cuDNN Enabled: True
    cuDNN Benchmark: False
    /home/tacotron2/tacotron2/layers.py:35: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
      self.conv.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
    /home/tacotron2/tacotron2/layers.py:15: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
      gain=torch.nn.init.calculate_gain(w_init_gain))
    Epoch: 0
    Traceback (most recent call last):
      File "train.py", line 285, in <module>
        args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
      File "train.py", line 210, in train
        y_pred = model(x)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
        raise output
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
        output = module(*input, **kwargs)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/tacotron2/tacotron2/model.py", line 506, in forward
        encoder_outputs = self.encoder(embedded_inputs, input_lengths)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/tacotron2/tacotron2/model.py", line 188, in forward
        outputs, _ = self.lstm(x)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 192, in forward
        output, hidden = func(input, self.all_weights, hx, batch_sizes)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 323, in forward
        return func(input, *fargs, **fkwargs)
      File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 287, in forward
        dropout_ts)
    RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion `output_nr_ == 0` failed.

    opened by abuvaneswari 14
  • How to synthesize audio from checkpoint?

    Finally I got python -m multiproc train.py etc. to work. Simple question: how do I now synthesize audio from a specific checkpoint?

    In github.com/keithito/tacotron it was pretty simple: python eval.py --checkpoint 'path_to_checkpoint', but here I don't see any documentation on generating wavs.
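
    A condensed sketch of what inference.ipynb does (paths are placeholders, and this assumes the repo's modules are on the Python path):

    import numpy as np
    import torch
    from hparams import create_hparams
    from text import text_to_sequence
    from train import load_model

    hparams = create_hparams()
    model = load_model(hparams)
    model.load_state_dict(torch.load("checkpoint_path")["state_dict"])
    model.cuda().eval()

    sequence = np.array(text_to_sequence("Hello world.", ["english_cleaners"]))[None, :]
    sequence = torch.from_numpy(sequence).cuda().long()
    mel, mel_postnet, gate, alignment = model.inference(sequence)

    waveglow = torch.load("waveglow_path")["model"].cuda().eval()
    with torch.no_grad():
        audio = waveglow.infer(mel_postnet, sigma=0.666)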

    opened by mligema 14
  • How to improve the quality of a male voice?

    Thanks for your work. I have trained your code on my own dataset. As a result, I get good-quality voice when training on a female voice, but poor quality when training on a male voice (both source recordings have very good quality; the male data is easy to align when tacotron_output_perstep=1, but otherwise hard to align). By the way, I kept the hparams the same as yours. Do you have any suggestions for me? Thank you.

    opened by wotulong 14
  • Please check my approach for training a model with prosody

    Hi all!

    I am trying to train a Tacotron model with prosody for the Kazakh language. In particular, I am trying to teach the model to put stress on particular vowels. Here are some of my training samples:

    file1.wav|ешкІм ештеңЕ істемЕді
    file2.wav|бүгІн менІң қуанышымдА шЕк жОқ
    file3.wav|көпқұрылЫм жұмысқА кедергІ келтірЕді дЕді Ол

    where stressed vowels are in caps.

    Besides having such a dataset, I got rid of the lowercasing of letters in the cleaners script. My approach was as follows:

    1. I didn't want to train a single model with prosody. Instead, I wanted to train a so-called base model on all the necessary symbols, but with all the data in lowercase.
    2. I then want to fine-tune a model with prosody on that base model by modifying the dataset (making the stressed vowels in caps)

    Can you please tell me if my approach is right? I am not sure that at inference time I am capable of controlling the stress. What I mean is that I want the following two samples to sound different: жұмысқА and жҰмысқа (the first case should stress the last vowel, the second one the first vowel).

    After 12000 iterations of training from a warm start, I am not sure that I am capable of controlling the stress.

    Also, can you please tell me what the Y-axis in the alignment figure is? I see that it scales to 60. Is it the axis for symbols? In my case, there are 94 of them (figure attached).

    Here is my symbols.py:

    _pad = '_'
    _punctuation = '!+(),.:;? '
    _special = '-'
    _letters = 'АӘБВГҒДЕЁЖЗИЙКҚЛМНҢОӨПРСТУҰҮФХҺЦЧШЩЪЫІЬЭЮЯаәбвгғдеёжзийқклмнңоөпрстуүұфхһцчшщъыіьэяю'
    symbols = [_pad] + list(_special) + list(_punctuation) + list(_letters)

    opened by aidosRepoint 0
  • Unable To Start New Model or Synthesis Text To Speech

    When I train my own pretrained model that I have been training for months, it works. If I try to create a new one, or if I try to synthesize text to speech, I get the error below.

    I am unable to start a new model, as I receive an error. I have narrowed it down to loading the checkpoint, I believe. I am using my own code, but I have tried multiple notebooks (Cookie, bfs, etc.) and they all give the same error.

    Load Checkpoint Code:

    # Load checkpoint if one exists
        iteration = 0
        epoch_offset = 0
        if checkpoint_path is not None and os.path.isfile(checkpoint_path):
            if warm_start:
                model = warm_start_model(
                    checkpoint_path, model, hparams.ignore_layers)
            else:
                model, optimizer, _learning_rate, iteration = load_checkpoint(
                    checkpoint_path, model, optimizer)
                if hparams.use_saved_learning_rate:
                    learning_rate = _learning_rate
                iteration += 1  # next iteration is iteration + 1
                epoch_offset = max(0, int(iteration / len(train_loader)))
        else:
          os.path.isfile("pretrained_model")
          download_from_google_drive("1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA","pretrained_model")
          model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
          # download LJSpeech pretrained model if no checkpoint already exists
    

    It seems the LJSpeech pretrained model has become corrupt in some way. Can you help with this, or point me in the right direction to get it fixed?

    Train Code:

    train(output_directory, log_directory, checkpoint_path,
          warm_start, n_gpus, rank, group_name, hparams, log_directory2)
    

    Error:

    FP16 Run: False
    Dynamic Loss Scaling: True
    Distributed Run: False
    cuDNN Enabled: True
    cuDNN Benchmark: False
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  1555  100  1555    0     0  77750      0 --:--:-- --:--:-- --:--:-- 77750
    Warm starting model from checkpoint 'pretrained_model'
    ---------------------------------------------------------------------------
    UnpicklingError                           Traceback (most recent call last)
    <ipython-input-11-e8a871a0e98d> in <module>()
          5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
          6 train(output_directory, log_directory, checkpoint_path,
    ----> 7       warm_start, n_gpus, rank, group_name, hparams, log_directory2)
    
    3 frames
    /usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
        775             "functionality.")
        776 
    --> 777     magic_number = pickle_module.load(f, **pickle_load_args)
        778     if magic_number != MAGIC_NUMBER:
        779         raise RuntimeError("Invalid magic number; corrupt file?")
    
    UnpicklingError: invalid load key, '<'.
    

    I have tried researching this pickling error, and after about 10 minutes I figured out it is built into torch; there is nothing I can change in the code, to my knowledge.

    I'd also like to point out this part of the code: raise RuntimeError("Invalid magic number; corrupt file?") (line 779).
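
    An UnpicklingError with load key '<' usually means the file begins with '<', i.e. an HTML page (such as a Google Drive confirmation or quota page) was saved in place of the checkpoint; a quick check:

    # peek at the first bytes of the downloaded file; an HTML error page
    # starts with something like b'<!DOCTYPE html', not checkpoint data
    with open("pretrained_model", "rb") as f:
        print(f.read(16))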

    opened by gmirsky2 2
  • build(deps): bump tensorflow from 1.15.2 to 2.5.3

    Bumps tensorflow from 1.15.2 to 2.5.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.5.3

    Release 2.5.3

    Note: This is the last release in the 2.5 series.

    This release introduces several vulnerability fixes:

    • Fixes a floating point division by 0 when executing convolution operators (CVE-2022-21725)
    • Fixes a heap OOB read in shape inference for ReverseSequence (CVE-2022-21728)
    • Fixes a heap OOB access in Dequantize (CVE-2022-21726)
    • Fixes an integer overflow in shape inference for Dequantize (CVE-2022-21727)
    • Fixes a heap OOB access in FractionalAvgPoolGrad (CVE-2022-21730)
    • Fixes an overflow and divide by zero in UnravelIndex (CVE-2022-21729)
    • Fixes a type confusion in shape inference for ConcatV2 (CVE-2022-21731)
    • Fixes an OOM in ThreadPoolHandle (CVE-2022-21732)
    • Fixes an OOM due to integer overflow in StringNGrams (CVE-2022-21733)
    • Fixes more issues caused by incomplete validation in boosted trees code (CVE-2021-41208)
    • Fixes an integer overflows in most sparse component-wise ops (CVE-2022-23567)
    • Fixes an integer overflows in AddManySparseToTensorsMap (CVE-2022-23568)
    • Fixes a number of CHECK-failures in MapStage (CVE-2022-21734)
    • Fixes a division by zero in FractionalMaxPool (CVE-2022-21735)
    • Fixes a number of CHECK-fails when building invalid/overflowing tensor shapes (CVE-2022-23569)
    • Fixes an undefined behavior in SparseTensorSliceDataset (CVE-2022-21736)
    • Fixes an assertion failure based denial of service via faulty bin count operations (CVE-2022-21737)
    • Fixes a reference binding to null pointer in QuantizedMaxPool (CVE-2022-21739)
    • Fixes an integer overflow leading to crash in SparseCountSparseOutput (CVE-2022-21738)
    • Fixes a heap overflow in SparseCountSparseOutput (CVE-2022-21740)
    • Fixes an FPE in BiasAndClamp in TFLite (CVE-2022-23557)
    • Fixes an FPE in depthwise convolutions in TFLite (CVE-2022-21741)
    • Fixes an integer overflow in TFLite array creation (CVE-2022-23558)
    • Fixes an integer overflow in TFLite (CVE-2022-23559)
    • Fixes a dangerous OOB write in TFLite (CVE-2022-23561)
    • Fixes a vulnerability leading to read and write outside of bounds in TFLite (CVE-2022-23560)
    • Fixes a set of vulnerabilities caused by using insecure temporary files (CVE-2022-23563)
    • Fixes an integer overflow in Range resulting in undefined behavior and OOM (CVE-2022-23562)
    • Fixes a vulnerability where missing validation causes tf.sparse.split to crash when axis is a tuple (CVE-2021-41206)
    • Fixes a CHECK-fail when decoding resource handles from proto (CVE-2022-23564)
    • Fixes a CHECK-fail with repeated AttrDef (CVE-2022-23565)
    • Fixes a heap OOB write in Grappler (CVE-2022-23566)
    • Fixes a CHECK-fail when decoding invalid tensors from proto (CVE-2022-23571)
    • Fixes an unitialized variable access in AssignOp (CVE-2022-23573)
    • Fixes an integer overflow in OpLevelCostEstimator::CalculateTensorSize (CVE-2022-23575)
    • Fixes an integer overflow in OpLevelCostEstimator::CalculateOutputSize (CVE-2022-23576)
    • Fixes a null dereference in GetInitOp (CVE-2022-23577)
    • Fixes a memory leak when a graph node is invalid (CVE-2022-23578)
    • Fixes an abort caused by allocating a vector that is too large (CVE-2022-23580)
    • Fixes multiple CHECK-failures during Grappler's IsSimplifiableReshape (CVE-2022-23581)
    • Fixes multiple CHECK-failures during Grappler's SafeToRemoveIdentity (CVE-2022-23579)
    • Fixes multiple CHECK-failures in TensorByteSize (CVE-2022-23582)
    • Fixes multiple CHECK-failures in binary ops due to type confusion (CVE-2022-23583)

    ... (truncated)

    Commits
    • 959e9b2 Merge pull request #54213 from tensorflow/fix-sanity-on-r2.5
    • d05fcbc Fix sanity build
    • f2526a0 Merge pull request #54205 from tensorflow/disable-flaky-tests-on-r2.5
    • a5f94df Disable flaky test
    • 7babe52 Merge pull request #54201 from tensorflow/cherrypick-510ae18200d0a4fad797c0bf...
    • 0e5d378 Set Env Variable to override Setuptools new behavior
    • fdd4195 Merge pull request #54176 from tensorflow-jenkins/relnotes-2.5.3-6805
    • 4083165 Update RELEASE.md
    • a2bb7f1 Merge pull request #54185 from tensorflow/cherrypick-d437dec4d549fc30f9b85c75...
    • 5777ea3 Update third_party/icu/workspace.bzl
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Fix for issue: inference.ipynb audio output silent

    Issue: https://github.com/NVIDIA/tacotron2/issues/475. As mentioned by anaivebird in https://github.com/NVIDIA/tacotron2/issues/475#issuecomment-901601628, this seems to work fine.

    opened by sourjatilak 0
  • Tacotron2 Issues with Inference and using a Custom Dataset

    I believe I am currently having an issue when training both from scratch and from the pre-trained Tacotron 2 model.

    I have collected 14 to 17 hours of pre-processed wav files of Obama speaking. Each file was initially normalized with ffmpeg-normalize and then resampled to the recommended 22050 Hz.

    I have ensured that:

    • the sampling rate of each wav file is 22050 Hz
    • there is only a single speaker: Obama
    • the speech contains a variety of phonemes
    • each audio file is split into segments of 10 seconds
    • none of the audio segments has silence at the beginning or end
    • none of the audio segments contains long silences

    Here is a link to a drive containing the wav files for inspection:

    https://drive.google.com/drive/folders/17RoPoNhcU6ovW0BBkONt3WEXf6ZvuUwF?usp=download

    Here is a link to both of the formatted .txt files (train and val):

    Train .txt file: https://drive.google.com/file/d/1dxTkagpAT43jP06QAeODWS92GmuqdPqz/view?usp=sharing Validation .txt file: https://drive.google.com/file/d/1dtaHPWTFdXLM1QdOVb2V9H2a_VMKVWRg/view?usp=sharing

    I formatted the .txt files in the same way as the LJSpeech dataset. I used wav2vec 2.0 for the transcriptions. I made sure that any spaces at the start and end of the transcriptions were removed, and that a period was added to the end of each transcript. Each entry is on a new line.
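
    A small validation pass over such a filelist can catch formatting and sample-rate problems early; a sketch with placeholder paths:

    import librosa

    # "filelists/obama_train.txt" is a placeholder filelist path
    with open("filelists/obama_train.txt", encoding="utf-8") as f:
        for line in f:
            wav_path, text = line.rstrip("\n").split("|", 1)
            assert text.strip(), f"empty transcript for {wav_path}"
            _, sr = librosa.load(wav_path, sr=None)  # sr=None keeps the native rate
            assert sr == 22050, f"{wav_path} is {sr} Hz, expected 22050"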

    The train.py script will run. The directory paths and naming conventions are correct.

    This is what the graphs of training inference look like at epochs 0, 50, 100, and 250 (alignment plot images attached).

    Is this how the charts should look? Any help would be appreciated!

    @CookiePPP Any input on this?

    opened by conceptofmind 0
  • Error when pasting Trained Model into Synthesis Notebook Tacotron 2

    So I'm trying to create my own deepfake audio model using Tacotron 2. I have successfully created the trained model (I think, because there were no errors) using Tacotron 2 and Google Colab. However, when loading my trained model into the synthesis notebook, I start running into several errors that I don't know how to solve.

    NameError: name 'initilized' is not defined

    RuntimeError Traceback (most recent call last)

    RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

    I have no idea how to diagnose or solve these errors, and there is very little material on solving errors when creating Tacotron deepfake audio models. Feel free to paste the link to the model into the synthesis notebook and look at the errors yourself, as it might be easier to see them hands-on. It's a simple test model of Drake the rapper. All you are supposed to do is paste the address of the pre-trained model into the synthesis notebook and hit the play button, but when I do so it gives me the errors. Perhaps there is something wrong with my trained model, although I received no errors while making it. Any help or feedback would be greatly appreciated.

    Training notebook: https://colab.research.google.com/drive/14X73UiywnoL9VS30iPDcX4WXxwZWv2e2

    Synthesis notebook: https://colab.research.google.com/drive/1PZ4andZVFc8YALmhBbBB3AJghLN0r8zb

    Trained model: 1-8ZQb5uFFR4AtzmYoq_cZFADytuNOEyn

    opened by BinaryBat2020 1
  • Where can I find the code for the TacoSpawn speaker generator?

    Where can I find the Speaker Generator Code? https://google.github.io/tacotron/publications/speaker_generation/index.html

    opened by Nikuson123 0
  • Strange mel spectrogram

    Hello,

    I have a strange mel spectrogram (image attached). I am doing transfer learning with a dataset containing 1.5 hours of speech. I had strong overfitting, so I augmented my data with the following code:

    import glob
    import math
    import os

    import librosa
    import numpy as np
    import soundfile as sf

    def get_white_noise(signal, SNR):
        # RMS value of the signal
        RMS_s = math.sqrt(np.mean(signal ** 2))
        # RMS value of the noise for the requested signal-to-noise ratio
        RMS_n = math.sqrt(RMS_s ** 2 / (pow(10, SNR / 10)))
        # Additive white Gaussian noise, therefore mean = 0.
        # Because the sample length is large (typically > 40000),
        # we can use the population formula for the standard deviation;
        # because mean = 0, STD = RMS.
        STD_n = RMS_n
        noise = np.random.normal(0, STD_n, signal.shape[0])
        return noise

    def add_noise(input_dir, outdir):
        os.makedirs(outdir, exist_ok=True)
        files = glob.glob(input_dir + '/*.wav')

        for file in files:
            end = file.split('/')[-1]
            out_path = outdir + '/' + end
            signal, sr = librosa.load(file)
            # rescale the signal to [-1, 1]
            signal = np.interp(signal, (signal.min(), signal.max()), (-1, 1))
            noise = get_white_noise(signal, SNR=10)
            # frequency components of the noise (computed but unused here)
            X = np.fft.rfft(noise)
            signal_noise = signal + noise
            sf.write(out_path, signal_noise, sr)

    Is this correct?

    My TensorBoard looks like the following after training for 15k epochs (smoothing: 0.6; screenshots attached).

    My hyperparameters are the following: batch_size=32, weight_decay=1e-7, p_attention_dropout=0.3, p_decoder_dropout=0.15, learning rate 1e-3*(0.01**(epoch/1000)), 87% training data, 13% validation.

    Why does the mel spectrogram look so strange? I would be very happy to receive tips!

    opened by neuronx1 1
  • Bad inference for short sentences

    Hi,

    I have encountered this problem while testing with both the pre-trained model and the model I trained on my own dataset. When a sentence contains 2 or more words, it works correctly most of the time. However, when the sentence is 1 word, like "Hello!" or "Hi", it generates an unintelligible result.

    Here are some plots for the sentence "Hello!" (screenshot attached).

    Some Background:

    • I used my own dataset to train the model, based on the pre-trained model published from this repo.
    • It has been trained for 101k iterations.
    • At iteration 14000: train loss 0.146386, grad norm 0.442730, validation loss 0.105691.

    ################################
    # Experiment Parameters       #
    ################################
    epochs=1000,
    iters_per_checkpoint=1000,
    seed=1234,
    dynamic_loss_scaling=True,
    fp16_run=False,
    distributed_run=False,
    dist_backend="nccl",
    dist_url="tcp://localhost:54321",
    cudnn_enabled=True,
    cudnn_benchmark=False,
    ignore_layers=['embedding.weight'],

        ################################
        # Data Parameters             #
        ################################
        load_mel_from_disk=False,
        training_files='filelists/azure0_audio_text_train_filelist.txt',
        validation_files='filelists/azure0_audio_text_val_filelist.txt',
        text_cleaners=['english_cleaners'],
    
        ################################
        # Audio Parameters             #
        ################################
        max_wav_value=32768.0,
        sampling_rate=22050,
        filter_length=1024,
        hop_length=256,
        win_length=1024,
        n_mel_channels=80,
        mel_fmin=0.0,
        mel_fmax=8000.0,
    
        ################################
        # Model Parameters             #
        ################################
        n_symbols=len(symbols),
        symbols_embedding_dim=512,
    
        # Encoder parameters
        encoder_kernel_size=5,
        encoder_n_convolutions=3,
        encoder_embedding_dim=512,
    
        # Decoder parameters
        n_frames_per_step=1,  # currently only 1 is supported
        decoder_rnn_dim=1024,
        prenet_dim=256,
        max_decoder_steps=2000,
        gate_threshold=0.5,
        p_attention_dropout=0.1,
        p_decoder_dropout=0.1,
    
        # Attention parameters
        attention_rnn_dim=1024,
        attention_dim=128,
    
        # Location Layer parameters
        attention_location_n_filters=32,
        attention_location_kernel_size=31,
    
        # Mel-post processing network parameters
        postnet_embedding_dim=512,
        postnet_kernel_size=5,
        postnet_n_convolutions=5,
    
        ################################
        # Optimization Hyperparameters #
        ################################
        use_saved_learning_rate=False,
        learning_rate=1e-3,
        weight_decay=1e-6,
        grad_clip_thresh=1.0,
        batch_size=16,
    mask_padding=True  # set model's padded outputs to padded values
    
    opened by cdlIris 0
Owner
NVIDIA Corporation
A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.

torchsynth The fastest synth in the universe. Introduction torchsynth is based upon traditional modular synthesis written in pytorch. It is GPU-option

torchsynth 201 Jan 20, 2022
A faster pytorch implementation of faster r-cnn

A Faster Pytorch Implementation of Faster R-CNN Write at the beginning [05/29/2020] This repo was initiated about two years ago, developed as the firs

Jianwei Yang 6.7k Feb 19, 2022
PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM

Quasi-Recurrent Neural Network (QRNN) for PyTorch Updated to support multi-GPU environments via DataParallel - see the the multigpu_dataparallel.py ex

Salesforce 1.2k Feb 12, 2022
Pytorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Parallel Tacotron2 Pytorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Keon Lee 127 Feb 21, 2022
PyTorch implementation of Tacotron speech synthesis model.

tacotron_pytorch PyTorch implementation of Tacotron speech synthesis model. Inspired from keithito/tacotron. Currently not as much good speech quality

Ryuichi Yamamoto 265 Feb 14, 2022
PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices.

PyTorch-LIT PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices. With

Amin Rezaei 146 Feb 22, 2022
LeViT a Vision Transformer in ConvNet's Clothing for Faster Inference

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference This repository contains PyTorch evaluation code, training code and pretrained

Facebook Research 399 Feb 22, 2022
Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression.

Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression. Not an official Google product. Me

Google Research 26 Nov 5, 2021
Deploy a ML inference service on a budget in less than 10 lines of code.

BudgetML is perfect for practitioners who would like to quickly deploy their models to an endpoint, but not waste a lot of time, money, and effort trying to figure out how to do this end-to-end.

null 1.3k Feb 23, 2022
PyTorch Implementation of Realtime Multi-Person Pose Estimation project.

PyTorch Realtime Multi-Person Pose Estimation This is a pytorch version of Realtime_Multi-Person_Pose_Estimation, origin code is here Realtime_Multi-P

Dave Fang 156 Oct 16, 2021
Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.

Yolov5 running on TorchServe (GPU compatible) ! This is a dockerfile to run TorchServe for Yolo v5 object detection model. (TorchServe (PyTorch librar

null 66 Feb 15, 2022
Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.

human-pose-estimation-3d-python-cpp RealSenseD435 (RGB) 480x640 + CPU Corei9 45 FPS (Depth is not used) 1. Run 1-1. RealSenseD435 (RGB) 480x640 + CPU

Katsuya Hyodo 6 Feb 16, 2022
Data-depth-inference - Data depth inference with python

Welcome! This readme will guide you through the use of the code in this reposito

Marco 3 Feb 8, 2022
A pytorch implementation of faster RCNN detection framework (Use detectron2, it's a masterpiece)

Notice(2019.11.2) This repo was built back two years ago when there were no pytorch detection implementation that can achieve reasonable performance.

Ruotian(RT) Luo 1.8k Feb 22, 2022
A Pytorch Implementation of Source Data-free Domain Adaptation for a Faster R-CNN

A Pytorch Implementation of Source Data-free Domain Adaptation for a Faster R-CNN Please follow Faster R-CNN and DAF to complete the environment confi

null 2 Jan 12, 2022
Official PyTorch implementation of MX-Font (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts)

Introduction Pytorch implementation of Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Expert. | paper Song Park1

Clova AI Research 76 Feb 13, 2022
⚾🤖⚾ Automatic baseball pitching overlay in realtime

⚾ Automatically overlaying pitch motion and trajectory with machine learning! This project takes your baseball pitching clips and automatically genera

Tony Chou 224 Feb 18, 2022
git《USD-Seg:Learning Universal Shape Dictionary for Realtime Instance Segmentation》(2020) GitHub: [fig2]

USD-Seg This project is an implement of paper USD-Seg:Learning Universal Shape Dictionary for Realtime Instance Segmentation, based on FCOS detector f

Ruolin Ye 79 Dec 13, 2021
Frigate - NVR With Realtime Object Detection for IP Cameras

A complete and local NVR designed for HomeAssistant with AI object detection. Uses OpenCV and Tensorflow to perform realtime object detection locally for IP cameras.

Blake Blackshear 4k Feb 22, 2022