NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

Last update: Dec 23, 2022

Overview

NU-Wave — Official PyTorch Implementation

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
Junhyeok Lee, Seungu Han @ MINDsLab Inc., SNU

Paper(arXiv): https://arxiv.org/abs/2104.02321 (Accepted to INTERSPEECH 2021)
Audio Samples: https://mindslab-ai.github.io/nuwave

Official Pytorch+Lightning Implementation for NU-Wave.

Update: CODE RELEASED! README is DONE.

Requirements

PyTorch >= 1.7.0 (needed for nn.SiLU, the swish activation)
PyTorch-Lightning == 1.1.6
The requirements are listed in requirements.txt.
We also provide a Dockerfile for Docker setup.

Preprocessing

Before running the project, download the dataset and preprocess it to .pt files:

Download the VCTK dataset.
Remove speakers p280 and p315.
Set data:dir in hparameters.yaml to the path of the downloaded dataset.
Run utils/wav2pt.py.

$ python utils/wav2pt.py
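The speaker-removal step above can be done by hand or scripted. The following is a minimal sketch, assuming the corpus root contains one sub-directory per speaker; `list_speakers` and `EXCLUDED_SPEAKERS` are hypothetical names used for illustration and are not part of the repository.

```python
from pathlib import Path

# Speakers the preprocessing steps ask you to remove.
EXCLUDED_SPEAKERS = {"p280", "p315"}


def list_speakers(root, excluded=EXCLUDED_SPEAKERS):
    """Return sorted speaker directory names under root, skipping excluded ones.

    Assumes one sub-directory per speaker (hypothetical helper, not in the repo).
    Plain files in root are ignored.
    """
    return sorted(
        p.name
        for p in Path(root).iterdir()
        if p.is_dir() and p.name not in excluded
    )
```

Calling `list_speakers` on the dataset root before running utils/wav2pt.py is one way to confirm that p280 and p315 are no longer picked up.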

Training

Adjust hparameters.yaml, especially the train section.

train:
  batch_size: 18 # Dependent on GPU memory size
  lr: 0.00003
  weight_decay: 0.00
  num_workers: 64 # Dependent on CPU cores
  gpus: 2 # number of GPUs
  opt_eps: 1e-9
  beta1: 0.5
  beta2: 0.999

To train with a single speaker, use VCTKSingleSpkDataset instead of VCTKMultiSpkDataset as the dataset in dataloader.py, and use batch_size=1 for the validation dataloader.
Adjust the data section in hparameters.yaml.

data:
  dir: '/DATA1/VCTK/VCTK-Corpus/wav48/p225' #dir/spk/format
  format: '*mic1.pt'
  cv_ratio: (223./231., 8./231., 0.00) #train/val/test
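The cv_ratio fractions share the denominator 231, which suggests a 231-file speaker split into 223 training and 8 validation files with no test files. The sketch below shows that arithmetic only; `split_counts` is a hypothetical helper, and how the repository itself applies cv_ratio is an assumption.

```python
def split_counts(n_files, cv_ratio):
    """Turn (train, val, test) fractions into integer file counts.

    Rounding guards against floating-point error in products such as
    231 * (223 / 231). Hypothetical helper, not taken from the repository.
    """
    counts = [round(n_files * r) for r in cv_ratio]
    if sum(counts) != n_files:
        raise ValueError(f"ratios {cv_ratio} do not partition {n_files} files")
    return tuple(counts)


# The single-speaker config above, assuming 231 preprocessed files.
train_n, val_n, test_n = split_counts(231, (223. / 231., 8. / 231., 0.00))
print(train_n, val_n, test_n)  # prints: 223 8 0
```

If the speaker directory holds a different number of files, the fractions need to be adjusted so that they still add up to 1.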

Run trainer.py.

$ python trainer.py

To resume training from a checkpoint, check the parser options.

    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int,
                        required=False, help="Resume Checkpoint epoch number")
    parser.add_argument('-s', '--restart', action="store_true",
                        required=False, help="Significant change occurred, use this")
    parser.add_argument('-e', '--ema', action="store_true",
                        required=False, help="Start from ema checkpoint")
    args = parser.parse_args()
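To see how these flags are interpreted, the parser can be exercised with an explicit argument list. This is a minimal sketch that rebuilds the same three options in a hypothetical `build_parser` function; the epoch number 100 is a made-up value, and it is assumed that this parser is the one used by trainer.py.

```python
import argparse


def build_parser():
    # Mirrors the three training flags shown above (hypothetical wrapper).
    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int, required=False,
                        help="Resume Checkpoint epoch number")
    parser.add_argument('-s', '--restart', action="store_true",
                        help="Significant change occurred, use this")
    parser.add_argument('-e', '--ema', action="store_true",
                        help="Start from ema checkpoint")
    return parser


# Corresponds to a command line such as: python trainer.py -r 100 -e
args = build_parser().parse_args(['-r', '100', '-e'])
print(args.resume_from, args.restart, args.ema)  # prints: 100 False True

# With no flags, resume_from stays None and both switches are False.
fresh = build_parser().parse_args([])
print(fresh.resume_from)  # prints: None
```

Because `-r` takes an integer epoch number, passing a checkpoint file path instead would be rejected by argparse.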

During training, the TensorBoard logger records loss, spectrograms, and audio.

$ tensorboard --logdir=./tensorboard --bind_all

Evaluation

Run for_test.py or test.py.

$ python test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}
or
$ python for_test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}

Please check the parser options.

    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int,
                        required=True, help="Resume Checkpoint epoch number")
    parser.add_argument('-e', '--ema', action="store_true",
                        required=False, help="Start from ema checkpoint")
    parser.add_argument('--save', action="store_true",
                        required=False, help="Save file")

Although we provide Lightning-style test code in test.py, it has a device dependency, so we recommend using for_test.py.

References

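This implementation uses code from other repositories, including ivanvok's WaveGrad implementation and lmnt-com's DiffWave implementation (see Repository Structure).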

This README and the webpage for the audio samples are inspired by:

The audio samples on our webpage are partially derived from:

VCTK dataset (0.92): 46 hours of English speech from 108 speakers.

Repository Structure

.
├── Dockerfile
├── dataloader.py           # Dataloader for train/val(=test)
├── filters.py              # Filter implementation
├── test.py                 # Test with lightning_loop.
├── for_test.py             # Test with for_loop. Recommended due to device dependency of lightning
├── hparameter.yaml         # Config
├── lightning_model.py      # NU-Wave implementation. DDPM is based on ivanvok's WaveGrad implementation
├── model.py                # NU-Wave model based on lmnt-com's DiffWave implementation
├── requirement.txt         # Required libraries
├── sampling.py             # Sampling a file
├── trainer.py              # Lightning trainer
├── README.md
├── LICENSE
├── utils
│  ├── stft.py              # STFT layer
│  ├── tblogger.py          # Tensorboard Logger for lightning
│  └── wav2pt.py            # Preprocessing
└── docs                    # For github.io
   └─ ...

Citation & Contact

If this repository is useful for your research, please consider citing it. The BibTeX entry will be updated after the INTERSPEECH 2021 conference.

@article{lee2021nuwave,
  title={NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling},
  author={Lee, Junhyeok and Han, Seungu},
  journal={arXiv preprint arXiv:2104.02321},
  year={2021}
}

If you have a question or an inquiry of any kind, please contact Junhyeok Lee at [email protected]
