
Chunked Autoregressive GAN (CARGAN)

Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis [paper] [companion website]

Installation

pip install cargan

Configuration

All configuration is performed in cargan/constants.py. The default configuration is CARGAN. Additional configuration files for experiments described in our paper can be found in config/.

Inference

CLI

Infer from audio files on disk. audio_files and output_files can be lists of files to perform batch inference.

python -m cargan \
    --audio_files <audio_files> \
    --output_files <output_files> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
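
For scripted batch runs, the command above can be assembled programmatically. The sketch below uses a hypothetical helper, build_inference_command, which is not part of cargan. It assumes that --audio_files and --output_files each accept a space-separated list and that --gpu may be omitted (mirroring `gpu : int or None` in the API); verify both against `python -m cargan --help`. It checks that every input has exactly one output location before building the argument list.

```python
import shlex
from pathlib import Path
from typing import List, Optional, Sequence


def build_inference_command(
    audio_files: Sequence[Path],
    output_files: Sequence[Path],
    checkpoint: Path,
    gpu: Optional[int] = None,
) -> List[str]:
    """Build an argv list for batch vocoding with `python -m cargan` (hypothetical helper)."""
    # Batch inference pairs each input file with exactly one output location
    if len(audio_files) != len(output_files):
        raise ValueError("audio_files and output_files must have the same length")
    command = ["python", "-m", "cargan", "--audio_files"]
    command += [str(f) for f in audio_files]
    command += ["--output_files"] + [str(f) for f in output_files]
    command += ["--checkpoint", str(checkpoint)]
    # Assumption: leaving out --gpu is allowed and selects the CLI's default device
    if gpu is not None:
        command += ["--gpu", str(gpu)]
    return command


cmd = build_inference_command(
    [Path("a.wav"), Path("b.wav")],
    [Path("out/a.wav"), Path("out/b.wav")],
    Path("generator.pt"),
    gpu=0,
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run(cmd, check=True), which sidesteps shell quoting problems with unusual file names.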

Infer from files of features on disk. feature_files and output_files can be lists of files to perform batch inference.

python -m cargan \
    --feature_files <feature_files> \
    --output_files <output_files> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
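
When vocoding many feature files, output_files has to line up one-to-one with feature_files. A minimal sketch, assuming you want each output named after its input's stem with a .wav suffix in a single directory; the helper output_files_for and the .pt feature extension in the example are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def output_files_for(
    feature_files: Sequence[Path], output_dir: Path, suffix: str = ".wav"
) -> List[Path]:
    """Map each feature file to output_dir / <stem><suffix> (hypothetical helper)."""
    outputs = [Path(output_dir) / (Path(f).stem + suffix) for f in feature_files]
    # Two inputs sharing a stem would silently overwrite each other's output
    if len(set(outputs)) != len(outputs):
        raise ValueError("feature files must have unique stems")
    return outputs


print(output_files_for([Path("features/p225_001.pt")], Path("vocoded")))
```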

API

cargan.from_audio

"""Perform vocoding from audio

Arguments
    audio : torch.Tensor(shape=(1, samples))
        The audio to vocode
    sample_rate : int
        The audio sample rate
    gpu : int or None
        The index of the gpu to use

Returns
    vocoded : torch.Tensor(shape=(1, samples))
        The vocoded audio
"""

cargan.from_audio_file_to_file

"""Perform vocoding from audio file and save to file

Arguments
    audio_file : Path
        The audio file to vocode
    output_file : Path
        The location to save the vocoded audio
    checkpoint : Path
        The generator checkpoint
    gpu : int or None
        The index of the gpu to use
"""

cargan.from_audio_files_to_files

"""Perform vocoding from audio files and save to files

Arguments
    audio_files : list(Path)
        The audio files to vocode
    output_files : list(Path)
        The locations to save the vocoded audio
    checkpoint : Path
        The generator checkpoint
    gpu : int or None
        The index of the gpu to use
"""

cargan.from_features

"""Perform vocoding from features

Arguments
    features : torch.Tensor(shape=(1, cargan.NUM_FEATURES, frames))
        The features to vocode
    gpu : int or None
        The index of the gpu to use

Returns
    vocoded : torch.Tensor(shape=(1, cargan.HOPSIZE * frames))
        The vocoded audio
"""

cargan.from_feature_file_to_file

"""Perform vocoding from feature file and save to disk

Arguments
    feature_file : Path
        The feature file to vocode
    output_file : Path
        The location to save the vocoded audio
    checkpoint : Path
        The generator checkpoint
    gpu : int or None
        The index of the gpu to use
"""

cargan.from_feature_files_to_files

"""Perform vocoding from feature files and save to disk

Arguments
    feature_files : list(Path)
        The feature files to vocode
    output_files : list(Path)
        The locations to save the vocoded audio
    checkpoint : Path
        The generator checkpoint
    gpu : int or None
        The index of the gpu to use
"""

Reproducing results

The following subsections use these arguments:

  • checkpoint - Path to an existing checkpoint on disk
  • datasets - A list of datasets to use. Supported datasets are vctk, daps, cumsum, and musdb.
  • gpu - The index of the gpu to use
  • gpus - A list of indices of gpus to use for distributed data parallelism (DDP)
  • name - The name to give to an experiment or evaluation
  • num - The number of samples to evaluate

Download

Downloads, unzips, and formats datasets. Stores datasets in data/datasets/. Stores formatted datasets in data/cache/.

python -m cargan.data.download --datasets <datasets>

vctk must be downloaded before cumsum.
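
Because vctk must be present before cumsum, a wrapper that downloads several datasets may want to order them. A sketch, assuming (not confirmed here) that the download script processes datasets in the order given on the command line; order_datasets is a hypothetical helper that also rejects names outside the four supported datasets.

```python
from typing import List, Sequence

SUPPORTED_DATASETS = ("vctk", "daps", "cumsum", "musdb")


def order_datasets(datasets: Sequence[str]) -> List[str]:
    """Deduplicate datasets and make sure vctk comes before cumsum (hypothetical helper)."""
    unknown = sorted(set(datasets) - set(SUPPORTED_DATASETS))
    if unknown:
        raise ValueError(f"unsupported datasets: {unknown}")
    # dict.fromkeys removes duplicates while keeping first-seen order
    ordered = list(dict.fromkeys(datasets))
    if (
        "vctk" in ordered
        and "cumsum" in ordered
        and ordered.index("vctk") > ordered.index("cumsum")
    ):
        # Move vctk directly in front of cumsum
        ordered.remove("vctk")
        ordered.insert(ordered.index("cumsum"), "vctk")
    return ordered


datasets = order_datasets(["cumsum", "daps", "vctk"])
print("python -m cargan.data.download --datasets " + " ".join(datasets))
```

If cumsum is requested on its own, this helper cannot tell whether vctk was already downloaded in an earlier run; it only orders what it is given.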

Preprocess

Prepares features for training. Features are stored in data/cache/.

python -m cargan.preprocess --datasets <datasets> --gpu <gpu>

Running this step is not required for the cumsum experiment.

Partition

Partitions a dataset into training, validation, and testing partitions. You should not need to run this, as the partitions used in our work are provided for each dataset in cargan/assets/partitions/.

python -m cargan.partition --datasets <datasets>

The optional --overwrite flag forces the existing partition to be overwritten.
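
For a dataset without provided partitions, a reproducible split only needs a fixed random seed and a stable input order. The sketch below is an illustrative seeded shuffle, not the procedure behind cargan/assets/partitions/; the default fractions, the seed, and the train/valid/test key names are assumptions. Use the provided partitions to reproduce the paper's results.

```python
import random
from typing import Dict, Iterable, List


def partition(
    stems: Iterable[str],
    valid_fraction: float = 0.05,
    test_fraction: float = 0.05,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Split utterance stems into disjoint train/valid/test lists (illustrative only)."""
    # Sorting first makes the split independent of filesystem listing order
    shuffled = sorted(stems)
    random.Random(seed).shuffle(shuffled)
    num_test = int(len(shuffled) * test_fraction)
    num_valid = int(len(shuffled) * valid_fraction)
    return {
        "test": shuffled[:num_test],
        "valid": shuffled[num_test:num_test + num_valid],
        "train": shuffled[num_test + num_valid:],
    }
```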

Train

Trains a model. Checkpoints and logs are stored in runs/.

python -m cargan.train \
    --name <name> \
    --datasets <datasets> \
    --gpus <gpus>

You can optionally specify a --checkpoint option pointing to the directory of a previous run. The most recent checkpoint will automatically be loaded and training will resume from that checkpoint. You can overwrite a previous training run by passing the --overwrite flag.
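
Resuming from the most recent checkpoint means picking the checkpoint with the highest training step, not the lexicographically largest file name ("generator-900" sorts after "generator-1000" as a string). A minimal sketch, assuming a hypothetical naming scheme of <prefix>-<step>.pt; check runs/ for the names your version actually writes.

```python
import re
from pathlib import Path
from typing import Optional

# Trailing digits in a file stem, e.g. "generator-00050000" -> "00050000"
STEP_PATTERN = re.compile(r"(\d+)$")


def latest_checkpoint(run_dir: Path, prefix: str = "generator") -> Optional[Path]:
    """Return the <prefix>-<step>.pt file with the largest step, or None (hypothetical naming)."""
    best_step, best_path = -1, None
    for path in Path(run_dir).glob(f"{prefix}-*.pt"):
        match = STEP_PATTERN.search(path.stem)
        if match is None:
            continue  # e.g. "generator-final.pt" carries no step number
        step = int(match.group(1))
        # Compare numerically so that step 1000 beats step 900
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```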

You can monitor training via tensorboard as follows.

tensorboard --logdir runs/ --port <port>

Evaluate

Objective

Reports the pitch RMSE (in cents), periodicity RMSE, and voiced/unvoiced F1 score. Results are both printed and stored in eval/objective/.

python -m cargan.evaluate.objective \
    --name <name> \
    --datasets <datasets> \
    --checkpoint <checkpoint> \
    --num <num> \
    --gpu <gpu>
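
The three objective metrics can be written down compactly. The sketch below assumes you already have frame-aligned pitch (Hz), periodicity, and voiced/unvoiced decisions for the vocoded and reference audio from some pitch tracker; the tracker, the voicing threshold, and which frames enter each average are assumptions, not taken from cargan.evaluate.objective. Pitch error is measured in cents, 1200 * log2(f_pred / f_true), on frames both signals mark as voiced, and voiced frames are the positive class for F1.

```python
import math
from typing import Sequence


def pitch_rmse_cents(
    pred_hz: Sequence[float],
    true_hz: Sequence[float],
    pred_voiced: Sequence[bool],
    true_voiced: Sequence[bool],
) -> float:
    """RMSE in cents over frames that both signals mark as voiced."""
    cents = [
        1200.0 * math.log2(p / t)
        for p, t, pv, tv in zip(pred_hz, true_hz, pred_voiced, true_voiced)
        if pv and tv
    ]
    if not cents:
        return float("nan")  # no jointly voiced frames to compare
    return math.sqrt(sum(c * c for c in cents) / len(cents))


def periodicity_rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """RMSE between frame-wise periodicity values over all frames."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and frame-aligned")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def voicing_f1(pred_voiced: Sequence[bool], true_voiced: Sequence[bool]) -> float:
    """F1 score with voiced frames as the positive class."""
    pairs = list(zip(pred_voiced, true_voiced))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```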

Subjective

Generates samples for subjective evaluation and benchmarks inference speed. Results are stored in eval/subjective/.

python -m cargan.evaluate.subjective \
    --name <name> \
    --datasets <datasets> \
    --checkpoint <checkpoint> \
    --num <num> \
    --gpu <gpu>
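
Inference speed is commonly summarized as a real-time factor. The sketch below defines it as seconds of audio generated per second of wall-clock time, so values above 1 are faster than real time; some papers report the inverse, and these hypothetical helpers are not necessarily how cargan.evaluate.subjective reports speed. When timing GPU inference, CUDA kernels run asynchronously, so synchronize the device (for example with torch.cuda.synchronize()) inside the timed call before the clock is read.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[[], Any], repeats: int = 3) -> Tuple[Any, float]:
    """Run fn repeatedly; return its last result and the fastest wall-clock time."""
    best, result = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return result, best


def real_time_factor(num_samples: int, sample_rate: int, elapsed_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock time (> 1 is faster than real time)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (num_samples / sample_rate) / elapsed_seconds
```

Taking the fastest of several runs discards one-off costs such as first-call warm-up; for example, real_time_factor(44100, 22050, 0.5) returns 4.0, since two seconds of audio were produced in half a second.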

Receptive field

Gets the size of the generator's (non-causal) receptive field. cargan.AUTOREGRESSIVE must be False to use this.

python -m cargan.evaluate.receptive_field
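
For intuition about what this tool reports, the receptive field of a plain stack of 1-D convolutions follows a simple recurrence: each layer adds (kernel_size - 1) * dilation times the current input spacing, and strides multiply that spacing. This is a generic sketch for ordinary (non-transposed) convolutions, not a reimplementation of cargan.evaluate.receptive_field, and the layer list in the example is made up. For a non-causal layer, the returned width is split between past and future context.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Receptive field, in input samples, of stacked 1-D convs given (kernel_size, stride, dilation)."""
    field, jump = 1, 1
    for kernel_size, stride, dilation in layers:
        # Each layer widens the field by (k - 1) * d steps of the current input spacing
        field += (kernel_size - 1) * dilation * jump
        # Striding multiplies the spacing between adjacent outputs, in input samples
        jump *= stride
    return field


print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # 15
```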

Running tests

pip install pytest
pytest

Citation

IEEE

M. Morrison, R. Kumar, K. Kumar, P. Seetharaman, A. Courville, and Y. Bengio, "Chunked Autoregressive GAN for Conditional Waveform Synthesis," Submitted to ICLR 2022, April 2022.

BibTex

@inproceedings{morrison2022chunked,
    title={Chunked Autoregressive GAN for Conditional Waveform Synthesis},
    author={Morrison, Max and Kumar, Rithesh and Kumar, Kundan and Seetharaman, Prem and Courville, Aaron and Bengio, Yoshua},
    booktitle={Submitted to ICLR 2022},
    month={April},
    year={2022}
}