Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

Overview

Chunked Autoregressive GAN (CARGAN)

PyPI License Downloads

Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis [paper] [companion website]

Table of contents

Installation

pip install cargan

Configuration

All configuration is performed in cargan/constants.py. The default configuration is CARGAN. Additional configuration files for experiments described in our paper can be found in config/.

Inference

CLI

Infer from an audio files on disk. audio_files and output_files can be lists of files to perform batch inference.

python -m cargan \
    --audio_files 
   
     \
    --output_files 
    
      \
    --checkpoint 
     
       \
    --gpu 
      

      
     
    
   

Infer from files of features on disk. feature_files and output_files can be lists of files to perform batch inference.

python -m cargan \
    --feature_files 
   
     \
    --output_files 
    
      \
    --checkpoint 
     
       \
    --gpu 
      

      
     
    
   

API

cargan.from_audio

"""Perform vocoding from audio

Arguments
    audio : torch.Tensor(shape=(1, samples))
        The audio to vocode
    sample_rate : int
        The audio sample rate
    gpu : int or None
        The index of the gpu to use

Returns
    vocoded : torch.Tensor(shape=(1, samples))
        The vocoded audio
"""

cargan.from_audio_file_to_file

"""Perform vocoding from audio file and save to file

Arguments
    audio_file : Path
        The audio file to vocode
    output_file : Path
        The location to save the vocoded audio
    checkpoint : Path
        The generator checkpoint
    gpu : int or None
        The index of the gpu to use
"""

cargan.from_audio_files_to_files

"""Perform vocoding from audio files and save to files

Arguments
    audio_files : list(Path)
        The audio files to vocode
    output_files : list(Path)
        The locations to save the vocoded audio
    checkpoint : Path
        The generator checkpoint
    gpu : int or None
        The index of the gpu to use
"""

cargan.from_features

"""Perform vocoding from features

Arguments
    features : torch.Tensor(shape=(1, cargan.NUM_FEATURES, frames)
        The features to vocode
    gpu : int or None
        The index of the gpu to use

Returns
    vocoded : torch.Tensor(shape=(1, cargan.HOPSIZE * frames))
        The vocoded audio
"""

cargan.from_feature_file_to_file

"""Perform vocoding from feature file and save to disk

Arguments
    feature_file : Path
        The feature file to vocode
    output_file : Path
        The location to save the vocoded audio
    checkpoint : Path
        The generator checkpoint
    gpu : int or None
        The index of the gpu to use
"""

cargan.from_feature_files_to_files

"""Perform vocoding from feature files and save to disk

Arguments
    feature_files : list(Path)
        The feature files to vocode
    output_files : list(Path)
        The locations to save the vocoded audio
    checkpoint : Path
        The generator checkpoint
    gpu : int or None
        The index of the gpu to use
"""

Reproducing results

For the following subsections, the arguments are as follows

  • checkpoint - Path to an existing checkpoint on disk
  • datasets - A list of datasets to use. Supported datasets are vctk, daps, cumsum, and musdb.
  • gpu - The index of the gpu to use
  • gpus - A list of indices of gpus to use for distributed data parallelism (DDP)
  • name - The name to give to an experiment or evaluation
  • num - The number of samples to evaluate

Download

Downloads, unzips, and formats datasets. Stores datasets in data/datasets/. Stores formatted datasets in data/cache/.

python -m cargan.data.download --datasets 
   

   

vctk must be downloaded before cumsum.

Preprocess

Prepares features for training. Features are stored in data/cache/.

python -m cargan.preprocess --datasets 
   
     --gpu 
    

    
   

Running this step is not required for the cumsum experiment.

Partition

Partitions a dataset into training, validation, and testing partitions. You should not need to run this, as the partitions used in our work are provided for each dataset in cargan/assets/partitions/.

python -m cargan.partition --datasets 
   

   

The optional --overwrite flag forces the existing partition to be overwritten.

Train

Trains a model. Checkpoints and logs are stored in runs/.

python -m cargan.train \
    --name 
   
     \
    --datasets 
    
      \
    --gpus 
     

     
    
   

You can optionally specify a --checkpoint option pointing to the directory of a previous run. The most recent checkpoint will automatically be loaded and training will resume from that checkpoint. You can overwrite a previous training by passing the --overwrite flag.

You can monitor training via tensorboard as follows.

tensorboard --logdir runs/ --port 
   

   

Evaluate

Objective

Reports the pitch RMSE (in cents), periodicity RMSE, and voiced/unvoiced F1 score. Results are both printed and stored in eval/objective/.

python -m cargan.evaluate.objective \
    --name 
   
     \
    --datasets 
    
      \
    --checkpoint 
     
       \
    --num 
      
        \
    --gpu 
        
       
      
     
    
   

Subjective

Generates samples for subjective evaluation. Also performs benchmarking of inference speed. Results are stored in eval/subjective/.

python -m cargan.evaluate.subjective \
    --name 
   
     \
    --datasets 
    
      \
    --checkpoint 
     
       \
    --num 
      
        \
    --gpu 
        
       
      
     
    
   

Receptive field

Get the size of the (non-causal) receptive field of the generator. cargan.AUTOREGRESSIVE must be False to use this.

python -m cargan.evaluate.receptive_field

Running tests

pip install pytest
pytest

Citation

IEEE

M. Morrison, R. Kumar, K. Kumar, P. Seetharaman, A. Courville, and Y. Bengio, "Chunked Autoregressive GAN for Conditional Waveform Synthesis," Submitted to ICLR 2022, April 2022.

BibTex

@inproceedings{morrison2022chunked,
    title={Chunked Autoregressive GAN for Conditional Waveform Synthesis},
    author={Morrison, Max and Kumar, Rithesh and Kumar, Kundan and Seetharaman, Prem and Courville, Aaron and Bengio, Yoshua},
    booktitle={Submitted to ICLR 2022},
    month={April},
    year={2022}
}
This is Official implementation for "Pose-guided Feature Disentangling for Occluded Person Re-Identification Based on Transformer" in AAAI2022

PFD:Pose-guided Feature Disentangling for Occluded Person Re-identification based on Transformer This repo is the official implementation of "Pose-gui

Tao Wang 93 Dec 18, 2022
An implementation of "Learning human behaviors from motion capture by adversarial imitation"

Merel-MoCap-GAIL An implementation of Merel et al.'s paper on generative adversarial imitation learning (GAIL) using motion capture (MoCap) data: Lear

Yu-Wei Chao 34 Nov 12, 2022
PyTorch implementation of "VRT: A Video Restoration Transformer"

VRT: A Video Restoration Transformer Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool Computer

Jingyun Liang 837 Jan 09, 2023
StyleGAN2 Webtoon / Anime Style Toonify

StyleGAN2 Webtoon / Anime Style Toonify Korea Webtoon or Japanese Anime Character Stylegan2 base high Quality 1024x1024 / 512x512 Generate and Transfe

121 Dec 21, 2022
A repo that contains all the mesh keys needed for mesh backend, along with a code example of how to use them in python

Mesh-Keys A repo that contains all the mesh keys needed for mesh backend, along with a code example of how to use them in python Have been seeing alot

Joseph 53 Dec 13, 2022
The Simplest DCGAN Implementation

DCGAN in TensorLayer This is the TensorLayer implementation of Deep Convolutional Generative Adversarial Networks. Looking for Text to Image Synthesis

TensorLayer Community 310 Dec 13, 2022
Mixed Transformer UNet for Medical Image Segmentation

MT-UNet Update 2021/11/19 Thank you for your interest in our work. We have uploaded the code of our MTUNet to help peers conduct further research on i

dotman 92 Dec 25, 2022
Reproduction process of AlexNet

PaddlePaddle论文复现杂谈 背景 注:该repo基于PaddlePaddle,对AlexNet进行复现。时间仓促,难免有所疏漏,如果问题或者想法,欢迎随时提issue一块交流。 飞桨论文复现赛地址:https://aistudio.baidu.com/aistudio/competitio

19 Nov 29, 2022
Unsupervised Video Interpolation using Cycle Consistency

Unsupervised Video Interpolation using Cycle Consistency Project | Paper | YouTube Unsupervised Video Interpolation using Cycle Consistency Fitsum A.

NVIDIA Corporation 100 Nov 30, 2022
A copy of Ares that costs 30 fucking dollars.

Finalement, j'ai décidé d'abandonner cette idée, je me suis comporté comme un enfant qui été en colère. Comme m'ont dit certaines personnes j'ai des c

Bleu 24 Apr 14, 2022
Final project for machine learning (CSC 590). Detection of hepatitis C and progression through blood samples.

Hepatitis C Blood Based Detection Final project for machine learning (CSC 590). Dataset from Kaggle. Using data from previous hepatitis C blood panels

Jennefer Maldonado 1 Dec 28, 2021
Code image classification of MNIST dataset using different architectures: simple linear NN, autoencoder, and highway network

Deep Learning for image classification pip install -r http://webia.lip6.fr/~baskiotisn/requirements-amal.txt Train an autoencoder python3 train_auto

Hector Kohler 0 Mar 30, 2022
The Official Implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose [NIPS 2021].

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose Release Notes The offical PyTorch implementation of Neural View Sy

Angtian Wang 20 Oct 09, 2022
A real-time motion capture system that estimates poses and global translations using only 6 inertial measurement units

TransPose Code for our SIGGRAPH 2021 paper "TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors". This repository

Xinyu Yi 261 Dec 31, 2022
tree-math: mathematical operations for JAX pytrees

tree-math: mathematical operations for JAX pytrees tree-math makes it easy to implement numerical algorithms that work on JAX pytrees, such as iterati

Google 137 Dec 28, 2022
The official implementation of CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing

CSGStumpNet The official implementation of CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing Paper | Project page

Daxuan 39 Dec 26, 2022
Source code for Task-Aware Variational Adversarial Active Learning

Contrastive Coding for Active Learning under Class Distribution Mismatch Official PyTorch implementation of ["Contrastive Coding for Active Learning u

27 Nov 23, 2022
Deep Learning to Create StepMania SM FIles

StepCOVNet Running Audio to SM File Generator Currently only produces .txt files. Use SMDataTools to convert .txt to .sm python stepmania_note_generat

Chimezie Iwuanyanwu 8 Jan 08, 2023
Code for Transformer Hawkes Process, ICML 2020.

Transformer Hawkes Process Source code for Transformer Hawkes Process (ICML 2020). Run the code Dependencies Python 3.7. Anaconda contains all the req

Simiao Zuo 111 Dec 26, 2022
Securetar - A streaming wrapper around python tarfile and allow secure handling files and support encryption

Secure Tar Secure Tarfile library It's a streaming wrapper around python tarfile

Pascal Vizeli 2 Dec 09, 2022