MelGAN test on audio decoding

Related tags

AudioPaper_12
Overview

Official repository for the paper MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

The original work URL: https://github.com/descriptinc/melgan-neurips

Previous works have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques. Subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive, fully convolutional, with significantly fewer parameters than competing models and generalizes to unseen speakers for mel-spectrogram inversion. Our pytorch implementation runs at more than 100x faster than realtime on GTX 1080Ti GPU and more than 2x faster than real-time on CPU, without any hardware specific optimization tricks. Blog post with samples and accompanying code coming soon.

Visit our website for samples. You can try the speech correction application here created based on the end-to-end speech synthesis pipeline using MelGAN.

Check the slides if you aren't attending the NeurIPS 2019 conference to check out our poster.

Code organization

├── README.md             <- Top-level README.
├── set_env.sh            <- Set PYTHONPATH and CUDA_VISIBLE_DEVICES.
│
├── mel2wav
│   ├── dataset.py           <- data loader scripts
│   ├── modules.py           <- Model, layers and losses
│   ├── utils.py             <- Utilities to monitor, save, log, schedule etc.
│
├── scripts
│   ├── train.py                    <- training / validation / etc scripts
│   ├── generate_from_folder.py

Preparing dataset

Create a raw folder with all the samples stored in wavs/ subfolder. Run these commands:

ls wavs/*.wav | tail -n+10 > train_files.txt
ls wavs/*.wav | head -n10 > test_files.txt

Training Example

. source set_env.sh 0
# Set PYTHONPATH and use first GPU
python scripts/train.py --save_path logs/baseline --path <root_data_folder>

PyTorch Hub Example

import torch
vocoder = torch.hub.load('descriptinc/melgan-neurips', 'load_melgan')
vocoder.inverse(audio)  # audio (torch.tensor) -> (batch_size, 80, timesteps)
Owner
Jurio
Jurio
Python audio and music signal processing library

madmom Madmom is an audio signal processing library written in Python with a strong focus on music information retrieval (MIR) tasks. The library is i

Institute of Computational Perception 1k Dec 26, 2022
This is an OverPowered Vc Music Player! Will work for you and play music in Voice Chatz

VcPlayer This is an OverPowered Vc Music Player! Will work for you and play music in Voice Chatz Telegram Voice-Chat Bot [PyTGCalls] ⇝ Requirements ⇜

1 Dec 20, 2021
Audio2midi - Automatic Audio-to-symbolic Arrangement

Automatic Audio-to-symbolic Arrangement This is the repository of the project "Audio-to-symbolic Arrangement via Cross-modal Music Representation Lear

Ziyu Wang 24 Dec 05, 2022
Pythonic bindings for FFmpeg's libraries.

PyAV PyAV is a Pythonic binding for the FFmpeg libraries. We aim to provide all of the power and control of the underlying library, but manage the gri

PyAV 1.8k Jan 03, 2023
Just-Music - Spotify API Driven Music Web app, that allows to listen and control and share songs

Just Music... Just Music Is A Web APP That Allows Users To Play Song Using Spoti

Ayush Mishra 3 May 01, 2022
Python library for handling audio datasets.

AUDIOMATE Audiomate is a library for easy access to audio datasets. It provides the datastructures for accessing/loading different datasets in a gener

Matthias 121 Nov 27, 2022
live coding in python + supercollider

live coding in python + supercollider

Zack 6 Feb 06, 2022
FPGA based USB 2.0 high speed audio interface featuring multiple optical ADAT inputs and outputs

ADAT USB Audio Interface FPGA based USB 2.0 High Speed audio interface featuring multiple optical ADAT inputs and outputs Status / current limitations

Hans Baier 78 Dec 31, 2022
This is my voice assistant Patric!

voice-assistant This is my voice assistant Patric! You can add can add commands and even modify his name Indice How to use Installation guide How to u

Norbert Gabos 1 Jun 28, 2022
A Youtube audio player for your terminal

AudioLine A lightweight Youtube audio player for your terminal Explore the docs » View Demo · Report Bug · Request Feature · Send a Pull Request About

Haseeb Khalid 26 Jan 04, 2023
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

pyannote 2.1k Dec 31, 2022
Dataset and baseline code for the VocalSound dataset (ICASSP2022).

VocalSound: A Dataset for Improving Human Vocal Sounds Recognition Introduction Citing Download VocalSound Dataset Details Baseline Experiment Contact

Yuan Gong 58 Jan 03, 2023
Suyash More 111 Jan 07, 2023
XA Music Player - Telegram Music Bot

XA Music Player Requirements 📝 FFmpeg (Latest) NodeJS nodesource.com (NodeJS 17+) Python (3.10+) PyTgCalls (Lastest) MongoDB (3.12.1) 2nd Telegram Ac

RexAshh 3 Jun 30, 2022
Nayeli: cool telegram groups vc music project

Nayeli-music Nayeli 🥀 is cool telegram 🍎 groups vc music project 🎋 . Nayeli-music Nayeli Deployment 🎋 📲 Esy deploy 🐾️ Source Owner ♥️ ❄️ He is s

Kasun bandara 2 Dec 20, 2021
controls volume using hand gestures

controls volume using hand gestures

1 Oct 11, 2021
Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files according to their common names

Batch Sorting Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files accord

David Mainoo 1 Oct 29, 2021
Music bot of # Owner

Pokimane-Music Music bot of # Owner How To Host The easiest way to deploy this Bot Support Channel :- TeamDlt Support Group :- TeamDlt Please fork thi

5 Dec 23, 2022
Enhanced Audio Player for Discord

Discodo is an enhanced audio player for discord

Mary 42 Oct 05, 2022
A GUI-based audio player with support for a large variety of formats

Miza-Player A GUI-based audio player with support for a large variety of formats, able to play from web-hosted media platforms such as YouTube, includ

Thomas Xin 3 Dec 14, 2022