Audio augmentations library for PyTorch for audio in the time-domain

Last update: Jan 08, 2023

Related tags

Overview

Audio Augmentations

Audio augmentations library for PyTorch for audio in the time-domain, with support for stochastic data augmentations as used often in self-supervised / contrastive learning.

Usage

We can define several audio augmentations, which will be applied sequentially to a raw audio waveform:

from audio_augmentations import *

audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.3, max_snr=0.5)], p=0.3),
    RandomApply([Gain()], p=0.2),
    RandomApply([HighLowPass(sample_rate=sr)], p=0.8),
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio =  transform(audio)
>> transformed_audio.shape[0] = 1

> transformed_audio.shape[0] = 4 ">

audio = torchaudio.load("testing/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
>> transformed_audio.shape[0] = 4

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to a torchaudio dataloaders that accept transform=.

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}

Comments

Delay augmentation on cuda

Hi. Currently the delay augmentation doesn't work on gpu since part of the signal is on cpu. I think making thebeginning tensor same as the audio tensor device should fix it. Thanks. https://github.com/Spijkervet/torchaudio-augmentations/blob/d044f9d020e12032ab9280acf5f34a337e72d212/torchaudio_augmentations/augmentations/delay.py#L31

opened by sidml 2
Correctness unit test would be great

For some transforms, we can test if the values are actually correct by manually computing the expected value. For example, PolarityInversion could be test with some tiny tensors like [[0.1, 0.5, -1.0]]. Reverse as well. Probably only those two? Still, it'd be better than not having any.

opened by keunwoochoi 2
Default value of `max_snr` in `Noise`

1.0 of SNR with signal and white noise would be a really heavily corrupted signal. Could we set it to be a little more reasonable value?

Related; it'd be great if one can hear some examples of the augmented result.

opened by keunwoochoi 2
End-to-end PitchShift transform tests

This merge requests adds end-to-end pitch transformation detection with librosa's pYIN pitch detection, to test if the applied transformation yields the expected pitch transposition.

opened by Spijkervet 0
Unittests

This adds various unittests and fixes to multi-channel input for Reverb, Pitch, Reverse and HighLowPass filter augmentations. It also removes Essentia as a dependency, and instead uses julius for IRR filtering.

opened by Spijkervet 0
import error

when i import torchaudio_augmentation

I got the error

RuntimeError : torchaudio.sox_effects.sox_effects.effect_names requires module: torchaudio._torchaudio

how can I deal with it?

opened by EavnJeong 0
Snr db
Hi, Thanks for the interesting work. Allow me to suggest this change for two reasons:

Expressing SNR in dB cancels the doubt there might be between power SNR and RMS SNR.

When sampling an SNR, it feels to me like it makes more sense to uniformly sample from the log scale of the dB than on the linear range. This way you ensure that your low SNR have as much chances as your high SNR.

I hope it makes sens. I'd be glad to discuss further about that.
opened by wesbz 0
sanity check for duration

In transforms where the duration may change, if the input audio is shorter than n_samples, the error message is not intuitive. I forgot but in some case, the multiprocessing-based dataloader silently died. Maybe it's worth checking it somewhere?

opened by keunwoochoi 0
Shapes are still a bit confusing

From ComposeMany.__call__(), is x also a ch, time shape 2-dim tensor? And I'm sure what would be the expected behavior by this function, especially the shape of the output.

opened by keunwoochoi 3

Releases(v0.2.3)

v0.2.3(Nov 15, 2021)
This version:

Removes WavAugment's pitch detection, and instead relies on the parallelisable torch-pitch-shift package.

Adds support for batched inputs (batch, channel, time).

It also adds more tests for every audio transformation.

Source code(tar.gz)
Source code(zip)
0.2.0(Jun 29, 2021)

Removes the Essentia library as a dependency and adds compatibility for multi-channel audio.
Source code(tar.gz)
Source code(zip)
1.0(May 11, 2021)

Source code(tar.gz)
Source code(zip)

Owner

Janne

Music producer, machine learning in MIR & occasional ethical hacker

GitHub Repository

A Simple Script that will help you to Play / Change Songs with just your Voice

Auto-Spotify using Voice Recognition A Simple Script that will help you to Play / Change Songs with just your Voice Explore the docs » Table of Conten

1 Nov 21, 2021

Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features

MediumVC MediumVC is an utterance-level method towards any-to-any VC. Before that, we propose SingleVC to perform A2O tasks(Xi → Ŷi) , Xi means utter

47 Dec 25, 2022

Implicit neural differentiable FM synthesizer

Implicit neural differentiable FM synthesizer The purpose of this project is to emulate arbitrary sounds with FM synthesis, where the parameters of th

34 Nov 06, 2022

MusicBrainz Picard

MusicBrainz Picard MusicBrainz Picard is a cross-platform (Linux/Mac OS X/Windows) application written in Python and is the official MusicBrainz tagge

3k Dec 31, 2022

A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

Basic Pitch is a Python library for Automatic Music Transcription (AMT), using lightweight neural network developed by Spotify's Audio Intelligence La

1.4k Jan 01, 2023

Python I/O for STEM audio files

stempeg = stems + ffmpeg Python package to read and write STEM audio files. Technically, stems are audio containers that combine multiple audio stream

72 Dec 23, 2022

pedalboard is a Python library for adding effects to audio.

pedalboard is a Python library for adding effects to audio. It supports a number of common audio effects out of the box, and also allows the use of VST3® and Audio Unit plugin formats for third-party

3.9k Jan 02, 2023

Linear Prediction Coefficients estimation from mel-spectrogram implemented in Python based on Levinson-Durbin algorithm.

LPC_for_TTS Linear Prediction Coefficients estimation from mel-spectrogram implemented in Python based on Levinson-Durbin algorithm. 基于Levinson-Durbin

58 Nov 17, 2022

Xbot-Music - Bot Play Music and Video in Voice Chat Group Telegram

XBOT-MUSIC A Telegram Music+video Bot written in Python using Pyrogram and Py-Tg

2 Jan 20, 2022

Audio features extraction

Yaafe Yet Another Audio Feature Extractor Build status Branch master : Branch dev : Anaconda : Install Conda Yaafe can be easily install with conda. T

231 Dec 26, 2022

Synthesia but open source, made in python and free

PyPiano Synthesia but open source, made in python and free Requirements are in requirements.txt If you struggle with installation of pyaudio, run : pi

11 Nov 06, 2022

Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python 2 or 3

tinytag tinytag is a library for reading music meta data of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python Install pip install tinytag

577 Dec 26, 2022

All-In-One Digital Audio Workstation and Plugin Suite

How to install Windows Mac OS X Fedora Ubuntu How to Build Debian and Ubuntu Fedora All Other Linux Distros Mac OS X Windows What is MusiKernel? MusiK

111 Sep 21, 2021

An AI for Music Generation

1.3k Dec 31, 2022

A Python port and library-fication of the midicsv tool by John Walker.

A Python port and library-fication of the midicsv tool by John Walker. If you need to convert MIDI files to human-readable text files and back, this is the library for you.

52 Dec 29, 2022

Pianote - An application that helps musicians practice piano ear training

Pianote Pianote is an application that helps musicians practice piano ear traini

3 Aug 17, 2022

SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats

SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats Note Neither this, or PyTgCalls are fully

58 Jan 02, 2023

PyAbsorp is a python module that has the main focus to help estimate the Sound Absorption Coefficient.

This is a package developed to be use to find the Sound Absorption Coefficient through some implemented models, like Biot-Allard, Johnson-Champoux and

8 Oct 19, 2022

Python implementation of the Short Term Objective Intelligibility measure

Python implementation of STOI Implementation of the classical and extended Short Term Objective Intelligibility measures Intelligibility measure which

250 Dec 21, 2022

Algorithmic Multi-Instrumental MIDI Continuation Implementation

Matchmaker Algorithmic Multi-Instrumental MIDI Continuation Implementation Taming large-scale MIDI datasets with algorithms This is a WIP so please ch

2 Mar 11, 2022