Audio2midi - Automatic Audio-to-symbolic Arrangement

Related tags

Audioaudio2midi
Overview

Automatic Audio-to-symbolic Arrangement

This is the repository of the project "Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning" by Ziyu Wang, Dejing Xu, Gus Xia and Ying Shan (arXiv:2112.15110).

Model Overview

We propose an audio-to-symbolic generative model to transfer an input audio to its piano arrangement (in MIDI format). The task is similar to piano cover song production based on the original pop songs, or piano score reduction based on a classical symphony.

The input audio is a pop-song audio under arbitrary instrumentation. The output is a MIDI piano arrangement of the original audio which keeps most musical information (incdluding chord, groove and lead melody). The model is built upon the previous projects including EC2-VAE, PianoTree VAE, and Poly-Dis. The novelty of this study is the cross-modal representation disentanglement design and the 3-step training strategy shown below.

workflow

Application

This audio-to-symbolic conversion and arrangement module can lead to many applications (e.g., music score generation, automatic acccompaniment etc.) In this project, we demonstrate a specific application: audio-level pop style transfer, in which we transfer a pop song "vocal + (full band) accompaniment" to "vocal + piano accompaniment". The automation is an integration of spleeter source separation, audio-to-symbolic conversion, performance rendering, and synthesis (as shown in the diagram below).

style_transfer_app

Quick Start

In demo/audio2midi_out we presents generated results in separated song folders. In each song folder, there is a MIDI file and an audio file.

  • The MIDI file is the generated piano arrangement by our model (with rule-based performance rendering).
  • The audio file is a combination of the generated piano arragement and the original vocal (separated by Spleeter) using GarageBand.

In this section, we demonstrate how to use our model to reproduce the demo. Later on, you can generate piano arrangement with your own inputs.

Step One: Prepare the python environment specified by requirements.txt.

Step Two: Download the pre-trained model parameter files and put them under params. (Link)

Step Three: Source separation the audio files at demo/example_audio using Spleeter and output the results at demo/spleeter_out (see the original repository for more detail). If you do not want to run Spleeter right now, we have prepare the seprated result for you at demo/spleeter_out.

Step Four: Run the following command at the project root directory:

python scripts/inference_pop_style_transfer.py ./demo/spleeter_out/example1/accompaniment.wav ./demo/audio2midi_out/example1
python scripts/inference_pop_style_transfer.py ./demo/spleeter_out/example2/accompaniment.wav ./demo/audio2midi_out/example2

Here, the first parameter after inference_pop_style_transfer.py is the input audio file path, and the second parameter is the output MIDI folder. The command also allows applying different models, switching autoregressive mode, and many other options. See more details by

python scripts/inference_pop_style_transfer.py -h

The generated MIDI files can be found at demo/audio2midi_out. Then, the MIDI arrangement and vocal audio can be combined and further processed in the DAW.

Project Details

Before discussing the audio-to-midi conversion in more detail, we introduce different models and training stages that can be specified in the command.

Model Types

There are four model types (or model_id):

  • (a2s): the proposed model (requiring chord extraction because it learns disentangled latent chord representation).
  • (a2s-nochd): the proposed model without chord latent representation (i.e., removing chord encoder or chord decoder).
  • (supervised): pure supervised training (audio encoder + symbolic decoder (without symbolic feature decoding)). No resampling and KL penalty.
  • (supervised+kl): model architecture is same as supervised. Resampling and KL penalty is applied (to audio-texture representation).

In general, a2s and a2s-nochd perform the best as they are the proposed models. The rest are baseline models in the paper. (Note: the other baseline model (Chord-based generation) is not implemented because it does not depend on any audio information.)

Stages (Autoregressive Mode)

As shown in the diagram above, a2s and a2s-nochd have 3 training stages, where stage 1 and 2 are not ready to be used, and stage 3 has two options (i.e., pure audio-based and autoregression).

  • (Stage 3a): pure audio-based. The model input is the audio itself.
  • (Stage 3b): autoregression. The model input is the audio itself and previous 2-measure output.

During inference, autoregressive=True mode means we use a stage 3b model auto regressively, and use stage 3a model to compute the initial segment. On the other hand, autoregressive=False model means we use a stage 3a model to arrangement every 2-measure segment independently. In general, autoregressive=False gives output of slightly higher quality.

Audio2midi Command

Run audio2midi in command line using the command:

python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path>

Note: the command should be run at the project root directory. For clarity, the user can only specify the output MIDI folder (where possibly multiple versions of MIDI arrangement will be produced in the same folder). To specify output MIDI filename, see the python interfance section below.

The command has the following functions:

  1. To specify model_id using -m or --model (default to be a2s), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --model a2s-nochd
    
  2. To turn on autoregressive mode using -a or --auto (detault to be autoregressive=False), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --auto
    
  3. By default, the beat and chord analysis (a preprocessing step by madmom) will be applied to the separated accompaniment track. The analysis can also be applied to the original audio using -o or --original, (though there are usually no aparent difference), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --original <input_audio_path>
    
  4. At the early stage of this research, we only have stage 3a and stage 3b has not been developed yet. We provide the model parameter of the old stage 3a model and name it after the alternative model. The generation results also sound good in general. The alternative model can be called by --alt, e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --alt
    

    Alternative model can be also called in autoregressive mode, where the model for initial computation will be replaced with the alternation.

  5. To run through all the model types, autoregressive modes and alternative models, using --model all, e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --model all
    

For more detail, check it out using

python scripts/inference_pop_style_transfer.py -h

Audio2midi Python Interface

To use the model in a more flexible fashion (e.g., use your own trained model parameter, combine the a2s routine in your project, or run multiple files at a time), we provide the python interface. Specifically, call the functions in the pop_style_transfer package. For example, in a python script outside this project, we can write the following python code:

from audio2midi.pop_style_transfer import acc_audio_to_midi, load_model

model, model0 = load_model(model_id='a2s', 
                           autoregressive=False,
                           alt=False
                          )

acc_audio_to_midi(input_acc_audio_path=...,
                  output_midi_path=...,
                  model=model, 
                  model0=model0,
                  input_audio_path=..., 
                  input_analysis_npy_path=...,
                  save_analysis_npy_path=..., 
                  batch_size=...
                 )

Note: Unlike in the command line mode where output location can only be a folder directory, we can specify filename in the python interface. Also, chord and beat preprocessing can be loaded or stored, and batch size can be chaged.

Data

We use the POP909 dataset and its time-aligned audio files. The audio files are colloected from the internet. The data should be processed and put under the data folder. Specifically, there are 4 sub-folders:

  • data/audio-midi-combined: contain 909 folders of 909 songs. In ecah folder, we have
    • audio.mp3 the original audio
    • midi folder containing the MIDI arrangement and annotation files.
    • split audio folder containing the vocal-accompaniment separation result using spleeter:2stems model.
  • data/quantized_pop909_4_bin: containing 909 npz files storing pre-processed symbolic data from MIDI and annotation files.
  • data/audio_stretched: containing 909 pairs of wav and pickle files storing the accompaniment audio time-stretched to 13312 frames per (symbolic) beat under 22050 sample rate (approximately 99.384 bpm).
  • data/train_split: containing a pickle file for train and valid split ids at song level.

The pre-processed symbolic-part of the dataset (i.e., quantized_pop909_4_bin and train_split can be downloaded at link.

Training the model

To train the model, run the scripts in ./scripts/run*.py, at project root directory. The result will be saved in the result folder. The model should be trained in a sequential manner. For example, run

python ./scripts/run_a2s_stage1.py

When the stage i model is trained, put the model parameter file path in line 13 of /scripts/run_a2s_stage{j}.py, and run

python ./scripts/run_a2s_stage{j}.py

Here, (i, j) can be (i=1, j=2), (i=2, j=3), (i=2, j=4). (3 for stage 3a and 4 for stage 3b). The other model types can be run in a similar fashion. The training result can be downloaded at link.

Summary of Materials to Download

References

Owner
Ziyu Wang
Computer music, Deep Music Generation & Artificial musical intelligence
Ziyu Wang
Carnatic Notes Predictor for audio files

Carnatic Notes Predictor for audio files Link for live application: https://share.streamlit.io/pradeepak1/carnatic-notes-predictor-for-audio-files/mai

1 Nov 06, 2021
IDing the songs played on the do you radio show

IDing the songs played on the do you radio show

Rasmus Jones 36 Nov 15, 2022
Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files according to their common names

Batch Sorting Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files accord

David Mainoo 1 Oct 29, 2021
A bot that can play music on Telegram Group and Channel Voice Chats

DaisyXmusic ❤ is the best and only Telegram VC player with playlists, Multi Playback, Channel play and more

TeamOfDaisyX 20 Jun 11, 2021
A simple voice detection system which can be applied practically for designing a device with capability to detect a baby’s cry and automatically turning on music

Auto-Baby-Cry-Detection-with-Music-Player A simple voice detection system which can be applied practically for designing a device with capability to d

2 Dec 15, 2021
Python CD-DA ripper preferring accuracy over speed

Whipper Whipper is a Python 3 (3.6+) CD-DA ripper based on the morituri project (CDDA ripper for *nix systems aiming for accuracy over speed). It star

671 Jan 04, 2023
Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose"

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose We provide PyTorch implementations for our arxiv paper "Audio-dr

Ran Yi 497 Jan 09, 2023
Oliva music bot help to play vc music

OLIVA V2 🎵 Requirements 📝 FFmpeg NodeJS nodesource.com Python 3.7+ PyTgCalls Commands 🛠 For all in group /play - reply to youtube url or song file

SOUL々H҉A҉C҉K҉E҉R҉ 2 Oct 22, 2021
An audio-solving python funcaptcha solving module

funcapsolver funcapsolver is a funcaptcha audio-solving module, which allows captchas to be interacted with and solved with the use of google's speech

Acier 8 Nov 21, 2022
Python game programming in Jupyter notebooks.

Jupylet Jupylet is a Python library for programming 2D and 3D games, graphics, music and sound synthesizers, interactively in a Jupyter notebook. It i

Nir Aides 178 Dec 09, 2022
live coding in python + supercollider

live coding in python + supercollider

Zack 6 Feb 06, 2022
A2DP agent for promiscuous/permissive audio sinc.

Promiscuous Bluetooth audio sinc A2DP agent for promiscuous/permissive audio sinc for Linux. Once installed, a Bluetooth client, such as a smart phone

Jasper Aorangi 4 May 27, 2022
Music bot of # Owner

Pokimane-Music Music bot of # Owner How To Host The easiest way to deploy this Bot Support Channel :- TeamDlt Support Group :- TeamDlt Please fork thi

5 Dec 23, 2022
a library for audio and music analysis

aubio aubio is a library to label music and sounds. It listens to audio signals and attempts to detect events. For instance, when a drum is hit, at wh

aubio 2.9k Dec 30, 2022
Vixtify - Python Controlled Music Player

Strumm Sound Playlist : Click me to listen Welcome to GitHub Pages You can use the editor on GitHub to maintain and preview the content for your websi

Vicky Kumar 2 Feb 03, 2022
A tool for retrieving audio in the past

Rewinder A tool for retrieving audio in the past. Ever felt like, I need to remember that discussion which happened 10 min back. Now you can! Rewind a

Bharat 1 Jan 24, 2022
TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

TONet Introduction The official implementation of "TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music", in ICASSP 2022 We

Knut(Ke) Chen 29 Dec 01, 2022
The venturimeter works on the principle of Bernoulli's equation, i.e., the pressure decreases as the velocity increases.

The venturimeter works on the principle of Bernoulli's equation, i.e., the pressure decreases as the velocity increases. The cross-section of the throat is less than the cross-section of the inlet pi

Shankar Mahadevan L 1 Dec 03, 2021
Generating a structured library of .wav samples with Python.

sample-library Scripts for generating a structured sample library with Python Requires Docker about Samples are written to wave files in lib/. Differe

Ben Mangold 1 Nov 11, 2021
🎵 A music bot for discord servers!

music bot A music bot for Discord Servers Features Play songs in your discord server Get the lyrics without going on a web explorer Commands Command P

1 Jul 25, 2022