music-ai

Deep learning transformer model that generates unique music sequences.

Abstract

In 2017, the Transformer set a new state of the art in natural language processing. Relying solely on attention mechanisms, it outperformed existing solutions based on recurrent and convolutional neural networks¹. However, recurrent architectures, including long short-term memory (LSTM) and gated recurrent networks, remain dominant in the field of generative music. I aim to introduce the Transformer into this field, with the goal of teaching the model to predict the second half of a composition given the first half. A Transformer with 32 attention heads and sinusoidal positional encoding was trained on the Nottingham MIDI dataset for 5000 epochs over 48 hours, optimized by stochastic gradient descent against a cross-entropy loss, with an exponentially decaying learning rate schedule. During the first thousand epochs, the model improved noticeably, but its generated sequences lacked structure. By five thousand epochs, the model had clearly learned general musical trends that helped it predict how classical composers write their pieces, and most tracks sounded melodic to the human ear. Future applications of this technique include generating tracks for various instruments, rating the quality of existing music tracks, and, if combined with a generative network that maps melodies to a latent space, producing fully original compositions.

¹ Vaswani et al., "Attention Is All You Need" (2017).
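
Two of the components named in the abstract, sinusoidal positional encoding and an exponentially decaying learning rate, can be illustrated with a minimal, dependency-free sketch. This is not the project's training code. The function names and the values `d_model=512`, `initial_lr=0.1`, and `gamma=0.999` are assumptions chosen for illustration, since the abstract does not state them. The encoding follows the formula from the cited paper, and the schedule multiplies the learning rate by a constant factor each epoch.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings.

    Even columns hold sin(pos / 10000^(i / d_model)) and the following odd
    column holds the matching cosine, where i is the even column index.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


def exponential_lr(initial_lr, gamma, epoch):
    """Learning rate after `epoch` epochs of exponential decay."""
    return initial_lr * gamma ** epoch


if __name__ == "__main__":
    # Hypothetical sizes: 512 is divisible by the 32 heads in the abstract.
    encoding = sinusoidal_positional_encoding(seq_len=16, d_model=512)
    print(len(encoding), len(encoding[0]))
    # Hypothetical schedule values, not taken from the project.
    for epoch in (0, 1000, 5000):
        print(epoch, exponential_lr(initial_lr=0.1, gamma=0.999, epoch=epoch))
```

In standard multi-head attention the model dimension is split evenly across heads, so with 32 heads any chosen `d_model` would need to be a multiple of 32.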

Video

Hardware

Ubuntu

32 GB RAM
Intel Core i3-4170 CPU @3.70 GHz x4 (4 GB RAM)
NVIDIA GeForce GTX 1050 Ti
