Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Last update: Dec 05, 2022

Overview

Text2Music Emotion Embedding

Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Reference

Emotion Embedding Spaces for Matching Music to Stories, ISMIR 2021 [paper]

-- Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, and Xavier Serra

@inproceedings{won2021emotion,
  title={Emotion embedding spaces for matching music to stories},
  author={Won, Minz. and Salamon, Justin. and Bryan, Nicholas J. and Mysore, Gautham J. and Serra, Xavier.},
  booktitle={ISMIR},
  year={2021}
}

Requirements

conda create -n YOUR_ENV_NAME python=3.7
conda activate YOUR_ENV_NAME
pip install -r requirements.txt

Data

You need to collect audio files of AudioSet mood subset (link).
Read the audio files and store them into .npy format.
Other relevant data including Alm's dataset (original link), ISEAR dataset (original link), emotion embeddings, pretrained Word2Vec, and data splits are all available here (link).
Unzip ttm_data.tar.gz and locate the extracted data folder under text2music-emotion-embedding/.

Training

Here is an example for training a metric learning model.

python3 src/metric_learning/main.py \
        --dataset 'isear' \
        --num_branches 3 \
        --data_path YOUR_DATA_PATH_TO_AUDIOSET

Fore more examples, check bash files under scripts folder.

Test

Here is an example for the test.

python3 src/metric_learning/main.py \
        --mode 'TEST' \
        --dataset 'alm' \
        --model_load_path 'data/pretrained/alm_cross.ckpt' \
        --data_path 'YOUR_DATA_PATH_TO_AUDIOSET'

Pretrained three-branch metric learning models (alm_cross.ckpt and isear_cross.ckpt) are included in ttm_data.tar.gz. This code is reproducible by locating the unzipped data folder under text2music-emotion-embedding/.

Visualization

Embedding distribution of each model can be projected onto 2-dimensional space. We used uniform manifold approximation and projection (UMAP) to visualize the distribution. UMAP is known to preserve more of global structure compared to t-SNE.

Demo

Please try some examples done by the three-branch metric learning model [Soundcloud].

License

Some License

Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Related tags

Overview

Text2Music Emotion Embedding

Reference

Requirements

Data

Training

Test

Visualization

Demo

License

Owner

Minz Won

Pytorch tutorials for Neural Style transfert

An Api for Emotion recognition.

CVPR2021 Workshop - HDRUNet: Single Image HDR Reconstruction with Denoising and Dequantization.

MvtecAD unsupervised Anomaly Detection

git《Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction》(ECCV 2020) GitHub:

SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging.

Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

Disentangled Lifespan Face Synthesis

SNIPS: Solving Noisy Inverse Problems Stochastically

Official implementation of "Open-set Label Noise Can Improve Robustness Against Inherent Label Noise" (NeurIPS 2021)

InterfaceGAN++: Exploring the limits of InterfaceGAN

A crossplatform menu bar application using mpv as DLNA Media Renderer.

Make Watson Assistant send messages to your Discord Server

YOLOX + ROS(1, 2) object detection package

COVID-Net Open Source Initiative

TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network (SIGGRAPH 2020)

This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection', CVPR 2019.

Multi Agent Reinforcement Learning for ROS in 2D Simulation Environments

Official implementation of NeurIPS 2021 paper "One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective"

The official repository for Deep Image Matting with Flexible Guidance Input