Official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

Overview

MidiBERT-Piano


MIT License

Authors: Yi-Hui (Sophia) Chou, I-Chun (Bronwin) Chen

Introduction

This is the official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

With this repository, you can

  • pre-train a MidiBERT-Piano model on your own customized pre-training dataset
  • fine-tune & evaluate on 4 downstream tasks
  • compare its performance with a Bi-LSTM

All the datasets employed in this work are publicly available.

Quick Start

If you'd like to reproduce the results (MidiBERT) shown in the paper,

  1. please download the checkpoints and rename the files as follows
MidiBERT/{CP/remi}/
result
└── finetune
	└── melody_default
		└── model_best.ckpt
	└── velocity_default
		└── model_best.ckpt
	└── composer_default
		└── model_best.ckpt
	└── emotion_default
		└── model_best.ckpt
  2. please refer to the evaluation section (C.2),

and you are good to go! (By the way, no GPU is needed for evaluation.)
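Before running evaluation, it can help to confirm that the renamed checkpoints sit where the tree above expects them. The following is a minimal sketch and not part of the repository: the helper name `missing_checkpoints` is hypothetical, and it assumes the `result/` folder lives directly under `MidiBERT/CP` or `MidiBERT/remi`.

```python
from pathlib import Path

# Task names taken from the directory tree above.
TASKS = ["melody", "velocity", "composer", "emotion"]


def missing_checkpoints(root, name="default"):
    """Return the expected checkpoint paths that do not exist under root.

    Hypothetical helper: the layout mirrors the tree shown above.
    """
    root = Path(root)
    expected = [
        root / "result" / "finetune" / f"{task}_{name}" / "model_best.ckpt"
        for task in TASKS
    ]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Assumption: run from the repository root; use "MidiBERT/remi" for REMI.
    for path in missing_checkpoints("MidiBERT/CP"):
        print("missing:", path)
```

An empty result means all four checkpoints were found.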

Installation

  • Python3
  • Install generally used packages for MidiBERT-Piano:
git clone https://github.com/wazenmai/MIDI-BERT.git
cd MIDI-BERT
pip install -r requirements.txt

A. Prepare Data

All data in the CP and REMI token representations are stored in data/CP and data/remi, respectively, including the train, valid, and test splits.

You can also preprocess the data yourself as described below.

1. download dataset and preprocess

  • Pop1K7
  • ASAP
    • Step 1: Download ASAP dataset from the link
    • Step 2: Use Dataset/ASAP_song.pkl to extract songs to Dataset/ASAP
  • POP909
    • preprocess to keep the 865 pieces that qualify as 4/4 time signature
    • run exploratory.py to find the pieces that qualify as 4/4 time signature and save them to qual_pieces.pkl
    • run preprocess.py to realign and preprocess them
    • Special thanks to Shih-Lun (Sean) Wu
  • Pianist8
    • Step 1: Download Pianist8 dataset from the link
    • Step 2: Use Dataset/pianist8_(mode).pkl to extract songs to Dataset/pianist8/mode
  • EMOPIA
    • Step 1: Download Emopia dataset from the link
    • Step 2: Use Dataset/emopia_(mode).pkl to extract songs to Dataset/emopia/mode
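The "Step 2" extraction steps above can be sketched roughly as follows. This is only an illustration under a loud assumption: the internal layout of the .pkl files is not described here, so the sketch assumes each pickle holds a plain list of file paths relative to the downloaded dataset. The function name `extract_songs` and its arguments are hypothetical.

```python
import pickle
import shutil
from pathlib import Path


def extract_songs(pkl_path, source_root, target_dir):
    """Copy every file listed in the pickle from source_root into target_dir.

    Assumption: the pickle contains a list of paths relative to source_root.
    Files sharing a base name would overwrite each other in this sketch.
    """
    with open(pkl_path, "rb") as f:
        relative_paths = pickle.load(f)

    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for rel in relative_paths:
        src = Path(source_root) / rel
        dst = target_dir / src.name
        shutil.copy(src, dst)
        copied.append(dst)
    return copied
```

If the pickles turn out to store something else (for example a dict keyed by split), only the loading line needs to change.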

2. prepare dict

In dict/make_dict.py, customize the events & words you'd like to add.

In this paper, we only use Bar, Position, Pitch, and Duration. We provide our dictionaries in the CP & REMI representations:

dict/CP.pkl

dict/remi.pkl
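To make the idea of an event dictionary concrete, here is a minimal sketch that maps tokens of the four event types to integer ids and back, then round-trips the result through pickle. The token strings, value ranges, and the `(event2word, word2event)` layout are illustrative assumptions, not necessarily what dict/make_dict.py or the provided .pkl files contain.

```python
import pickle


def build_dictionary(vocab):
    """Map each token of each event type to an integer id, and back."""
    event2word, word2event = {}, {}
    for event_type, tokens in vocab.items():
        event2word[event_type] = {tok: i for i, tok in enumerate(tokens)}
        word2event[event_type] = {i: tok for i, tok in enumerate(tokens)}
    return event2word, word2event


# Toy vocabulary: names and ranges are assumptions for illustration only.
vocab = {
    "Bar": ["Bar New", "Bar Continue"],
    "Position": [f"Position {i}/16" for i in range(1, 17)],
    "Pitch": [f"Pitch {p}" for p in range(22, 108)],
    "Duration": [f"Duration {d}" for d in range(64)],
}

event2word, word2event = build_dictionary(vocab)

# Round-trip through pickle, the format used for the dictionaries above.
restored_e2w, restored_w2e = pickle.loads(pickle.dumps((event2word, word2event)))
print(restored_e2w["Bar"])
```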

3. prepare CP & REMI

./prepare_data/CP

  • Run python3 main.py. Please specify the dataset and whether you want to prepare an answer array for the task (i.e. melody extraction, velocity prediction, composer classification, or emotion classification).
  • For example, python3 main.py --dataset=pop909 --task=melody --dir=[DIR_TO_STORE_DATA]

./prepare_data/remi/

  • The same logic applies to preparing REMI data.
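For intuition about what a CP token looks like, the sketch below groups the four event types used in this work (Bar, Position, Pitch, Duration) into one tuple per note. It is a simplified illustration, not the code in ./prepare_data/CP: the input format, the 16-steps-per-bar grid, and the "New"/"Continue" bar labels are assumptions.

```python
POSITIONS_PER_BAR = 16  # assumption: a 4/4 bar quantized into 16 steps


def notes_to_cp(notes):
    """Convert (start, pitch, duration) notes, timed in sixteenth-note steps,
    into (Bar, Position, Pitch, Duration) compound tokens."""
    tokens = []
    last_bar = None
    for start, pitch, duration in sorted(notes):
        bar, position = divmod(start, POSITIONS_PER_BAR)
        # The first note seen in a bar opens it; later notes continue it.
        bar_token = "New" if bar != last_bar else "Continue"
        tokens.append((bar_token, position, pitch, duration))
        last_bar = bar
    return tokens


notes = [(0, 60, 4), (4, 64, 4), (16, 67, 8)]
for token in notes_to_cp(notes):
    print(token)
```

In REMI, by contrast, the same information would be spread over a flat sequence of individual event tokens instead of one tuple per note.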

Acknowledgement: CP repo, remi repo

You may encode these MIDI files in different representations; the data split is in ***.

B. Pre-train a MidiBERT-Piano

./MidiBERT/CP and ./MidiBERT/remi

  • pre-train a MidiBERT-Piano
python3 main.py --name=default

A folder named CP_result/pretrain/default/ will be created, with checkpoint & log inside.

  • customize your own pre-training dataset

Feel free to select from the given datasets and add your own. To do this, add --dataset and specify the respective path in the load_data() function. For example,
# to pre-train a model with only 2 datasets
python3 main.py --name=default --dataset pop1k7 asap
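A multi-valued flag like `--dataset pop1k7 asap` can be parsed with argparse's `nargs="+"`. The sketch below shows that pattern together with a name-to-path lookup of the kind load_data() would need. It is an assumption about how such a script could be wired, not the repository's code: `DATASET_PATHS`, its placeholder paths, and `selected_paths` are hypothetical.

```python
import argparse

# Hypothetical lookup table; the real paths are set inside load_data().
DATASET_PATHS = {
    "pop909": "data/CP/pop909",
    "pop1k7": "data/CP/pop1k7",
    "asap": "data/CP/asap",
    "pianist8": "data/CP/pianist8",
    "emopia": "data/CP/emopia",
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--name", type=str, default="default")
    # nargs="+" collects one or more dataset names into a list.
    parser.add_argument("--dataset", type=str, nargs="+",
                        default=sorted(DATASET_PATHS))
    return parser.parse_args(argv)


def selected_paths(datasets):
    """Resolve dataset names to paths; unknown names raise KeyError."""
    return [DATASET_PATHS[name] for name in datasets]


args = parse_args(["--name=default", "--dataset", "pop1k7", "asap"])
print(args.dataset, selected_paths(args.dataset))
```

Adding your own dataset then amounts to adding one entry to the lookup and passing its name on the command line.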

Acknowledgement: HuggingFace

Special thanks to Chin-Jui Chang

C. Fine-tune & Evaluate on Downstream Tasks

./MidiBERT/CP and ./MidiBERT/remi

1. fine-tuning

  • finetune.py
python3 finetune.py --task=melody --name=default

A folder named CP_result/finetune/{name}/ will be created, with checkpoint & log inside.

2. evaluation

  • eval.py
python3 eval.py --task=melody --cpu --ckpt=[ckpt_path]

Test loss & accuracy will be printed, and a confusion matrix figure will be saved.
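For readers unfamiliar with the two reported quantities, here is a minimal pure-Python sketch of how a confusion matrix and accuracy relate. It is not the code in eval.py, which may use a library and plots the matrix as a figure; the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)            # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
print(accuracy(cm))  # 0.8
```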

The same logic applies to REMI representation.

D. Baseline Model (Bi-LSTM)

./baseline/CP & ./baseline/remi

We separate our baseline models by task type: note-level tasks use a Bi-LSTM, and sequence-level tasks use a Bi-LSTM + self-attention model.

For evaluation on note-level tasks, please specify the checkpoint name. For sequence-level tasks, please specify only the output name you set when you trained.

  • Train a Bi-LSTM

    • note-level task
     python3 main.py --task=melody --name=0710
    • sequence-level task
     python3 main.py --task=composer --output=0710
  • Evaluate

    • note-level task:
     python3 eval.py --task=melody --ckpt=result/melody-LSTM/0710/LSTM-melody-classification.pth
    • sequence-level task
     python3 eval.py --task='composer' --ckpt=0710
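The difference between the two task types shows up in how predictions are scored: a note-level task has one label per note, while a sequence-level task has one label per piece. The sketch below illustrates this with plain lists. It is an illustration only, not the baseline's evaluation code, and it ignores details such as padding.

```python
def note_level_accuracy(true_seqs, pred_seqs):
    """Each sequence carries one label per note; every note is scored."""
    correct = total = 0
    for true_labels, pred_labels in zip(true_seqs, pred_seqs):
        for t, p in zip(true_labels, pred_labels):
            correct += int(t == p)
            total += 1
    return correct / total if total else 0.0


def sequence_level_accuracy(true_labels, pred_labels):
    """Each sequence carries a single label; every sequence is scored once."""
    pairs = list(zip(true_labels, pred_labels))
    if not pairs:
        return 0.0
    return sum(int(t == p) for t, p in pairs) / len(pairs)


# Two sequences with 3 and 1 notes: 3 of 4 notes are labelled correctly.
print(note_level_accuracy([[1, 1, 2], [3]], [[1, 2, 2], [3]]))  # 0.75
# Four sequences, one label each: 3 of 4 are correct.
print(sequence_level_accuracy([0, 1, 2, 3], [0, 1, 1, 3]))      # 0.75
```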

The same logic applies to REMI representation.

Special thanks to Ching-Yu (Sunny) Chiu

E. Skyline

Get the accuracy on POP909 using the skyline algorithm:

python3 cal_acc.py

POP909 contains melody, bridge, and accompaniment, yet skyline cannot distinguish between melody and bridge.

There are therefore 2 ways to report its accuracy:

  1. Consider Bridge as Accompaniment: attains 78.54% accuracy
  2. Consider Bridge as Melody: attains 79.51% accuracy
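The two reporting conventions can be illustrated with a toy version of skyline. The sketch below uses a simplified variant that marks the highest-pitched note at each onset as melody; it is not the code behind cal_acc.py, the note format and label names are assumptions, and the toy numbers are unrelated to the accuracies reported above.

```python
def skyline(notes):
    """Return indices of notes picked as melody: the highest pitch per onset.

    notes: list of (onset, pitch) pairs. Simplified variant for illustration.
    """
    best = {}
    for idx, (onset, pitch) in enumerate(notes):
        if onset not in best or pitch > notes[best[onset]][1]:
            best[onset] = idx
    return set(best.values())


def skyline_accuracy(notes, labels, bridge_as):
    """Score skyline with bridge notes counted as `bridge_as`
    ("melody" or "accompaniment")."""
    picked = skyline(notes)
    correct = 0
    for idx, label in enumerate(labels):
        target = bridge_as if label == "bridge" else label
        predicted = "melody" if idx in picked else "accompaniment"
        correct += int(predicted == target)
    return correct / len(labels)


notes = [(0, 72), (0, 48), (4, 67), (4, 50), (8, 60)]
labels = ["melody", "accompaniment", "bridge", "accompaniment", "accompaniment"]
print(skyline_accuracy(notes, labels, bridge_as="accompaniment"))  # 0.6
print(skyline_accuracy(notes, labels, bridge_as="melody"))         # 0.8
```

Because skyline only ever predicts "melody" or "accompaniment", the bridge notes have to be folded into one of those two classes before scoring, which is why two numbers are reported.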

Special thanks to Wen-Yi Hsiao for providing the code for skyline algorithm.

Citation

If you find this useful, please cite our paper.

@article{midibertpiano,
  title={{MidiBERT-Piano}: Large-scale Pre-training for Symbolic Music Understanding},
  author={Yi-Hui Chou and I-Chun Chen and Chin-Jui Chang and Joann Ching and Yi-Hsuan Yang},
  journal={arXiv preprint arXiv:2107.05223},
  year={2021}
}