The training code for the 4th place model at MDX 2021 leaderboard A.

Last update: Dec 18, 2022

Overview

This repository contains the training code of our winning model at Music Demixing Challenge 2021, which got the 4th place on leaderboard A (6th in overall), and help us (Kazane Ryo no Danna) winned the bronze prize.

Model Summary

Our final winning approach blends the outputs from three models, which are:

model 1: A X-UMX model [1] which is initialized with the weights of the official baseline, and is fine-tuned with a modified Combinational Multi-Domain Loss from [1]. In particular, we implement and apply a differentiable Multichannel Wiener Filter (MWF) [2] before the loss calculation, and compute the frequency-domain L2 loss with raw complex values.
model 2: A U-Net which is similar to Spleeter [3], where all convolution layers are replaced by D3 Blocks from [4], and two layers of 2D local attention are applied at the bottleneck similar to [5].
model 3: A modified version of Demucs [6], where the original decoding module is replaced by four independent decoders, each of which corresponds to one source.

We didn't encounter overfitting in our pilot experiments, so we used the full musdb training set for all the models above, and stopped training upon convergence of the loss function.

The weights of the three outputs are determined empirically:

	Drums	Bass	Other	Vocals
model 1	0.2	0.1	0	0.2
model 2	0.2	0.17	0.5	0.4
model 3	0.6	0.73	0.5	0.4

For the spectrogram-based models (model 1 and 2), we apply MWF to the outputs with one iteration before the fusion.

[1] Sawata, Ryosuke, et al. "All for One and One for All: Improving Music Separation by Bridging Networks." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

[2] Antoine Liutkus, & Fabian-Robert Stöter. (2019). sigsep/norbert: First official Norbert release (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.3269749

[3] Hennequin, Romain, et al. "Spleeter: a fast and efficient music source separation tool with pre-trained models." Journal of Open Source Software 5.50 (2020): 2154.

[4] Takahashi, Naoya, and Yuki Mitsufuji. "D3net: Densely connected multidilated densenet for music source separation." arXiv preprint arXiv:2010.01733 (2020).

[5] Wu, Yu-Te, Berlin Chen, and Li Su. "Multi-Instrument Automatic Music Transcription With Self-Attention-Based Instance Segmentation." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2796-2809.

[6] Défossez, Alexandre, et al. "Music source separation in the waveform domain." arXiv preprint arXiv:1911.13254 (2019).

How to reproduce the training

Install Requirements / Build Virtual Environment

We recommend using conda.

conda env create -f environment.yml
conda activate demixing

Prepare Data

Please download musdb, and edit the "root" parameters in all the json files listed under configs/ to the path where you have the dataset.

Training Model 1

First download the pre-trained model:

wget https://zenodo.org/record/4740378/files/pretrained_xumx_musdb18HQ.pth

Copy the weights for initializing our model:

python xumx_weights_convert.py pretrained_xumx_musdb18HQ.pth xumx_weights.pth

Start training!

python train.py configs/x_umx_mwf.json --weights xumx_weights.pth

Checkpoints will be located under saved/. The config was set to run on a single RTX 3070.

Training Model 2

python train.py configs/unet_attn.json --device_ids 0 1 2 3

Checkpoints will be located under saved/. The config was set to run on four Tesla V100.

Training Model 3

python train.py configs/demucs_split.json

Checkpoints will be located under saved/. The config was set to run on a single RTX 3070, using gradient accumulation and mixed precision training.

Tensorboard Logging

You can monitor the training process using tensorboard:

tesnorboard --logdir runs/

Inference

First make sure you installed danna-sep. Then convert your checkpoints into jit scripts and replace the files under DANNA_CHECKPOINTS:

python jit_convert.py configs/x_umx_mwf.json saved/CrossNet\ Open-Unmix_checkpoint_XXX.pt $DANNA_CHECKPOINTS/xumx_mwf.pth

python jit_convert.py configs/unet_attn.json saved/UNet\ Attention_checkpoint_XXX.pt $DANNA_CHECKPOINTS/unet_attention.pth

python jit_convert.py configs/demucs_split.json saved/DemucsSplit_checkpoint_XXX.pt $DANNA_CHECKPOINTS/demucs_4_decoders.pth

Now you can use danna-sep to separate you favorite music and see how it works!

The training code for the 4th place model at MDX 2021 leaderboard A.

Related tags

Overview

Model Summary

How to reproduce the training

Install Requirements / Build Virtual Environment

Prepare Data

Training Model 1

Training Model 2

Training Model 3

Tensorboard Logging

Inference

Additional Resources

Owner

Chin-Yun Yu

Rootski - Full codebase for rootski.io (without the data)

Shared, streaming Python dict

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

Code and dataset for the EMNLP 2021 Finding paper "Can NLI Models Verify QA Systems’ Predictions?"

A simple chatbot based on chatterbot that you can use for anything has basic features

Python generation script for BitBirds

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

spaCy plugin for Transformers , Udify, ELmo, etc.

Implementation of some unbalanced loss like focal_loss, dice_loss, DSC Loss, GHM Loss et.al

Official Stanford NLP Python Library for Many Human Languages

Chinese Grammatical Error Diagnosis

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

🏆 • 5050 most frequent words in 109 languages

Use Google's BERT for named entity recognition （CoNLL-2003 as the dataset）.

NeuTex: Neural Texture Mapping for Volumetric Neural Rendering

A 10000+ hours dataset for Chinese speech recognition

Codes to pre-train Japanese T5 models

Tokenizer - Module python d'analyse syntaxique et de grammaire, tokenization

Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization

Fastseq 基于ONNXRUNTIME的文本生成加速框架