Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Oral)

Overview

CMT

Code for paper Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award)

[Paper] [Site]

Directory Structure

  • src/: code of the whole pipeline

    • train.py: training script, take a npz as input music data to train the model

    • model.py: code of the model

    • gen_midi_conditional.py: inference script, take a npz (represents a video) as input to generate several songs

    • src/video2npz/: convert video into npz by extracting motion saliency and motion speed

  • dataset/: processed dataset for training, in the format of npz

  • logs/: logs that automatically generate during training, can be used to track training process

  • exp/: checkpoints, named after val loss (e.g. loss_13_params.pt)

  • inference/: processed video for inference (.npz), and generated music(.mid)

Preparation

  • clone this repo

  • download lpd_5_prcem_mix_v8_10000.npz from HERE and put it under dataset/

  • download pretrained model loss_8_params.pt from HERE and put it under exp/

  • install ffmpeg=3.2.4

  • prepare a Python3 conda environment

    pip install -r py3_requirements.txt
  • prepare a Python2 conda environment (for extracting visbeat)

    • pip install -r py2_requirements.txt
    • open visbeat package directory (e.g. anaconda3/envs/XXXX/lib/python2.7/site-packages/visbeat), replace the original Video_CV.py with src/video2npz/Video_CV.py

Training

  • If you want to use another training set: convert training data from midi into npz under dataset/

    python midi2numpy_mix.py --midi_dir /PATH/TO/MIDIS/ --out_name data.npz 
  • train the model

    python train.py -n XXX -g 0 1 2 3
    
    # -n XXX: the name of the experiment, will be the name of the log file & the checkpoints directory. if XXX is 'debug', checkpoints will not be saved
    # -l (--lr): initial learning rate
    # -b (--batch_size): batch size
    # -p (--path): if used, load model checkpoint from the given path
    # -e (--epochs): number of epochs in training
    # -t (--train_data): path of the training data (.npz file) 
    # -g (--gpus): ids of gpu
    # other model hyperparameters: modify the source .py files

Inference

  • convert input video (MP4 format) into npz (use the Python2 environment)

    cd src/video2npz
    sh video2npz.sh ../../videos/xxx.mp4
    • try resizing the video if this takes a long time
  • run model to generate .mid :

    python gen_midi_conditional.py -f "../inference/xxx.npz" -c "../exp/loss_8_params.pt"
    
    # -c: checkpoints to be loaded
    # -f: input npz file
    # -g: id of gpu (only one gpu is needed for inference) 
    • if using another training set, change decoder_n_class in gen_midi_conditional to the decoder_n_class in train.py
  • convert midi into audio: use GarageBand (recommended) or midi2audio

    • set tempo to the value of tempo in video2npz/metadata.json
  • combine original video and audio into video with BGM

    ffmpeg -i 'xxx.mp4' -i 'yyy.mp3' -c:v copy -c:a aac -strict experimental -map 0:v:0 -map 1:a:0 'zzz.mp4'
    
    # xxx.mp4: input video
    # yyy.mp3: audio file generated in the previous step
    # zzz.mp4: output video
Owner
Zhaokai Wang
Undergraduate student from Beihang University
Zhaokai Wang
Music library streaming app written in Flask & VueJS

djtaytay This is a little toy app made to explore Vue, brush up on my Python, and make a remote music collection accessable through a web interface. I

Ryan Tasson 6 May 27, 2022
A Unified Framework and Analysis for Structured Knowledge Grounding

UnifiedSKG 📚 : Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Code for paper UnifiedSKG: Unifying and Mu

HKU NLP Group 370 Dec 21, 2022
Global Filter Networks for Image Classification

Global Filter Networks for Image Classification Created by Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou This repository contains PyTorch

Yongming Rao 273 Dec 26, 2022
Txt2Xml tool will help you convert from txt COCO format to VOC xml format in Object Detection Problem.

TXT 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Txt2Xml too

Nguyễn Trường Lâu 4 Nov 24, 2022
Semantic Segmentation with Pytorch-Lightning

This is a simple demo for performing semantic segmentation on the Kitti dataset using Pytorch-Lightning and optimizing the neural network by monitoring and comparing runs with Weights & Biases.

Boris Dayma 58 Nov 18, 2022
Generative Models as a Data Source for Multiview Representation Learning

GenRep Project Page | Paper Generative Models as a Data Source for Multiview Representation Learning Ali Jahanian, Xavier Puig, Yonglong Tian, Phillip

Ali 81 Dec 03, 2022
This python-based package offers a way of creating a parametric OpenMC plasma source from plasma parameters.

openmc-plasma-source This python-based package offers a way of creating a parametric OpenMC plasma source from plasma parameters. The OpenMC sources a

Fusion Energy 10 Oct 18, 2022
Direct application of DALLE-2 to video synthesis, using factored space-time Unet and Transformers

DALLE2 Video (wip) ** only to be built after DALLE2 image is done and replicated, and the importance of the prior network is validated ** Direct appli

Phil Wang 105 May 15, 2022
This is the formal code implementation of the CVPR 2022 paper 'Federated Class Incremental Learning'.

Official Pytorch Implementation for GLFC [CVPR-2022] Federated Class-Incremental Learning This is the official implementation code of our paper "Feder

Race Wang 57 Dec 27, 2022
Machine Learning with JAX Tutorials

The purpose of this repo is to make it easy to get started with JAX. It contains my "Machine Learning with JAX" series of tutorials (YouTube videos and Jupyter Notebooks) as well as the content I fou

Aleksa Gordić 372 Dec 28, 2022
Export CenterPoint PonintPillars ONNX Model For TensorRT

CenterPoint-PonintPillars Pytroch model convert to ONNX and TensorRT Welcome to CenterPoint! This project is fork from tianweiy/CenterPoint. I impleme

CarkusL 149 Dec 13, 2022
AI-generated-characters for Learning and Wellbeing

AI-generated-characters for Learning and Wellbeing Click here for the full project page. This repository contains the source code for the paper AI-gen

MIT Media Lab 214 Jan 01, 2023
Simultaneous Demand Prediction and Planning

Simultaneous Demand Prediction and Planning Dependencies Python packages: Pytorch, scikit-learn, Pandas, Numpy, PyYAML Data POI: data/poi Road network

Yizong Wang 1 Sep 01, 2022
Controlling the MicriSpotAI robot from scratch

Project-MicroSpot-AI Controlling the MicriSpotAI robot from scratch Colaborators Alexander Dennis Components from MicroSpot The MicriSpotAI has the fo

Dennis Núñez-Fernández 5 Oct 20, 2022
[ICCV 2021] Learning A Single Network for Scale-Arbitrary Super-Resolution

ArbSR Pytorch implementation of "Learning A Single Network for Scale-Arbitrary Super-Resolution", ICCV 2021 [Project] [arXiv] Highlights A plug-in mod

Longguang Wang 229 Dec 30, 2022
Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision Training Efficiency We show the training efficiency of our DSLP model b

Chenyang Huang 36 Oct 31, 2022
Torch-based tool for quantizing high-dimensional vectors using additive codebooks

Trainable multi-codebook quantization This repository implements a utility for use with PyTorch, and ideally GPUs, for training an efficient quantizer

Daniel Povey 41 Jan 07, 2023
OpenMMLab Semantic Segmentation Toolbox and Benchmark.

Documentation: https://mmsegmentation.readthedocs.io/ English | 简体中文 Introduction MMSegmentation is an open source semantic segmentation toolbox based

OpenMMLab 5k Dec 31, 2022
Implements Gradient Centralization and allows it to use as a Python package in TensorFlow

Gradient Centralization TensorFlow This Python package implements Gradient Centralization in TensorFlow, a simple and effective optimization technique

Rishit Dagli 101 Nov 01, 2022
SimplEx - Explaining Latent Representations with a Corpus of Examples

SimplEx - Explaining Latent Representations with a Corpus of Examples Code Author: Jonathan Crabbé ( Jonathan Crabbé 14 Dec 15, 2022