PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos "

Last update: Nov 03, 2022

Related tags

Overview

Foley Music: Learning to Generate Music from Videos

This repo holds the code for the framework presented on ECCV 2020.

Foley Music: Learning to Generate Music from Videos Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, and Antonio Torralba

paper

Usage Guide

Prerequisites

The training and testing in PGCN is reimplemented in PyTorch for the ease of use.

Pytorch 1.4

Other minor Python modules can be installed by running

pip install -r requirements.txt

Data Preparation

Download Datasets

The extracted pose and midi for training and audio generation can be downloaded here and unzip to ./data folder.

The original datasets (including videos) can be found:

URMP: can be downloaded here
MUSIC: can be downloaded here
AtinPiano: proposed by At Your Fingertips: Automatic Piano Fingering Detection. The dataset can be downloaded here

Training

For URMP

CUDA_VISIBLE_DEVICES=6 python train.py -c config/URMP/violin.conf -e exps/urmp-vn

For AtinPiano

CUDA_VISIBLE_DEVICES=6 python train.py -c config/AtinPiano.conf -e exps/atinpiano

For MUSIC

CUDA_VISIBLE_DEVICES=6 python train.py -c config/MUSIC/accordion.conf -e exps/music-accordion

Generating MIDI, sounds and videos

For URMP

VIDEO_PATH=/path/to/video
INSTRUMENT_NAME='Violin'
python test_URMP.py exps/urmp-vn/checkpoint.pth.tar -o exps/urmp-vn/generate -i Violin -v $VIDEO_PATH -i $INSTRUMENT_NAME

For AtinPiano

VIDEO_PATH=/path/to/video
INSTRUMENT_NAME='Acoustic Grand Piano'
python test_AtinPiano_MUSIC.py exps/atinpiano/checkpoint.pth.tar -o exps/atinpiano/generation -v $VIDEO_PATH -i $INSTRUMENT_NAME

For MUSIC

VIDEO_PATH=/path/to/video
INSTRUMENT_NAME='Accordion'
python test_AtinPiano_MUSIC.py exps/music-accordion/checkpoint.pth.tar -o exps/music-accordion/generation -v $VIDEO_PATH -i $INSTRUMENT_NAME

Notes:

Instrument name ($INSTRUMENT_NAME) can be found here
If you do not have the video file or you want to generate MIDI and audio only, you can add -oa flag to skip the generation of video.

Other Info

Citation

Please cite the following paper if you feel our work useful to your research.

@inproceedings{FoleyMusic2020,
  author    = {Chuang Gan and
               Deng Huang and
               Peihao Chen and
               Joshua B. Tenenbaum and
               Antonio Torralba},
  title     = {Foley Music: Learning to Generate Music from Videos},
  booktitle = {ECCV},
  year      = {2020},
}

PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos "

Related tags

Overview

Foley Music: Learning to Generate Music from Videos

Usage Guide

Prerequisites

Data Preparation

Download Datasets

Training

Generating MIDI, sounds and videos

Other Info

Citation

Owner

Chuang Gan

Wikidated : An Evolving Knowledge Graph Dataset of Wikidata’s Revision History

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals.

reimpliment of DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

KIDA: Knowledge Inheritance in Data Aggregation

MultiLexNorm 2021 competition system from ÚFAL

Constructing interpretable quadratic accuracy predictors to serve as an objective function for an IQCQP problem that represents NAS under latency constraints and solve it with efficient algorithms.

Walk with fastai

Code for GNMR in ICDE 2021

CellRank's reproducibility repository.

Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems

Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"

DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations

The implementation of our CIKM 2021 paper titled as: "Cross-Market Product Recommendation"

Automated detection of anomalous exoplanet transits in light curve data.

Deploy optimized transformer based models on Nvidia Triton server

Speech-Emotion-Analyzer - The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)

[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.

Pre-trained model, code, and materials from the paper "Impact of Adversarial Examples on Deep Learning Models for Biomedical Image Segmentation" (MICCAI 2019).