Codebase for ECCV18 "The Sound of Pixels"

Overview

Sound-of-Pixels

Codebase for ECCV18 "The Sound of Pixels".

*This repository is under construction, but the core parts are already there.

Environment

The code is developed under the following configurations.

  • Hardware: 1-4 GPUs (change [--num_gpus NUM_GPUS] accordingly)
  • Software: Ubuntu 16.04.3 LTS, CUDA>=8.0, Python>=3.5, PyTorch>=0.4.0

Training

  1. Prepare video dataset.

    a. Download MUSIC dataset from: https://github.com/roudimit/MUSIC_dataset

    b. Download videos.

  2. Preprocess videos. You can do it in your own way as long as the index files are similar.

    a. Extract frames at 8fps and waveforms at 11025Hz from videos. We have following directory structure:

    data
    ├── audio
    |   ├── acoustic_guitar
    │   |   ├── M3dekVSwNjY.mp3
    │   |   ├── ...
    │   ├── trumpet
    │   |   ├── STKXyBGSGyE.mp3
    │   |   ├── ...
    │   ├── ...
    |
    └── frames
    |   ├── acoustic_guitar
    │   |   ├── M3dekVSwNjY.mp4
    │   |   |   ├── 000001.jpg
    │   |   |   ├── ...
    │   |   ├── ...
    │   ├── trumpet
    │   |   ├── STKXyBGSGyE.mp4
    │   |   |   ├── 000001.jpg
    │   |   |   ├── ...
    │   |   ├── ...
    │   ├── ...
    

    b. Make training/validation index files by running:

    python scripts/create_index_files.py
    

    It will create index files train.csv/val.csv with the following format:

    ./data/audio/acoustic_guitar/M3dekVSwNjY.mp3,./data/frames/acoustic_guitar/M3dekVSwNjY.mp4,1580
    ./data/audio/trumpet/STKXyBGSGyE.mp3,./data/frames/trumpet/STKXyBGSGyE.mp4,493
    

    For each row, it stores the information: AUDIO_PATH,FRAMES_PATH,NUMBER_FRAMES

  3. Train the default model.

./scripts/train_MUSIC.sh
  1. During training, visualizations are saved in HTML format under ckpt/MODEL_ID/visualization/.

Evaluation

  1. (Optional) Download our trained model weights for evaluation.
./scripts/download_trained_model.sh
  1. Evaluate the trained model performance.
./scripts/eval_MUSIC.sh

Reference

If you use the code or dataset from the project, please cite:

    @InProceedings{Zhao_2018_ECCV,
        author = {Zhao, Hang and Gan, Chuang and Rouditchenko, Andrew and Vondrick, Carl and McDermott, Josh and Torralba, Antonio},
        title = {The Sound of Pixels},
        booktitle = {The European Conference on Computer Vision (ECCV)},
        month = {September},
        year = {2018}
    }
Owner
Hang Zhao
Assistant Professor at Tsinghua University, MIT PhD in Computer Vision
Hang Zhao
TensorFlow-LiveLessons - "Deep Learning with TensorFlow" LiveLessons

TensorFlow-LiveLessons Note that the second edition of this video series is now available here. The second edition contains all of the content from th

Deep Learning Study Group 830 Jan 03, 2023
Final report with code for KAIST Course KSE 801.

Orthogonal collocation is a method for the numerical solution of partial differential equations

Chuanbo HUA 4 Apr 06, 2022
This GitHub repo consists of Code and Some results of project- Diabetes Treatment using Gold nanoparticles. These Consist of ML Models used for prediction Diabetes and further the basic theory and working of Gold nanoparticles.

GoldNanoparticles This GitHub repo consists of Code and Some results of project- Diabetes Treatment using Gold nanoparticles. These Consist of ML Mode

1 Jan 30, 2022
Python implementation of the multistate Bennett acceptance ratio (MBAR)

pymbar Python implementation of the multistate Bennett acceptance ratio (MBAR) method for estimating expectations and free energy differences from equ

Chodera lab // Memorial Sloan Kettering Cancer Center 169 Dec 02, 2022
Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Overview PyTorch 0.4.1 | Python 3.6.5 Annotated implementations with comparative introductions for minimax, non-saturating, wasserstein, wasserstein g

Shayne O'Brien 471 Dec 16, 2022
OSLO: Open Source framework for Large-scale transformer Optimization

O S L O Open Source framework for Large-scale transformer Optimization What's New: December 21, 2021 Released OSLO 1.0. What is OSLO about? OSLO is a

TUNiB 280 Nov 24, 2022
Deep Learning Visuals contains 215 unique images divided in 23 categories

Deep Learning Visuals contains 215 unique images divided in 23 categories (some images may appear in more than one category). All the images were originally published in my book "Deep Learning with P

Daniel Voigt Godoy 1.3k Dec 28, 2022
Python script to download the celebA-HQ dataset from google drive

download-celebA-HQ Python script to download and create the celebA-HQ dataset. WARNING from the author. I believe this script is broken since a few mo

133 Dec 21, 2022
Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

NL-CSNet-Pytorch Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021. Note: this repo only shows the strategy of

WenxueCui 7 Nov 07, 2022
Angora is a mutation-based fuzzer. The main goal of Angora is to increase branch coverage by solving path constraints without symbolic execution.

Angora Angora is a mutation-based coverage guided fuzzer. The main goal of Angora is to increase branch coverage by solving path constraints without s

833 Jan 07, 2023
Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection

LMFD-PAD Note This is the official repository of the paper: LMFD-PAD: Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechani

28 Dec 02, 2022
Deep-learning-roadmap - All You Need to Know About Deep Learning - A kick-starter

Deep Learning - All You Need to Know Sponsorship To support maintaining and upgrading this project, please kindly consider Sponsoring the project deve

Instill AI 4.4k Dec 26, 2022
Bilinear attention networks for visual question answering

Bilinear Attention Networks This repository is the implementation of Bilinear Attention Networks for the visual question answering and Flickr30k Entit

Jin-Hwa Kim 506 Nov 29, 2022
Summary of related papers on visual attention

This repo is built for paper: Attention Mechanisms in Computer Vision: A Survey paper Vision-Attention-Papers Channel attention Spatial attention Temp

MenghaoGuo 2.1k Dec 30, 2022
DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing

DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing Figure: Joint multi-attribute edits using DyStyle model. Great diversity

74 Dec 03, 2022
Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization'

pytorch-AdaIN This is an unofficial pytorch implementation of a paper, Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization [Hua

Naoto Inoue 873 Jan 06, 2023
Improving Object Detection by Label Assignment Distillation

Improving Object Detection by Label Assignment Distillation This is the official implementation of the WACV 2022 paper Improving Object Detection by L

Cybercore Co. Ltd 51 Dec 08, 2022
Implementation of FSGNN

FSGNN Implementation of FSGNN. For more details, please refer to our paper Experiments were conducted with following setup: Pytorch: 1.6.0 Python: 3.8

19 Dec 05, 2022
A PyTorch implementation of Implicit Q-Learning

IQL-PyTorch This repository houses a minimal PyTorch implementation of Implicit Q-Learning (IQL), an offline reinforcement learning algorithm, along w

Garrett Thomas 30 Dec 12, 2022
The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper) @misc{zhang2021compress,

46 Dec 07, 2022