Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Last update: Dec 23, 2022

Related tags

Deep Learning PanoAVQA

Overview

Pano-AVQA

Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

[Paper] [Poster] [Video]

Getting Started

This code is based on following libraries:

python=3.8
pytorch=1.7.0 (with cuda 10.2)

To create virtual environment with all necessary libraries:

conda env create -f environment.yml

By default data should be saved under data/feat/{audio,label,visual} directory and logs (w/ cache, checkpoint) are saved under data/{cache,ckpt,log} directory. Using symbolic link is recommended:

ln -s {path_to_your_data_directory} data

We use single TITAN RTX for training, but GPUs with less memory are still doable with smaller batch size (provided precomputed features).

Dataset

We plan to release the Pano-AVQA dataset public within this year, including Q&A annotation, precomputed features, etc. Please stay tuned!

Model

Training

Default configuration is provided in code/config.py. To run with this configuration:

python cli.py

To run with custom configuration, either modify code/config.py or execute:

python cli.py with {{flags_at_your_disposal}}

Inference

Model weight is saved under ./data/log directory. To run inference only:

python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yun2021PanoAVQA,
    author = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
    title = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
    booktitle = {ICCV},
    year = {2021}
}

Contact

If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.

Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Related tags

Overview

Pano-AVQA

[Paper] [Poster] [Video]

Getting Started

Dataset

Model

Training

Inference

Citation

Contact

Owner

Heeseung Yun

Cooperative Driving Dataset: a dataset for multi-agent driving scenarios

DCGAN LSGAN WGAN-GP DRAGAN PyTorch

Paper Title: Heterogeneous Knowledge Distillation for Simultaneous Infrared-Visible Image Fusion and Super-Resolution

Where2Act: From Pixels to Actions for Articulated 3D Objects

Learning Tracking Representations via Dual-Branch Fully Transformer Networks

pytorch, hand(object) detect ,yolo v5，手检测

[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

Sequence Modeling with Structured State Spaces

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

Facestar dataset. High quality audio-visual recordings of human conversational speech.

Deep learning with dynamic computation graphs in TensorFlow

Invariant Causal Prediction for Block MDPs

A pytorch-based real-time segmentation model for autonomous driving

ObjectDrawer-ToolBox: a graphical image annotation tool to generate ground plane masks for a 3D object reconstruction system

Sequence-to-Sequence learning using PyTorch

Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier

Making a music video with Wav2CLIP and VQGAN-CLIP

[Pedestron] Generalizable Pedestrian Detection: The Elephant In The Room. @ CVPR2021

Experiments with Fourier layers on simulation data.

Using a Seq2Seq RNN architecture via TensorFlow to predict future Bitcoin prices