Towards Long-Form Video Understanding

Last update: Dec 26, 2022

Chao-Yuan Wu, Philipp Krähenbühl, CVPR 2021

[Paper] [Project Page] [Dataset]

Citation

@inproceedings{lvu2021,
  Author    = {Chao-Yuan Wu and Philipp Kr\"{a}henb\"{u}hl},
  Title     = {{Towards Long-Form Video Understanding}},
  Booktitle = {{CVPR}},
  Year      = {2021}}

Overview

This repo implements Object Transformers for long-form video understanding.

Getting Started

Please organize data/ as follows:

data
|_ ava
|_ features
|_ instance_meta
|_ lvu_1.0

ava, features, and instance_meta can be found in this Google Drive folder. lvu_1.0 can be found here.

Please also download the pre-trained weights from this Google Drive folder and put them in pretrained_models/.
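Before launching any script, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repo: the helper name `missing_paths` is hypothetical, and it only checks that the directories named in this README exist, not that their contents are complete.

```python
from pathlib import Path

# Sub-directories of data/ listed in this README.
EXPECTED_DATA_DIRS = ["ava", "features", "instance_meta", "lvu_1.0"]


def missing_paths(root="."):
    """Return the expected directories that do not exist under `root`.

    Hypothetical helper: it checks directory names only, not file contents.
    """
    root = Path(root)
    expected = [root / "data" / name for name in EXPECTED_DATA_DIRS]
    expected.append(root / "pretrained_models")
    return [str(p) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing directories:")
        for path in missing:
            print("  " + path)
    else:
        print("Data layout looks complete.")
```

Run it from the repository root; an empty result means every directory named above was found.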

Pre-training

python3 -u run_pretrain.py

This pretrains on a small demo dataset, data/instance_meta/instance_meta_pretrain_demo.pkl, as an example. Please follow its file format if you'd like to pretrain on a larger dataset (e.g., the latest full version of MovieClips).
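Since the file format is not described here, one way to learn it is to load the demo file and look at its top-level structure. The sketch below assumes only that the file is a standard Python pickle; the helpers `summarize` and `inspect_pickle` are hypothetical, and unpickling may additionally require whatever libraries the stored objects depend on.

```python
import pickle
from pathlib import Path


def summarize(obj, max_items=3):
    """Return a short description of a loaded Python object."""
    if isinstance(obj, dict):
        keys = list(obj.keys())[:max_items]
        return f"dict with {len(obj)} keys, e.g. {keys}"
    if isinstance(obj, (list, tuple)):
        first = type(obj[0]).__name__ if obj else "n/a"
        return f"{type(obj).__name__} of length {len(obj)}, first item type: {first}"
    return type(obj).__name__


def inspect_pickle(path):
    """Load a pickle file and describe its top-level object."""
    # Only unpickle files from a source you trust.
    with open(path, "rb") as f:
        return summarize(pickle.load(f))


if __name__ == "__main__":
    demo = Path("data/instance_meta/instance_meta_pretrain_demo.pkl")
    if demo.exists():
        print(inspect_pickle(demo))
    else:
        print(f"{demo} not found")
```

The summary is only a starting point; a larger dataset would need to match the demo file's nested structure as well as its top-level type.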

Training and evaluating on AVA v2.2

python3 -u run_ava.py

This should achieve 31.0 mAP.

Training and evaluating on LVU tasks

python3 -u run.py [1-9]

The argument, an integer from 1 to 9, selects the task to run. Please see run.py for details.
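To run every LVU task in sequence, the command above can be wrapped in a small driver. This is a sketch under the assumption that run.py takes the task number as its only positional argument, as shown above; `build_command`, `run_all`, and the `--run` flag are hypothetical names introduced for illustration.

```python
import subprocess
import sys

TASK_IDS = range(1, 10)  # this README gives the argument range as [1-9]


def build_command(task_id):
    """Return the command line for one LVU task, as shown in this README."""
    if task_id not in TASK_IDS:
        raise ValueError(f"task_id must be in 1..9, got {task_id}")
    return ["python3", "-u", "run.py", str(task_id)]


def run_all(dry_run=True):
    """Print the command for each task, and execute it unless dry_run is set."""
    for task_id in TASK_IDS:
        cmd = build_command(task_id)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing task.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Pass --run to execute the tasks; the default only prints the commands.
    run_all(dry_run="--run" not in sys.argv[1:])
```

The dry-run default is a deliberate choice: the nine runs are started only when explicitly requested.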

Acknowledgment

This implementation borrows heavily from Hugging Face Transformers. Please consider citing it if you use this repo.
