Compressed Video Action Recognition

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl.
In CVPR, 2018. [Project Page]

Overview

This is a PyTorch reimplementation of CoViAR (the original paper uses MXNet). The code currently supports UCF-101 and HMDB-51; Charades support is coming soon. This is a work in progress, and suggestions are appreciated.

Results

This code produces results comparable to or better than those reported in the original paper:
HMDB-51: 52% (I-frame), 40% (motion vector), 43% (residuals), 59.2% (CoViAR).
UCF-101: 87% (I-frame), 70% (motion vector), 80% (residuals), 90.5% (CoViAR).
(Average of 3 splits; without optical flow.)

Data loader

We provide a Python data loader that takes a compressed video directly and returns its compressed representation (I-frames, motion vectors, and residuals) as a NumPy array. The model can therefore be trained without extracting and storing all representations as image files.

In our experiments, the loader is fast enough that it does not delay GPU training. Please see GETTING_STARTED.md for details and instructions.
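
As a rough illustration of the idea above (decoding on access instead of reading pre-extracted image files), here is a minimal sketch. It does not use the loader shipped with this repository: `decode_stub`, `CompressedVideoDataset`, the representation names, and the array shapes and dtypes are all hypothetical placeholders chosen for this example. See GETTING_STARTED.md for the actual interface.

```python
import numpy as np

# Hypothetical channel counts per representation (an assumption for this sketch).
CHANNELS = {"iframe": 3, "mv": 2, "residual": 3}


def decode_stub(video_path, representation, height=224, width=224):
    """Stand-in for a real compressed-video decoder; returns a zero array."""
    if representation not in CHANNELS:
        raise ValueError(f"unknown representation: {representation}")
    # Assumed dtypes: 8-bit pixels for I-frames, signed integers otherwise.
    dtype = np.uint8 if representation == "iframe" else np.int32
    return np.zeros((height, width, CHANNELS[representation]), dtype=dtype)


class CompressedVideoDataset:
    """Map-style dataset that decodes on access instead of reading image files."""

    def __init__(self, samples, representation, decode_fn=decode_stub):
        self.samples = samples  # list of (video_path, label) pairs
        self.representation = representation
        self.decode_fn = decode_fn

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        video_path, label = self.samples[index]
        # Nothing is cached on disk; the array is produced when requested.
        frame = self.decode_fn(video_path, self.representation)
        return frame, label
```

A class with `__len__` and `__getitem__` like this can be passed to a PyTorch `DataLoader`, so decoding can run in worker processes while the GPU trains. Replacing `decode_stub` with a real decoder is the only change needed.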

Using CoViAR

Please see GETTING_STARTED.md for instructions for training and inference.

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wu2018coviar,
  title={Compressed Video Action Recognition},
  author={Wu, Chao-Yuan and Zaheer, Manzil and Hu, Hexiang and Manmatha, R and Smola, Alexander J and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={CVPR},
  year={2018}
}

Acknowledgment

This implementation borrows largely from tsn-pytorch by yjxiong. Part of the data loader implementation is modified from this tutorial and the FFmpeg extract_mv example.
