Compressed Video Action Recognition

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl.
In CVPR, 2018. [Project Page]

Overview

This is a PyTorch reimplementation of CoViAR (the original paper uses MXNet). The code currently supports UCF-101 and HMDB-51; Charades support is coming soon. This is a work in progress, and suggestions are appreciated.

Results

This code produces results comparable to or better than those reported in the original paper (accuracy averaged over 3 splits, without optical flow):
HMDB-51: 52% (I-frame), 40% (motion vector), 43% (residuals), 59.2% (CoViAR).
UCF-101: 87% (I-frame), 70% (motion vector), 80% (residuals), 90.5% (CoViAR).
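
The CoViAR figures above are higher than any single stream because the combined model draws on all three representations. As a rough illustration only, the sketch below shows one common way to combine streams: a weighted sum of per-class scores (late fusion). The function names, the score values, and the equal weights are all assumptions made for this example; they are not taken from this repository.

```python
def fuse_scores(stream_scores, weights):
    """Return the weighted sum of per-class scores across streams.

    stream_scores: dict mapping stream name -> list of per-class scores.
    weights: dict mapping stream name -> scalar weight (hypothetical values).
    """
    num_classes = len(next(iter(stream_scores.values())))
    fused = [0.0] * num_classes
    for name, scores in stream_scores.items():
        if len(scores) != num_classes:
            raise ValueError(f"stream {name!r} has {len(scores)} scores, expected {num_classes}")
        for class_index, score in enumerate(scores):
            fused[class_index] += weights[name] * score
    return fused


def predict(stream_scores, weights):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(stream_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for one video over three classes (illustrative only).
scores = {
    "iframe": [2.0, 1.0, 0.5],
    "mv": [0.2, 1.5, 0.1],
    "residual": [0.5, 1.0, 0.3],
}
weights = {"iframe": 1.0, "mv": 1.0, "residual": 1.0}  # assumed equal weights

predicted = predict(scores, weights)
```

In this toy input the I-frame stream alone would pick class 0, while the fused scores favor class 1, which is the kind of disagreement fusion is meant to resolve.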

Data loader

We provide a Python data loader that takes a compressed video directly and returns its compressed representation (I-frames, motion vectors, and residuals) as a NumPy array. The model can therefore be trained without first extracting and storing every representation as image files.

In our experiments, the loader is fast enough that it does not delay GPU training. Please see GETTING_STARTED.md for details and instructions.
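
To make the idea of decoding on demand concrete, here is a minimal sketch of a dataset that decodes each video only when it is requested, instead of reading pre-extracted image files. It does not use the repository's actual loader: `CompressedVideoDataset`, `fake_decode`, the file names, and the channel counts are hypothetical stand-ins, and plain nested lists are used in place of NumPy arrays so the example has no dependencies. See GETTING_STARTED.md for the real interface.

```python
class CompressedVideoDataset:
    """Decode a representation on demand; nothing is cached on disk."""

    def __init__(self, video_paths, labels, decode_fn, representation):
        if len(video_paths) != len(labels):
            raise ValueError("video_paths and labels must have the same length")
        self.video_paths = video_paths
        self.labels = labels
        self.decode_fn = decode_fn            # callable(path, representation) -> frame
        self.representation = representation  # "iframe", "mv", or "residual"

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, index):
        # Decoding happens here, at access time, rather than in a
        # separate preprocessing pass that writes image files.
        frame = self.decode_fn(self.video_paths[index], self.representation)
        return frame, self.labels[index]


def fake_decode(path, representation):
    """Hypothetical stand-in for a real decoder.

    Returns a 2x2 frame of zeros as nested lists (height x width x channels).
    The channel counts are assumptions for illustration: 3 for RGB-like
    data and 2 for (x, y) motion vectors.
    """
    channels = {"iframe": 3, "mv": 2, "residual": 3}[representation]
    return [[[0] * channels for _ in range(2)] for _ in range(2)]


# Hypothetical file names and labels.
dataset = CompressedVideoDataset(["a.mp4", "b.mp4"], [0, 1], fake_decode, "mv")
frame, label = dataset[1]
```

A class with `__len__` and `__getitem__` like this one follows the same protocol that map-style PyTorch datasets use, so a real decoder could be dropped in for `fake_decode` without changing the training loop.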

Using CoViAR

Please see GETTING_STARTED.md for instructions for training and inference.

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wu2018coviar,
  title={Compressed Video Action Recognition},
  author={Wu, Chao-Yuan and Zaheer, Manzil and Hu, Hexiang and Manmatha, R and Smola, Alexander J and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={CVPR},
  year={2018}
}

Acknowledgment

This implementation borrows largely from tsn-pytorch by yjxiong. Part of the data loader implementation is modified from this tutorial and the FFmpeg extract_mv example.
