Official PyTorch Implementation of paper EAN: Event Adaptive Network for Efficient Action Recognition

Overview

EAN: Event Adaptive Network

PWC

PyTorch Implementation of paper:

EAN: Event Adaptive Network for Enhanced Action Recognition

Yuan Tian, Yichao Yan, Xiongkuo Min, Guo Lu, Guangtao Zhai, Guodong Guo, and Zhiyong Gao

[ArXiv]

Main Contribution

Efficiently modeling spatial-temporal information in videos is crucial for action recognition. In this paper, we propose a unified action recognition framework to investigate the dynamic nature of video content by introducing the following designs. First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer, which yields a sparse paradigm. We call the proposed framework as Event Adaptive Network (EAN) because both key designs are adaptive to the input video content. To exploit the short-term motions within local segments, we propose a novel and efficient Latent Motion Code (LMC) module, further improving the performance of the framework.

Content

Dependencies

Please make sure the following libraries are installed successfully:

Data Preparation

Following the common practice, we need to first extract videos into frames for fast data loading. Please refer to TSN repo for the detailed guide of data pre-processing. We have successfully trained on Something-Something-V1 and V2, Kinetics, Diving48 datasets with this codebase. Basically, the processing of video data can be summarized into 3 steps:

  1. Extract frames from videos:

  2. Generate file lists needed for dataloader:

    • Each line of the list file will contain a tuple of (extracted video frame folder name, video frame number, and video groundtruth class). A list file looks like this:

      video_frame_folder 100 10
      video_2_frame_folder 150 31
      ...
      
    • Or you can use off-the-shelf tools provided by the repos: data_process/gen_label_xxx.py

  3. Edit dataset config information in datasets_video.py

Pretrained Models

Here, we provide the pretrained models of EAN models on Something-Something-V1 datasets. Recognizing actions in this dataset requires strong temporal modeling ability. EAN achieves state-of-the-art performance on these datasets. Notably, our method even surpasses optical flow based methods while with only RGB frames as input.

Something-Something-V1

Model Backbone FLOPs Val Top1 Val Top5 Checkpoints
EAN8F(RGB+LMC) ResNet-50 37G 53.4 81.1 [Jianguo Cloud]
EAN16(RGB+LMC) 74G 54.7 82.3
EAN16+8(RGB+LMC) 111G 57.2 83.9
EAN2 x (16+8)(RGB+LMC) 222G 57.5 84.3

Testing

For example, to test the EAN models on Something-Something-V1, you can first put the downloaded .pth.tar files into the "pretrained" folder and then run:

# test EAN model with 8frames clip
bash scripts/test/sthv1/RGB_LMC_8F.sh

# test EAN model with 16frames clip
bash scripts/test/sthv1/RGB_LMC_16F.sh

Training

We provided several scripts to train EAN with this repo, please refer to "scripts" folder for more details. For example, to train PAN on Something-Something-V1, you can run:

# train EAN model with 8frames clip
bash scripts/train/sthv1/RGB_LMC_8F.sh

Notice that you should scale up the learning rate with batch size. For example, if you use a batch size of 32 you should set learning rate to 0.005.

Other Info

References

This repository is built upon the following baseline implementations for the action recognition task.

Citation

Please [★star] this repo and [cite] the following arXiv paper if you feel our EAN useful to your research:

@misc{tian2021ean,
      title={EAN: Event Adaptive Network for Enhanced Action Recognition}, 
      author={Yuan Tian and Yichao Yan and Xiongkuo Min and Guo Lu and Guangtao Zhai and Guodong Guo and Zhiyong Gao},
      year={2021},
      eprint={2107.10771},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

For any questions, please feel free to open an issue or contact:

Yuan Tian: [email protected]
Owner
TianYuan
TianYuan
A (PyTorch) imbalanced dataset sampler for oversampling low frequent classes and undersampling high frequent ones.

Imbalanced Dataset Sampler Introduction In many machine learning applications, we often come across datasets where some types of data may be seen more

Ming 2k Jan 08, 2023
Build an Amazon SageMaker Pipeline to Transform Raw Texts to A Knowledge Graph

Build an Amazon SageMaker Pipeline to Transform Raw Texts to A Knowledge Graph This repository provides a pipeline to create a knowledge graph from ra

AWS Samples 3 Jan 01, 2022
Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning accelerators for distributed training using the Ray distributed

166 Dec 27, 2022
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields.

This repository contains the code release for Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. This implementation is written in JAX, and is a fork of Google's JaxNeRF

Google 625 Dec 30, 2022
Official repository for GCR rerank, a GCN-based reranking method for both image and video re-ID

Official repository for GCR rerank, a GCN-based reranking method for both image and video re-ID

53 Nov 22, 2022
Implementation of the GBST block from the Charformer paper, in Pytorch

Charformer - Pytorch Implementation of the GBST (gradient-based subword tokenization) module from the Charformer paper, in Pytorch. The paper proposes

Phil Wang 105 Dec 26, 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone In our recent paper we propose the YourTTS model. YourTTS bri

Edresson Casanova 390 Dec 29, 2022
DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction (3DV 2021)

DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction (3DV 2021) This repo is the implementation of DPC. Tested environment Pyth

Dvir Ginzburg 30 Nov 30, 2022
PyTorch deep learning projects made easy.

PyTorch Template Project PyTorch deep learning project made easy. PyTorch Template Project Requirements Features Folder Structure Usage Config file fo

Victor Huang 3.8k Jan 01, 2023
PointPillars inference with TensorRT

A project demonstrating how to use CUDA-PointPillars to deal with cloud points data from lidar.

NVIDIA AI IOT 315 Dec 31, 2022
Akshat Surolia 2 May 11, 2022
Applying curriculum to meta-learning for few shot classification

Curriculum Meta-Learning for Few-shot Classification We propose an adaptation of the curriculum training framework, applicable to state-of-the-art met

Stergiadis Manos 3 Oct 25, 2022
AirLoop: Lifelong Loop Closure Detection

AirLoop This repo contains the source code for paper: Dasong Gao, Chen Wang, Sebastian Scherer. "AirLoop: Lifelong Loop Closure Detection." arXiv prep

Chen Wang 53 Jan 03, 2023
Source for the paper "Universal Activation Function for machine learning"

Universal Activation Function Tensorflow and Pytorch source code for the paper Yuen, Brosnan, Minh Tu Hoang, Xiaodai Dong, and Tao Lu. "Universal acti

4 Dec 03, 2022
Official code of Team Yao at Multi-Modal-Fact-Verification-2022

Official code of Team Yao at Multi-Modal-Fact-Verification-2022 A Multi-Modal Fact Verification dataset released as part of the De-Factify workshop in

Wei-Yao Wang 11 Nov 15, 2022
Framework to build and train RL algorithms

RayLink RayLink is a RL framework used to build and train RL algorithms. RayLink was used to build a RL framework, and tested in a large-scale multi-a

Bytedance Inc. 32 Oct 07, 2022
source code for 'Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge' by A. Shah, K. Shanmugam, K. Ahuja

Source code for "Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge" Reference: Abhin Shah, Karthikeyan Shanmugam, Kartik Ahu

Abhin Shah 1 Jun 03, 2022
MogFace: Towards a Deeper Appreciation on Face Detection

MogFace: Towards a Deeper Appreciation on Face Detection Introduction In this repo, we propose a promising face detector, termed as MogFace. Our MogFa

48 Dec 20, 2022
Code for the paper "Asymptotics of ℓ2 Regularized Network Embeddings"

README Code for the paper Asymptotics of L2 Regularized Network Embeddings. Requirements Requires Stellargraph 1.2.1, Tensorflow 2.6.0, scikit-learm 0

Andrew Davison 0 Jan 06, 2022
Vision transformers (ViTs) have found only limited practical use in processing images

CXV Convolutional Xformers for Vision Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-o

Cloudwalker 23 Sep 10, 2022