AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Last update: Dec 26, 2022

Overview

AdaFocusV2

This repo contains the official code and pre-trained models for AdaFocusV2.

Introduction

Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing spatial redundancy. As a representative work, the adaptive focus method (AdaFocus) achieves a favorable trade-off between accuracy and inference speed by dynamically identifying and attending to the informative regions in each video frame. However, AdaFocus requires a complicated three-stage training pipeline (involving reinforcement learning), which leads to slow convergence and makes it unfriendly to practitioners. This work reformulates the training of AdaFocus as a simple one-stage algorithm by introducing a differentiable interpolation-based patch selection operation, enabling efficient end-to-end optimization. We further present an improved training scheme to address the issues introduced by the one-stage formulation, including the lack of supervision, input diversity and training stability. Moreover, a conditional-exit technique is proposed to perform temporal adaptive computation on top of AdaFocus without additional training. Extensive experiments on six benchmark datasets (i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, and Jester) demonstrate that our model significantly outperforms the original AdaFocus and other competitive baselines, while being considerably simpler and more efficient to train.
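To give a rough feel for what an interpolation-based patch selection can look like, here is a minimal sketch in plain Python. It is not the code from this repo (which is not shown here): the function names `bilinear_sample` and `crop_patch`, the border clamping, and the list-of-rows frame layout are all assumptions made for illustration. The point it shows is that a patch cropped at a continuous (non-integer) centre is a smooth function of that centre, which is what would allow gradients to reach the module that predicts the patch location.

```python
import math


def bilinear_sample(frame, y, x):
    """Sample a 2D frame (list of rows) at a continuous location (y, x).

    Hypothetical helper: coordinates are clamped to the frame border.
    """
    h, w = len(frame), len(frame[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = frame[y0][x0] * (1 - dx) + frame[y0][x1] * dx
    bottom = frame[y1][x0] * (1 - dx) + frame[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def crop_patch(frame, center_y, center_x, size):
    """Crop a size x size patch centred at a continuous location.

    Every output pixel is a bilinear blend of input pixels, so the patch
    varies smoothly as (center_y, center_x) moves.
    """
    half = (size - 1) / 2.0
    return [
        [
            bilinear_sample(frame, center_y - half + i, center_x - half + j)
            for j in range(size)
        ]
        for i in range(size)
    ]


def patch_mean(frame, center_y, center_x, size):
    """Mean intensity of the cropped patch (a stand-in for a scalar loss)."""
    patch = crop_patch(frame, center_y, center_x, size)
    return sum(sum(row) for row in patch) / (size * size)
```

As a usage example, a finite difference such as `(patch_mean(frame, 2.0, 2.0 + 1e-3, 3) - patch_mean(frame, 2.0, 2.0, 3)) / 1e-3` approximates the sensitivity of the patch to its horizontal position. An integer-index crop would only change in jumps as the centre crosses pixel boundaries. In a deep learning framework the same idea would typically be written with the framework's own differentiable sampling operator rather than Python loops.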

Results

Compared with AdaFocusV1

ActivityNet, FCVID and Mini-Kinetics

Something-Something V1&V2 and Jester

Visualization

Get Started

See the folders "Experiments on ActivityNet, FCVID and Mini-Kinetics" and "Experiments on Sth-Sth and Jester" for dataset-specific documentation.

Contact

If you have any questions, feel free to contact the authors or raise an issue. Yulin Wang: [email protected].
