TVNet: Temporal Voting Network for Action Localization

Last update: Jul 26, 2022

This repo holds the code for the paper "TVNet: Temporal Voting Network for Action Localization".

Paper Introduction

Temporal action localization is a vital task in video understanding. In this paper, we propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. TVNet incorporates a novel Voting Evidence Module that locates temporal boundaries more accurately by accumulating temporal contextual evidence to predict frame-level probabilities of start and end action boundaries.

Dependencies

Python == 2.7
TensorFlow == 1.9.0
CUDA == 10.1.105
GCC >= 5.4

Note that the PEM code from BMN is implemented in PyTorch (1.1.0 or 1.3.0).

Data Preparation

Datasets

Our experiments are based on the ActivityNet 1.3 and THUMOS14 datasets.

Feature for THUMOS14

You can download the THUMOS14 features from Google Drive.

Place them into a folder named thumos_features inside ./TVNet-THUMOS14/data.

You also need to download the features for PEM (from BMN) from Google Drive. Put them into a folder named Thumos_feature_hdf5 inside ./TVNet-THUMOS14/data/thumos_features.

If everything goes well, the folder structure of ./TVNet-THUMOS14/data should look like this:

data
└── thumos_features
    ├── Thumos_feature_dim_400
    ├── Thumos_feature_hdf5
    ├── features_train.npy
    └── features_test.npy
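
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_entries and the constant EXPECTED_THUMOS14 are made up for illustration, and the only paths it checks are the ones in the listing above.

```python
from __future__ import print_function

import os

# Expected entries, copied from the folder listing above.
# The names below this comment are illustrative; they do not exist in the repo.
EXPECTED_THUMOS14 = [
    "thumos_features/Thumos_feature_dim_400",
    "thumos_features/Thumos_feature_hdf5",
    "thumos_features/features_train.npy",
    "thumos_features/features_test.npy",
]


def missing_entries(data_root, expected=EXPECTED_THUMOS14):
    """Return the expected relative paths that do not exist under data_root."""
    return [
        rel for rel in expected
        if not os.path.exists(os.path.join(data_root, *rel.split("/")))
    ]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains TVNet-THUMOS14.
    missing = missing_entries(os.path.join("TVNet-THUMOS14", "data"))
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("THUMOS14 data layout looks complete.")
```

The same function could be reused for ActivityNet 1.3 by passing a different data root and list of expected paths.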

Feature for ActivityNet 1.3

You can download the ActivityNet 1.3 features from Google Cloud. Put the csv_mean_100 directory into ./TVNet-ANET/data/activitynet_feature_cuhk/.

If everything goes well, the folder structure of ./TVNet-ANET/data should look like this:

data
└── activitynet_feature_cuhk
    └── csv_mean_100

Run all steps

Run all steps on THUMOS14

cd TVNet-THUMOS14

Run the following script to execute all steps on THUMOS14:

bash do_all.sh

Note: If you use BlueCrystal 4, you can run the following script directly without any dependency setup.

bash do_all_BC4.sh

Run all steps on ActivityNet 1.3

cd TVNet-ANET
bash do_all.sh

On BlueCrystal 4, run this script instead:

bash do_all_BC4.sh

Run steps separately

Take TVNet-THUMOS14 as an example:

cd TVNet-THUMOS14

1. Temporal evaluation module

python TEM_train.py

python TEM_test.py

2. Create training data for voting evidence module

python VEM_create_windows.py --window_length L --window_stride S

L is the window length and S is the sliding stride. We generate training windows of length 10 with stride 5, and of length 5 with stride 2.
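
To make the roles of L and S concrete, here is a minimal sketch of how a sliding window of length L and stride S covers a sequence. It is not the code in VEM_create_windows.py: the function name sliding_windows is hypothetical, and dropping a trailing partial window is an assumption, since this README does not say how the script treats the end of a video.

```python
def sliding_windows(num_snippets, window_length, window_stride):
    """Return (start, end) index pairs with end exclusive.

    Assumption for illustration only: windows that would run past the
    end of the sequence are dropped rather than padded.
    """
    if window_length <= 0 or window_stride <= 0:
        raise ValueError("window_length and window_stride must be positive")
    last_start = num_snippets - window_length
    return [
        (start, start + window_length)
        for start in range(0, last_start + 1, window_stride)
    ]


if __name__ == "__main__":
    # The two settings mentioned above, applied to a toy sequence of 20 snippets.
    print(sliding_windows(20, 10, 5))
    print(sliding_windows(20, 5, 2))
```

With stride smaller than the length, consecutive windows overlap, so each position in the sequence is covered by more than one window.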

3. Voting evidence module

python VEM_train.py --voting_type TYPE --window_length L --window_stride S

python VEM_test.py --voting_type TYPE --window_length L --window_stride S

TYPE should be start or end. We train and test models with window length 10 (stride 5) and window length 5 (stride 2) for start and end separately.
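
Training and testing separately for start and end with both window settings gives four runs per script. The sketch below only builds and prints those command lines, using the flags shown above; the helper name vem_commands is made up for illustration, and whether do_all.sh runs them in this order is not stated here.

```python
from __future__ import print_function

import itertools

VOTING_TYPES = ["start", "end"]
WINDOW_CONFIGS = [(10, 5), (5, 2)]  # (window_length, window_stride)


def vem_commands(script):
    """Build one command (as an argument list) per voting type and window setting."""
    return [
        ["python", script,
         "--voting_type", voting_type,
         "--window_length", str(length),
         "--window_stride", str(stride)]
        for voting_type, (length, stride)
        in itertools.product(VOTING_TYPES, WINDOW_CONFIGS)
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them; pass each list to
    # subprocess.check_call to run it from inside TVNet-THUMOS14.
    for script in ("VEM_train.py", "VEM_test.py"):
        for cmd in vem_commands(script):
            print(" ".join(cmd))
```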

4. Proposal evaluation module from BMN

python PEM_train.py

5. Proposal generation

python proposal_generation.py

6. Post processing and detection

python post_postprocess.py

Results

THUMOS14

tIoU	mAP
0.3	0.5724681814413137
0.4	0.5060844218403346
0.5	0.430414918823808
0.6	0.3297164845828022
0.7	0.202971546242546

ActivityNet 1.3

tIoU	mAP
Average	0.3460396513933088
0.5	0.5135151163296395
0.75	0.34955648726767025
0.95	0.10121803584836778
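
The tables above are indexed by tIoU (temporal Intersection over Union) threshold. As a reminder of what that quantity measures, here is a minimal sketch of tIoU between a predicted and a ground-truth segment. It is not the evaluation code used to produce the numbers above, and the function name temporal_iou is hypothetical.

```python
from __future__ import division


def temporal_iou(pred, gt):
    """tIoU of two (start, end) segments given in the same time unit."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is clamped at zero for disjoint segments.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction covering 2s-8s against a ground truth covering 4s-10s.
    print(temporal_iou((2, 8), (4, 10)))
```

Under the usual protocol, a detection counts as correct at threshold t when its tIoU with a ground-truth instance of the same class is at least t.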

Reference

This implementation borrows from:

BSN: BSN-Boundary-Sensitive-Network

TEM_train/test.py -- for the TEM module we used in our paper
load_dataset.py -- borrows the part that loads data for TEM

BMN: BMN-Boundary-Matching-Network

PEM_train.py -- for the PEM module we used in our paper

G-TAD: Sub-Graph Localization for Temporal Action Detection

post_postprocess.py -- for the multi-core processing that generates detections

Our main contribution is in:

VEM_create_windows.py -- generate training annotations for Voting Evidence Module (VEM)

VEM_train.py -- train Voting Evidence Module (VEM)

VEM_test.py -- test Voting Evidence Module (VEM)

Owner

hywang
