Learning Representational Invariances for Data-Efficient Action Recognition

Last update: Nov 22, 2022

Overview

Official PyTorch implementation for Learning Representational Invariances for Data-Efficient Action Recognition. We follow the code structure of MMAction2.

See the project page for more details.

Installation

We use PyTorch-1.6.0 with CUDA-10.2 and Torchvision-0.7.0.
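To confirm that an environment matches the versions listed above, a small check such as the following may help. This is only a sketch: the helper names `base_version` and `check_versions` are hypothetical and not part of this repository, and the check only compares version strings (it does not verify the CUDA toolkit).

```python
import importlib

# Versions stated in this README.
EXPECTED = {"torch": "1.6.0", "torchvision": "0.7.0"}


def base_version(version):
    """Drop a local build suffix, e.g. '1.6.0+cu102' -> '1.6.0'."""
    return version.split("+")[0]


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, found, matches)} for each expected package.

    A package that cannot be imported is reported with found=None.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (wanted, None, False)
            continue
        found = getattr(module, "__version__", None)
        matches = found is not None and base_version(found) == wanted
        report[name] = (wanted, found, matches)
    return report
```

Calling `check_versions()` in the target environment returns one entry per package; any entry whose last field is `False` points to a missing or mismatched install.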

Please refer to install.md for installation.

Data Preparation

First, please download human detection results and put them in the corresponding folder under data: UCF-101, HMDB-51, Kinetics-100.

Second, please refer to data_preparation.md to prepare raw frames of UCF-101 and HMDB-51. (Instructions for extracting frames from Kinetics-100 will be available soon.)

(Optional) You can download the pre-extracted ImageNet scores: UCF-101, HMDB-51.

Training

We use 8 RTX 2080 Ti GPUs to run our experiments. If you have fewer GPUs, you will need to adjust your training schedule accordingly. Please refer to here.
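One common way to adjust a schedule for a different GPU count is the linear scaling rule, which MMAction2's documentation also recommends: scale the learning rate in proportion to the total batch size. The sketch below assumes that rule applies here. The function name `scale_learning_rate` and the numbers in the example are hypothetical and are not taken from this repository's configs.

```python
def scale_learning_rate(base_lr, base_gpus, base_videos_per_gpu,
                        gpus, videos_per_gpu):
    """Scale a learning rate linearly with the total batch size.

    total batch size = number of GPUs * videos per GPU
    """
    if min(base_gpus, base_videos_per_gpu, gpus, videos_per_gpu) <= 0:
        raise ValueError("GPU counts and per-GPU batch sizes must be positive")
    base_batch = base_gpus * base_videos_per_gpu
    new_batch = gpus * videos_per_gpu
    return base_lr * new_batch / base_batch


if __name__ == "__main__":
    # Hypothetical example: a config tuned for 8 GPUs x 8 videos with lr=0.2,
    # run on 2 GPUs x 8 videos instead.
    print(scale_learning_rate(0.2, 8, 8, 2, 8))
```

With a quarter of the GPUs and the same per-GPU batch size, the learning rate shrinks to a quarter of its original value. Whether the number of epochs should also change depends on the config and is not covered by this sketch.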

Supervised learning

PORT=${PORT:-29500}

python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=$PORT \
tools/train.py \
$CONFIG \
--launcher pytorch ${@:3} \
--validate

You need to replace $CONFIG with the actual config file:

For supervised baseline, please use config files in configs/recognition/r2plus1d.
For strongly-augmented supervised learning, please use config files in configs/supervised_aug.

Semi-supervised learning

PORT=${PORT:-29500}

python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=$PORT \
tools/train_semi.py \
$CONFIG \
--launcher pytorch ${@:3} \
--validate

You need to replace $CONFIG with the actual config file:

For single dataset semi-supervised learning, please use config files in configs/semi.
For cross-dataset semi-supervised learning, please use config files in configs/semi_both.
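The supervised and semi-supervised launch commands differ only in the training script and the config file. As a sketch of how the pieces fit together, the helper below assembles the same argument list with the GPU count, port, script, and config as parameters. The name `build_launch_command` and the config path in the example are hypothetical placeholders, not files or functions in this repository.

```python
import shlex


def build_launch_command(config, script="tools/train.py", num_gpus=8,
                         port=29500, extra_args=()):
    """Assemble the distributed launch command shown in this README.

    script is tools/train.py (supervised) or tools/train_semi.py
    (semi-supervised); extra_args are appended before --validate.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        f"--master_port={port}",
        script,
        config,
        "--launcher", "pytorch",
        *extra_args,
        "--validate",
    ]


if __name__ == "__main__":
    # Placeholder config path; substitute a real file from configs/semi.
    cmd = build_launch_command("configs/semi/placeholder_config.py",
                               script="tools/train_semi.py", num_gpus=4)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` from the repository root, or printed with `shlex.join` and pasted into a shell.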

Testing

# Multi-GPU testing
./tools/dist_test.sh $CONFIG ${path_to_your_ckpt} ${num_of_gpus} --eval top_k_accuracy

# Single-GPU testing
python tools/test.py $CONFIG ${path_to_your_ckpt} --eval top_k_accuracy

NOTE: Do not use multi-GPU testing if you are currently using multi-GPU training.
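The `--eval top_k_accuracy` option reports how often the ground-truth class appears among the k highest-scoring predictions. The following pure-Python sketch illustrates the metric on toy data. It is not MMAction2's implementation, and the scores and labels are made up for illustration.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k top-scoring classes.

    scores: list of per-class score lists, one list per sample
    labels: list of integer class indices, one per sample
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    toy_labels = [1, 1, 0]
    print(top_k_accuracy(toy_scores, toy_labels, k=1))
    print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

In the toy data, only the first sample is correct at k=1, while the first two are correct at k=2, so top-k accuracy can only stay the same or rise as k grows.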

Other details

Please see getting_started.md for the basic usage of MMAction2.

Acknowledgement

The code is built upon MMAction2.

Owner

Virginia Tech Vision and Learning Lab
