This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021)

Last update: Sep 24, 2022

Elaborative Rehearsal for Zero-shot Action Recognition

This is an official implementation of:

Shizhe Chen and Dong Huang, Elaborative Rehearsal for Zero-shot Action Recognition, ICCV, 2021. Arxiv Version

By elaborating a new concept and relating it to known concepts, we bring zero-shot action recognition models to the point of being comparable to supervised models trained on a few samples.

New state-of-the-art results are also achieved on the standard ZSAR benchmarks (Olympics, HMDB51, UCF101) as well as on the first large-scale ZSAR benchmark, which we propose on the Kinetics database.

Installation

git clone https://github.com/DeLightCMU/ElaborativeRehearsal.git
cd ElaborativeRehearsal
export PYTHONPATH=$(pwd):${PYTHONPATH}

pip install -r requirements.txt

# download pretrained models
bash scripts/download_premodels.sh
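
The `export PYTHONPATH` line only affects the current shell, so it has to be repeated in every new terminal. As a quick sanity check, the sketch below tests whether a directory is on `PYTHONPATH`. The helper name `on_pythonpath` is hypothetical and not part of the repository.

```shell
# Sketch: check whether a directory is listed on PYTHONPATH.
# on_pythonpath is an illustrative helper, not something the repository provides.
on_pythonpath() {
  # $1: directory to look for (defaults to the current directory)
  dir="${1:-$(pwd)}"
  case ":${PYTHONPATH:-}:" in
    *":${dir}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if on_pythonpath; then
  echo "current directory is on PYTHONPATH"
else
  echo "current directory is missing from PYTHONPATH; re-run the export line" >&2
fi
```

Run it from the repository root after the export line. Wrapping the value in colons on both sides lets the pattern match an entry at the start, middle, or end of the list.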

Zero-shot Action Recognition (ZSAR)

Extract Features in Video

spatial-temporal features

bash scripts/extract_tsm_features.sh '0,1,2'

object features

bash scripts/extract_object_features.sh '0,1,2'
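
Both extraction steps take the same quoted argument, so they can be chained in one small wrapper. The sketch below assumes that argument is a comma-separated list of GPU ids (the scripts themselves are not shown here, so this is an interpretation). `GPUS` and `DRY_RUN` are illustrative variable names, not options defined by the repository.

```shell
#!/usr/bin/env bash
# Sketch: run both feature-extraction scripts with the same argument.
# GPUS and DRY_RUN are illustrative names, not options defined by the repository.
set -euo pipefail

GPUS="${GPUS:-0,1,2}"      # assumed to be a comma-separated list of GPU ids
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

extract_all() {
  run bash scripts/extract_tsm_features.sh "$GPUS"
  run bash scripts/extract_object_features.sh "$GPUS"
}

extract_all
```

With the default `DRY_RUN=1` the wrapper only prints the two commands. Set `DRY_RUN=0` from the repository root to execute them.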

ZSAR Training and Inference

Baselines: DEVISE, ALE, SJE, DEM, ESZSL and GCN.

# mtype: devise, ale, sje, dem, eszsl
mtype=devise
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --eval_set tst
# evaluate other splits
ksplit=1
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines_eval_splits.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} ${ksplit}

# gcn
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --eval_set tst
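
To run every embedding baseline in turn, the train and evaluate commands above can be wrapped in a loop over the `mtype` values. This is a minimal sketch that reuses only the commands and flags shown above. `GPU`, `DRY_RUN`, and the helper functions are illustrative names, and the GCN baseline is left out because it uses a different driver.

```shell
#!/usr/bin/env bash
# Sketch: train and then evaluate each baseline type listed in this section.
# GPU, DRY_RUN, run and run_baselines are illustrative, not part of the repository.
set -euo pipefail

GPU="${GPU:-0}"            # value passed to CUDA_VISIBLE_DEVICES
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them
MTYPES="devise ale sje dem eszsl"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_baselines() {
  for mtype in $MTYPES; do
    cfg="zeroshot/configs/zsl_baseline_${mtype}_config.yaml"
    # Training run, followed by evaluation on the test set.
    run env CUDA_VISIBLE_DEVICES="$GPU" python zeroshot/driver/zsl_baselines.py "$cfg" "$mtype" --is_train
    run env CUDA_VISIBLE_DEVICES="$GPU" python zeroshot/driver/zsl_baselines.py "$cfg" "$mtype" --eval_set tst
  done
}

run_baselines
```

The `env` prefix sets `CUDA_VISIBLE_DEVICES` for each command, so the same line works whether it is printed or executed. Because of `set -e`, a failed training run stops the loop before its evaluation step starts.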

ER-ZSAR and ablations:

# TSM + ED class representation + AttnPool (2nd row in Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_wordembed_config.yaml --is_train --resume_file datasets/Kinetics/zsl220/word.glove42b.th

# TSM + ED class representation + BERT (last row in Table 4(a) and Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_config.yaml --is_train

# Obj + ED class representation + BERT + ER Loss (last row in Table 4(c))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_cptembed.py zeroshot/configs/zsl_cpt_config.yaml --is_train

# ER-ZSAR Full Model
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_ervse.py zeroshot/configs/zsl_ervse_config.yaml --is_train

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ChenHuang2021ER,
  title={Elaborative Rehearsal for Zero-shot Action Recognition},
  author={Shizhe Chen and Dong Huang},
  booktitle={ICCV},
  year={2021}
}
