A new codebase for Group Activity Recognition. It contains codes for ICCV 2021 paper: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition and some other methods.

Last update: Dec 12, 2022

Overview

Spatio-Temporal Dynamic Inference Network for Group Activity Recognition

The source codes for ICCV2021 Paper: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition.
[paper] [supplemental material] [arXiv]

If you find our work or the codebase inspiring and useful to your research, please cite

@inproceedings{yuan2021DIN,
  title={Spatio-Temporal Dynamic Inference Network for Group Activity Recognition},
  author={Yuan, Hangjie and Ni, Dong and Wang, Mang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7476--7485},
  year={2021}
}

Dependencies

Software Environment: Linux (CentOS 7)
Hardware Environment: NVIDIA TITAN RTX
Python 3.6
PyTorch 1.2.0, Torchvision 0.4.0
RoIAlign for Pytorch

Prepare Datasets

Download publicly available datasets from following links: Volleyball dataset and Collective Activity dataset.
Unzip the dataset file into data/volleyball or data/collective.
Download the file tracks_normalized.pkl from cvlab-epfl/social-scene-understanding and put it into data/volleyball/videos

Using Docker

Checkout repository and cd PROJECT_PATH
Build the Docker container

docker build -t din_gar https://github.com/JacobYuan7/DIN_GAR.git#main

Run the Docker container

docker run --shm-size=2G -v data/volleyball:/opt/DIN_GAR/data/volleyball -v result:/opt/DIN_GAR/result --rm -it din_gar

--shm-size=2G: To prevent ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)., you have to extend the container's shared memory size. Alternatively: --ipc=host
-v data/volleyball:/opt/DIN_GAR/data/volleyball: Makes the host's folder data/volleyball available inside the container at /opt/DIN_GAR/data/volleyball
-v result:/opt/DIN_GAR/result: Makes the host's folder result available inside the container at /opt/DIN_GAR/result
-it & --rm: Starts the container with an interactive session (PROJECT_PATH is /opt/DIN_GAR) and removes the container after closing the session.
din_gar the name/tag of the image
optional: --gpus='"device=7"' restrict the GPU devices the container can access.

Get Started

Train the Base Model: Fine-tune the base model for the dataset.

# Volleyball dataset
cd PROJECT_PATH 
python scripts/train_volleyball_stage1.py

# Collective Activity dataset
cd PROJECT_PATH 
python scripts/train_collective_stage1.py

Train with the reasoning module: Append the reasoning modules onto the base model to get a reasoning model.
1. Volleyball dataset
  - DIN
```
python scripts/train_volleyball_stage2_dynamic.py
```
  - lite DIN
    We can run DIN in lite version by setting cfg.lite_dim = 128 in scripts/train_volleyball_stage2_dynamic.py.
```
python scripts/train_volleyball_stage2_dynamic.py
```
  - ST-factorized DIN
    We can run ST-factorized DIN by setting cfg.ST_kernel_size = [(1,3),(3,1)] and cfg.hierarchical_inference = True.
    
    Note that if you set cfg.hierarchical_inference = False, cfg.ST_kernel_size = [(1,3),(3,1)] and cfg.num_DIN = 2, then multiple interaction fields run in parallel.
```
python scripts/train_volleyball_stage2_dynamic.py
```
  Other model re-implemented by us according to their papers or publicly available codes:
  - AT
```
python scripts/train_volleyball_stage2_at.py
```
  - PCTDM
```
python scripts/train_volleyball_stage2_pctdm.py
```
  - SACRF
```
python scripts/train_volleyball_stage2_sacrf_biute.py
```
  - ARG
```
python scripts/train_volleyball_stage2_arg.py
```
  - HiGCIN
```
python scripts/train_volleyball_stage2_higcin.py
```
2. Collective Activity dataset
  - DIN
```
python scripts/train_collective_stage2_dynamic.py
```
  - DIN lite
    We can run DIN in lite version by setting 'cfg.lite_dim = 128' in 'scripts/train_collective_stage2_dynamic.py'.
```
python scripts/train_collective_stage2_dynamic.py
```

Another work done by us, solving GAR from the perspective of incorporating visual context, is also available.

@inproceedings{yuan2021visualcontext,
  title={Learning Visual Context for Group Activity Recognition},
  author={Yuan, Hangjie and Ni, Dong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={4},
  pages={3261--3269},
  year={2021}
}

A new codebase for Group Activity Recognition. It contains codes for ICCV 2021 paper: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition and some other methods.

Related tags

Overview

Spatio-Temporal Dynamic Inference Network for Group Activity Recognition

Dependencies

Prepare Datasets

Using Docker

Get Started

Owner

An efficient 3D semantic segmentation framework for Urban-scale point clouds like SensatUrban, Campus3D, etc.

A general framework for inferring CNNs efficiently. Reduce the inference latency of MobileNet-V3 by 1.3x on an iPhone XS Max without sacrificing accuracy.

This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation".

Learning Temporal Consistency for Low Light Video Enhancement from Single Images (CVPR2021)

R-package accompanying the paper "Dynamic Factor Model for Functional Time Series: Identification, Estimation, and Prediction"

A framework for joint super-resolution and image synthesis, without requiring real training data

[CVPR2022] Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Implementation for our ICCV 2021 paper: Dual-Camera Super-Resolution with Aligned Attention Modules

Implementation of Kalman Filter in Python

Transfer style api - An API to use with Tranfer Style App, where you can use two image and transfer the style

Lowest memory consumption and second shortest runtime in NTIRE 2022 challenge on Efficient Super-Resolution

CasualHealthcare's Pneumonia detection with Artificial Intelligence (Convolutional Neural Network)

[ICML 2021] “ Self-Damaging Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

This is an official implementation of "Polarized Self-Attention: Towards High-quality Pixel-wise Regression"

Libraries, tools and tasks created and used at DeepMind Robotics.

Zero-Cost Proxies for Lightweight NAS

Repository of the paper Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models at ML4AD @ NeurIPS 2021.

Leaf: Multiple-Choice Question Generation

R3Det based on mmdet 2.19.0