【ACMMM 2021】DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning

Last update: Dec 27, 2022

Overview

We release the code of DSANet (Dynamic Segment Aggregation Network). We introduce the DSA module to capture the relationships among snippets for video-level representation learning. Equipped with DSA modules, I3D ResNet-50 reaches a top-1 accuracy of 78.2% on Kinetics-400.

The core implementation of the Dynamic Segment Aggregation (DSA) module is in codes/models/modules_maker/DSA.py.
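The overview does not spell out the module's internals. As a rough, hypothetical illustration of what "dynamic" aggregation across snippets can mean, the pure-Python toy below derives a small kernel from the video's average snippet feature and slides it along the snippet axis. Everything here (the names `dynamic_segment_aggregate` and `kernel_proj`, pooled vectors instead of feature maps, the softmax-normalized kernel) is an assumption for illustration, not the DSA.py implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_segment_aggregate(snippets: Sequence[Vector],
                              kernel_proj: Sequence[Vector]) -> List[Vector]:
    """Toy input-conditioned aggregation across snippets (hypothetical sketch).

    snippets:    U pooled snippet features, each of length C.
    kernel_proj: K x C projection, a stand-in for a learned kernel
                 generator; K must be odd.
    Returns U aggregated features, each of length C.
    """
    num_snippets, channels = len(snippets), len(snippets[0])
    k = len(kernel_proj)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    # 1) Video-level context: average the snippet features.
    context = [sum(s[c] for s in snippets) / num_snippets for c in range(channels)]
    # 2) Generate K kernel weights from that context; softmax makes them sum
    #    to 1, and because they depend on the input, the kernel is "dynamic".
    logits = [sum(w[c] * context[c] for c in range(channels)) for w in kernel_proj]
    kernel = softmax(logits)
    # 3) Apply the kernel as a 1-D convolution along the snippet axis,
    #    treating out-of-range neighbours as zeros.
    half = k // 2
    out = []
    for u in range(num_snippets):
        agg = [0.0] * channels
        for j, weight in enumerate(kernel):
            v = u + j - half
            if 0 <= v < num_snippets:
                for c in range(channels):
                    agg[c] += weight * snippets[v][c]
        out.append(agg)
    return out
```

In a real network the kernel generator would be learned and the aggregation would presumably act on intermediate feature maps inside the backbone; DSA.py is the reference for the actual design.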

[July 7, 2021] We release the core code of DSANet.

[July 3, 2021] DSANet has been accepted by ACMMM 2021.

Prerequisites
Data Preparation
Model Zoo
Testing
Training

Prerequisites

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments were run with Python 3.7 and PyTorch 1.5. Other versions should work but have not been tested.

Download Pretrained Models

Download ImageNet pre-trained models (for offline environments):

cd pretrained
sh download_imgnet.sh

Download Kinetics-400 (K400) pre-trained models for inference:

TODO

Data Preparation

We follow the same data preparation process as MVFNet.
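Data preparation itself follows MVFNet and is not detailed here. At loading time, video-level methods that reason over snippets commonly split each video into equal segments and draw one snippet per segment (TSN-style sampling). Assuming that scheme, a minimal sketch follows; `sample_snippet_starts` is a hypothetical helper, and the repository's actual sampling is defined in its own data pipeline.

```python
import random
from typing import List, Optional


def sample_snippet_starts(num_frames: int, num_segments: int,
                          snippet_len: int,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Split a video into `num_segments` equal segments and pick one snippet
    start frame per segment (TSN-style sampling; hypothetical helper).

    With `rng`, the start is drawn uniformly inside each segment
    (training-style); without it, the segment centre is used (test-style).
    If the video is shorter than one snippet, every start is 0 and the
    loader must pad or loop frames (not handled here).
    """
    # Only frames that can start a full snippet are eligible.
    valid = max(num_frames - snippet_len + 1, 1)
    seg_size = valid / num_segments
    starts = []
    for i in range(num_segments):
        lo = int(seg_size * i)
        hi = max(int(seg_size * (i + 1)) - 1, lo)
        if rng is None:
            starts.append((lo + hi) // 2)
        else:
            starts.append(rng.randint(lo, hi))
    return starts
```

For example, `sample_snippet_starts(100, 4, 8)` returns `[11, 34, 57, 80]`: four 8-frame snippets, one centred in each quarter of the eligible range.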

Model Zoo

TODO

Testing

bash dist_test_recognizer.sh CONFIG_PATH CHECKPOINT_PATH 8
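How the test script aggregates and reports results is not documented in this README. Assuming it follows the common protocol of averaging class scores over several clips per video and reporting top-1 accuracy (the metric quoted in the Overview), the computation can be sketched as below; `average_clip_scores` and `topk_accuracy` are hypothetical helpers, not functions from this repository.

```python
from typing import List, Sequence


def average_clip_scores(clip_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-clip class scores into one video-level score vector."""
    n = len(clip_scores)
    # zip(*...) walks the scores class by class across all clips.
    return [sum(col) / n for col in zip(*clip_scores)]


def topk_accuracy(video_scores: Sequence[Sequence[float]],
                  labels: Sequence[int], k: int = 1) -> float:
    """Fraction of videos whose true label is among the k highest scores."""
    hits = 0
    for scores, label in zip(video_scores, labels):
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)
```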

Training

This implementation supports multi-GPU training with DistributedDataParallel, which is faster and simpler.

For example, to train DSANet with 8 GPUs, run:

bash dist_train_recognizer.sh configs/kinetics/r50_e100.py 8
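With DistributedDataParallel, each GPU runs its own process; with a per-GPU batch of B clips, one optimization step on 8 GPUs covers 8B clips. Each process typically reads its own shard of the training set through PyTorch's `torch.utils.data.DistributedSampler`; whether this repository configures it exactly that way is an assumption. The pure-Python sketch below mirrors that sampler's partitioning with `shuffle=False` and `drop_last=False`; `shard_indices` is a hypothetical helper.

```python
import math
from typing import List


def shard_indices(dataset_len: int, world_size: int, rank: int) -> List[int]:
    """Indices read by one process, following DistributedSampler's
    non-shuffled, drop_last=False behaviour."""
    indices = list(range(dataset_len))
    num_samples = math.ceil(dataset_len / world_size)
    total_size = num_samples * world_size
    # Pad by repeating indices from the start so every rank gets the same count.
    padding = total_size - dataset_len
    if padding <= len(indices):
        indices += indices[:padding]
    else:
        indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Interleave: rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total_size:world_size]
```

For a 10-sample dataset on 8 GPUs, each rank gets 2 indices (for example, rank 0 gets `[0, 8]` and rank 2 gets `[2, 0]`), so a few samples are seen twice per epoch to keep the ranks in lockstep.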

Acknowledgements

We especially thank the contributors of the MVFNet and mmaction codebases for providing helpful code.

License

This repository is released under the Apache-2.0 license, as found in the LICENSE file.

Related Work

MVFNet: Multi-View Fusion Network for Efficient Video Recognition, AAAI 2021 (Paper | Code)

Citation

If you find our work useful, please feel free to cite our paper 😆:

@inproceedings{wu2021dsanet,
  title={DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning},
  author={Wu, Wenhao and Zhao, Yuxiang and Xu, Yanwu and Tan, Xiao and He, Dongliang and Zou, Zhikang and Ye, Jin and Li, Yingying and Yao, Mingde and Dong, Zichao and others},
  booktitle={ACMMM},
  year={2021}
}

Contact

For any questions, please file an issue or contact:

Wenhao Wu: [email protected]
Yuxiang Zhao: [email protected]


Owner

Wenhao Wu
