【ACMMM 2021】DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning

Last update: Dec 27, 2022

Overview

We release the code of DSANet (Dynamic Segment Aggregation Network). We introduce the DSA module to capture relationships among snippets for video-level representation learning. Equipped with DSA modules, I3D ResNet-50 reaches 78.2% top-1 accuracy on Kinetics-400.

The core implementation of the Dynamic Segment Aggregation module is in codes/models/modules_maker/DSA.py.

[July 7, 2021] We release the core code of DSANet.

[July 3, 2021] DSANet has been accepted by ACMMM 2021.

Prerequisites
Data Preparation
Model Zoo
Testing
Training

Prerequisites

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments run on Python 3.7 and PyTorch 1.5. Other versions should work but are not tested.
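Because only one version pair is tested, it can help to compare your environment against it before training. The following is a minimal sketch, not part of the repository: the helper names `installed_torch_version` and `check_environment` are hypothetical, and a mismatch is only a hint, since other versions may still work.

```python
import sys

# Versions reported in this README; other versions are untested, not forbidden.
TESTED_PYTHON = (3, 7)
TESTED_TORCH = "1.5"


def installed_torch_version():
    """Return the installed PyTorch version string, or None if it is missing."""
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return None
    return str(torch.__version__)


def check_environment():
    """Compare the running interpreter and PyTorch against the tested versions."""
    python_version = sys.version_info[:2]
    torch_version = installed_torch_version()
    return {
        "python": "{}.{}".format(*python_version),
        "python_matches": python_version == TESTED_PYTHON,
        "torch": torch_version,
        # Matches 1.5.x builds such as "1.5.0" or "1.5.1+cu101".
        "torch_matches": torch_version is not None
        and torch_version.startswith(TESTED_TORCH + "."),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("{}: {}".format(key, value))
```

If either `*_matches` field is false, the code may still run, but results could differ from those reported.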

Download Pretrained Models

Download ImageNet pre-trained models for offline environments:

cd pretrained
sh download_imgnet.sh

Download K400 pre-trained models for inference

TODO

Data Preparation

We follow the same data preparation process as MVFNet.

Model Zoo

TODO

Testing

bash dist_test_recognizer.sh CONFIG_PATH CHECKPOINT_PATH 8

Training

This implementation supports multi-GPU DistributedDataParallel training, which is faster and simpler.

For example, to train DSANet with 8 GPUs, you can run:

bash dist_train_recognizer.sh configs/kinetics/r50_e100.py 8
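The testing and training scripts both take positional arguments, so it is easy to swap their order by mistake. The sketch below assembles the two commands from named parameters. It is illustrative only: `build_command` is a hypothetical helper, the checkpoint path is a placeholder, reusing the training config for testing is an assumption, and reading the trailing `8` of the testing command as the GPU count is inferred from the training command rather than stated in this README.

```python
import shlex


def build_command(script, config, num_gpus, checkpoint=None):
    """Assemble the argument list for one of the repository's launch scripts.

    The argument order mirrors the README commands:
      training: bash <script> <config> <num_gpus>
      testing:  bash <script> <config> <checkpoint> <num_gpus>
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    args = ["bash", script, config]
    if checkpoint is not None:
        args.append(checkpoint)
    args.append(str(num_gpus))
    return args


train_cmd = build_command(
    "dist_train_recognizer.sh", "configs/kinetics/r50_e100.py", 8
)

# Hypothetical checkpoint path, used only for illustration.
test_cmd = build_command(
    "dist_test_recognizer.sh",
    "configs/kinetics/r50_e100.py",
    8,
    checkpoint="work_dirs/r50_e100/latest.pth",
)

# Print the shell-quoted commands instead of running them.
print(" ".join(shlex.quote(arg) for arg in train_cmd))
print(" ".join(shlex.quote(arg) for arg in test_cmd))
```

To launch a job from Python, the same list can be passed to `subprocess.run(train_cmd, check=True)`.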

Acknowledgements

We especially thank the contributors of the MVFNet and mmaction codebases for providing helpful code.

License

This repository is released under the Apache-2.0 license as found in the LICENSE file.

Related Work

MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021)

Citation

If you find our work useful, please feel free to cite our paper 😆:

@inproceedings{wu2021dsanet,
  title={DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning},
  author={Wu, Wenhao and Zhao, Yuxiang and Xu, Yanwu and Tan, Xiao and He, Dongliang and Zou, Zhikang and Ye, Jin and Li, Yingying and Yao, Mingde and Dong, Zichao and others},
  booktitle = {ACMMM},
  year={2021}
}

Contact

For any questions, please file an issue or contact:

Wenhao Wu: [email protected]
Yuxiang Zhao: [email protected]
