[AAAI2021] The source code for our paper "Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion".

Overview

DSM

Project Website;

The dataset lists, some visualizations, and the pretrained weights are being prepared.

1. Introduction (scene-dominated to motion-dominated)

Video datasets are usually scene-dominated. We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays more attention to motion information.

The generated triplet is as below:

What does DSM learn?

With DSM pretraining, the model learns to focus strongly on the motion region (not necessarily the actor), without a single label.

2. Installation

Dataset

Please refer to dataset.md for details.

Requirements

  • Python3
  • PyTorch 1.1+
  • PIL
  • Intel (on the fly decode)

3. Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of Kinetics-400
      • diving48: the train/val lists of Diving48
  • experiments
    • logs: detailed experiment records
    • gradientes: gradient checks
    • visualization:
  • src
    • data: load data
    • loss: the losses evaluated in this paper
    • model: network architectures
    • scripts: train/eval scripts
    • augment: detailed implementation of the spatio-temporal augmentation
    • utils
    • feature_extract.py: feature extractor given pretrained model
    • main.py: the main entry point for finetuning
    • trainer.py
    • option.py
    • pt.py: self-supervised pretrain
    • ft.py: supervised finetune

DSM(Triplet)/DSM/Random

Self-supervised Pretrain

Kinetics
bash scripts/kinetics/pt.sh
UCF101
bash scripts/ucf101/pt.sh

Supervised Finetune (Clip-level)

HMDB51
bash scripts/hmdb51/ft.sh
UCF101
bash scripts/ucf101/ft.sh
Kinetics
bash scripts/kinetics/ft.sh

Video-level Evaluation

Following the common practice of TSN and Non-local, the final video-level result is averaged over 10 temporally sampled windows with corner crops, which leads to better results than clip-level evaluation. Refer to test.py for details.
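
The averaging step can be illustrated with a minimal, dependency-free sketch. It is not taken from test.py: the function name `video_level_prediction` and the toy scores are hypothetical, and the sketch only assumes that each sampled clip (one temporal window and crop) yields one vector of class scores.

```python
def video_level_prediction(clip_scores):
    """Average per-clip class scores into one video-level prediction.

    clip_scores: list of equal-length score lists, one per sampled clip
    (for example 10 temporal windows, each with several corner crops).
    Returns (mean_scores, predicted_class_index).
    NOTE: hypothetical helper for illustration, not the code in test.py.
    """
    if not clip_scores:
        raise ValueError("need at least one clip")
    num_classes = len(clip_scores[0])
    if any(len(scores) != num_classes for scores in clip_scores):
        raise ValueError("all clips must score the same number of classes")

    # Mean score of every class over all clips of the video.
    mean_scores = [
        sum(scores[c] for scores in clip_scores) / len(clip_scores)
        for c in range(num_classes)
    ]
    # The video-level label is the class with the highest mean score.
    predicted = max(range(num_classes), key=mean_scores.__getitem__)
    return mean_scores, predicted


# Toy example: 3 clips, 2 classes. The first clip alone would vote for
# class 0, but the averaged scores favour class 1.
clips = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
scores, label = video_level_prediction(clips)
print(label)
```

Averaging scores instead of taking a majority vote keeps the confidence of each clip, so a single ambiguous clip has less influence on the final label.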

Pretrain and Eval in One Step

bash scripts/hmdb51/pt_and_ft_hmdb51.sh

Note: more training options and ablation studies can be found in scripts.

Video Retrieval and Other Visualization

(1). Feature Extractor

As STCR can be easily extended to other video representation tasks, we provide a script for feature extraction.

python feature_extractor.py

The features will be saved as a single numpy file with shape [video_nums, features_dim] for further visualization.

(2). Retrieval Evaluation

Modify lines 60-62 in reterival.py.

python reterival.py
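
As a rough illustration of what a top-k retrieval evaluation computes, the sketch below ranks gallery features by cosine similarity and counts a query as correct if any of its k nearest neighbours shares its class. This convention, the function names `cosine` and `recall_at_k`, and the toy features are assumptions made for illustration; reterival.py may differ in its similarity measure and data split.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, ks=(1, 5)):
    """Percentage of queries with a same-class gallery item in the top k.

    Hypothetical helper: features are plain lists of floats, one row per
    video, mirroring the [video_nums, features_dim] layout.
    """
    hits = {k: 0 for k in ks}
    for q, q_label in zip(query_feats, query_labels):
        # Gallery indices sorted from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine(q, gallery_feats[i]),
            reverse=True,
        )
        for k in ks:
            if any(gallery_labels[i] == q_label for i in ranked[:k]):
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(query_feats) for k in ks}


# Toy data: 3 gallery videos, 2 query videos, 2-dimensional features.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
gallery_labels = [0, 1, 1]
queries = [[1.0, 0.05], [0.1, 1.0]]
query_labels = [1, 1]

result = recall_at_k(queries, query_labels, gallery, gallery_labels, ks=(1, 2))
print(result)
```

In the toy data the nearest neighbour of the first query belongs to a different class, so it only counts as a hit from k = 2 onwards, while the second query is already correct at k = 1.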

Results

Action Recognition

UCF101 Pretrained (I3D)

Method                 UCF101  HMDB51
Random Initialization  47.9    29.6
MoCo Baseline          62.3    36.5
DSM(Triplet)           70.7    48.5
DSM                    74.8    52.5

Kinetics Pretrained

Video Retrieval (UCF101-C3D)

Method  @1    @5    @10   @20   @50
DSM     16.8  33.4  43.4  54.6  70.7

Video Retrieval (HMDB51-C3D)

Method  @1    @5    @10   @20   @50
DSM     8.2   25.9  38.1  52.0  75.0

More Visualization

Acknowledgement

This work is partly based on STN, UEL and MoCo.

License

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@inproceedings{wang2020enhancing,
  author    = {Wang, Jinpeng and Gao, Yuting and Li, Ke and Hu, Jianguo and Jiang, Xinyang and Guo, Xiaowei and Ji, Rongrong and Sun, Xing},
  title     = {Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion},
  booktitle = {AAAI},
  year      = {2021},
}
Owner
Jinpeng Wang
Focus on Biometrics and Video Understanding, Self/Semi Supervised Learning.