Official Pytorch implementation for AAAI2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)

RSPNet

[Supplementary Materials]

Getting Started

Install Dependencies

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments run on Python 3.7 and PyTorch 1.6. Other versions should work but are not tested.
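Because only Python 3.7 and PyTorch 1.6 were tested, a quick check can flag deviations before starting a long run. This is a minimal sketch. The helper names (`major_minor`, `untested`, `check_current_environment`) are hypothetical and not part of the repository. A mismatch is only a hint, since other versions may still work.

```python
import platform
from typing import List, Tuple

# Versions the README states were used for the experiments.
TESTED = {"python": (3, 7), "torch": (1, 6)}


def major_minor(version: str) -> Tuple[int, int]:
    """Parse a version string such as '3.7.10' or '1.6.0+cu101' into (major, minor)."""
    core = version.split("+")[0]  # drop local build tags like '+cu101'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def untested(python_version: str, torch_version: str) -> List[str]:
    """Return the components whose major.minor differs from the tested versions."""
    found = {"python": major_minor(python_version), "torch": major_minor(torch_version)}
    return [name for name, ver in found.items() if ver != TESTED[name]]


def check_current_environment() -> List[str]:
    """Check the running interpreter and the installed PyTorch."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return untested(platform.python_version(), torch.__version__)
```

An empty list returned by `check_current_environment()` means both major.minor versions match the tested setup.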

Transcode Videos (Optional)

This step is optional but can dramatically speed up data loading.

We decode videos on the fly during training, so there is no need to extract frames beforehand. This makes disk I/O much faster but increases CPU usage. The transcode step reduces the CPU cost of decoding by 1) lowering the video resolution and 2) adding more key frames.

To transcode the videos, install ffmpeg and then run:

python utils/transcode_dataset.py PATH/TO/ORIGIN_VIDEOS PATH/TO/TRANSCODED_VIDEOS

Be warned: this uses all available CPU cores and takes several hours to complete (on our workstation with two Intel E5-2630 CPUs).
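The two ideas behind transcoding (lower resolution, more key frames) map onto standard ffmpeg options, and running one ffmpeg process per core is what makes the job CPU-bound. The sketch below is an illustration, not the contents of utils/transcode_dataset.py. The 256-pixel height, the 16-frame key-frame interval, the libx264 codec, the extension list and all helper names are assumptions.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed set of input extensions (the Kinetics tree uses .mp4, UCF101/HMDB51 use .avi).
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".webm"}


def build_ffmpeg_cmd(src: Path, dst: Path, height: int = 256, gop: int = 16) -> List[str]:
    """ffmpeg command that lowers the resolution and adds key frames (assumed values)."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", str(src),
        "-vf", f"scale=-2:{height}",  # resize to `height` px tall; -2 keeps aspect ratio with an even width
        "-c:v", "libx264",
        "-g", str(gop),  # at most `gop` frames between key frames, so seeks decode fewer frames
        "-an",  # drop audio, which training does not use
        str(dst),
    ]


def plan_jobs(src_root: Path, dst_root: Path) -> List[Tuple[Path, Path]]:
    """Pair every video under src_root with an .mp4 path at the same relative location under dst_root."""
    jobs = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in VIDEO_EXTS:
            dst = (dst_root / src.relative_to(src_root)).with_suffix(".mp4")
            jobs.append((src, dst))
    return jobs


def transcode_one(job: Tuple[Path, Path]) -> int:
    """Transcode a single video and return ffmpeg's exit code."""
    src, dst = job
    dst.parent.mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_ffmpeg_cmd(src, dst)).returncode


def transcode_all(src_root: str, dst_root: str, workers: Optional[int] = None) -> List[int]:
    """Run all jobs in parallel; workers=None uses one process per CPU core."""
    jobs = plan_jobs(Path(src_root), Path(dst_root))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcode_one, jobs))
```

Passing a smaller `workers` value to `transcode_all` leaves some cores free, at the cost of a longer total run time.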

Prepare Datasets

You are expected to prepare data for pre-training (the Kinetics-400 dataset) and fine-tuning (the UCF101, HMDB51 and Something-Something-v2 datasets). To let the scripts find the datasets on your system, we recommend creating symbolic links in the ./data directory that point to the actual paths. We found this approach flexible.
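A minimal sketch of the symbolic-link approach using only the standard library. The source paths under /mnt/datasets and the helper names are hypothetical placeholders; replace them with the locations on your machine.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical locations of the raw datasets; replace with the real paths on your system.
DATASET_PATHS: Dict[str, str] = {
    "hmdb51": "/mnt/datasets/hmdb51",
    "UCF101": "/mnt/datasets/UCF101",
    "kinetics400": "/mnt/datasets/kinetics400",
    "smth-smth-v2": "/mnt/datasets/smth-smth-v2",
}


def plan_links(paths: Dict[str, str], data_dir: str = "data") -> List[Tuple[Path, Path]]:
    """Return (link, target) pairs such as (data/UCF101, /mnt/datasets/UCF101), sorted by name."""
    return [(Path(data_dir) / name, Path(target)) for name, target in sorted(paths.items())]


def create_links(paths: Dict[str, str], data_dir: str = "data") -> None:
    """Create one directory symlink per dataset, leaving existing entries untouched."""
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    for link, target in plan_links(paths, data_dir):
        if link.is_symlink() or link.exists():
            print(f"skipping {link}: already exists")
            continue
        link.symlink_to(target, target_is_directory=True)
        print(f"{link} -> {target}")
```

Calling `create_links(DATASET_PATHS)` from the repository root has the same effect as running `ln -s` once per dataset.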

The expected directory hierarchy is as follows:

├── data
│   ├── hmdb51
│   │   ├── metafile
│   │   │   ├── brush_hair_test_split1.txt
│   │   │   └── ...
│   │   └── videos
│   │       ├── brush_hair
│   │       │   └── *.avi
│   │       └── ...
│   ├── UCF101
│   │   ├── ucfTrainTestlist
│   │   │   ├── classInd.txt
│   │   │   ├── testlist01.txt
│   │   │   ├── trainlist01.txt
│   │   │   └── ...
│   │   └── UCF-101
│   │       ├── ApplyEyeMakeup
│   │       │   └── *.avi
│   │       └── ...
│   ├── kinetics400
│   │   ├── train_video
│   │   │   ├── answering_questions
│   │   │   │   └── *.mp4
│   │   │   └── ...
│   │   └── val_video
│   │       └── (same as train_video)
│   ├── kinetics100
│   │   └── (same as kinetics400)
│   └── smth-smth-v2
│       ├── 20bn-something-something-v2
│       │   └── *.mp4
│       └── annotations
│           ├── something-something-v2-labels.json
│           ├── something-something-v2-test.json
│           ├── something-something-v2-train.json
│           └── something-something-v2-validation.json
└── ...
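Before launching a run, it can help to confirm that the links resolve. The sketch below checks a subset of the paths from the hierarchy above (the optional kinetics100 entry is left out). `EXPECTED` and `missing_paths` are hypothetical names. Because `Path.exists()` follows symbolic links, a dangling link is reported as missing.

```python
from pathlib import Path
from typing import List, Sequence

# Paths relative to ./data, taken from the directory hierarchy above.
EXPECTED = (
    "hmdb51/metafile",
    "hmdb51/videos",
    "UCF101/ucfTrainTestlist/classInd.txt",
    "UCF101/UCF-101",
    "kinetics400/train_video",
    "kinetics400/val_video",
    "smth-smth-v2/20bn-something-something-v2",
    "smth-smth-v2/annotations/something-something-v2-labels.json",
)


def missing_paths(data_dir: str = "data", expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected entries that do not exist (or whose symlink target is gone)."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```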

Alternatively, you can change the paths in config/dataset to match your system.

Build Kinetics-100 dataset (Optional)

Some of our ablation experiments use the Kinetics-100 dataset for pre-training. It is built by extracting the 100 classes of Kinetics-400 with the smallest total file size in the training set.
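The selection rule can be sketched as follows. This is an illustration of the rule described above, not the code of utils.build_kinetics_subset. The function names are hypothetical, and the alphabetical tie-break is an assumption.

```python
from pathlib import Path
from typing import Dict, List


def class_sizes(train_dir: str) -> Dict[str, int]:
    """Total size in bytes of the files directly inside each class folder of train_dir."""
    sizes: Dict[str, int] = {}
    for class_dir in Path(train_dir).iterdir():
        if class_dir.is_dir():
            sizes[class_dir.name] = sum(f.stat().st_size for f in class_dir.iterdir() if f.is_file())
    return sizes


def smallest_classes(sizes: Dict[str, int], k: int = 100) -> List[str]:
    """Names of the k classes with the smallest total size; ties are broken alphabetically."""
    return sorted(sizes, key=lambda name: (sizes[name], name))[:k]
```

For example, `smallest_classes(class_sizes("data/kinetics400/train_video"))` would list 100 candidate class folders.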

If you have Kinetics-400 available, you can build Kinetics-100 by:

python -m utils.build_kinetics_subset

This script creates symbolic links instead of copying data, and should complete within a minute.

We include a pre-built copy at data/kinetics100_links, and data/kinetics100 is a symbolic link pointing to it. Since it only contains links into Kinetics-400, data/kinetics400 must be available at runtime.

Pre-training on Pretext Tasks

With the environment set up, run the following commands to pre-train your models on the pretext tasks:

export CUDA_VISIBLE_DEVICES=0,1,2,3
# Architecture: C3D
python pretrain.py -e exps/pretext-c3d -c config/pretrain/c3d.jsonnet
# Architecture: ResNet-18
python pretrain.py -e exps/pretext-resnet18 -c config/pretrain/resnet18.jsonnet
# Architecture: S3D-G
python pretrain.py -e exps/pretext-s3dg -c config/pretrain/s3dg.jsonnet
# Architecture: R(2+1)D
python pretrain.py -e exps/pretext-r2plus1d -c config/pretrain/r2plus1d.jsonnet

To pre-train on the Kinetics-100 dataset instead, edit line 13 of config/pretrain/moco-train-base.jsonnet.

Action Recognition

After pre-training on the pretext tasks, the models are fine-tuned for action recognition on the UCF101, HMDB51 and Something-Something-v2 datasets.

export CUDA_VISIBLE_DEVICES=0,1
# Dataset: UCF101
#     Architecture: C3D Acc.=76.71%
python finetune.py -c config/finetune/ucf101_c3d.jsonnet \
                   --mc exps/pretext-c3d/model_best.pth.tar \
                   -e exps/ucf101-c3d
#     Architecture: ResNet-18 Acc.=74.33%
python finetune.py -c config/finetune/ucf101_resnet18.jsonnet \
                   --mc exps/pretext-resnet18/model_best.pth.tar \
                   -e exps/ucf101-resnet18
#     Architecture: S3D-G Acc.=89.9%
python finetune.py -c config/finetune/ucf101_s3dg.jsonnet \
                   --mc exps/pretext-s3dg/model_best.pth.tar \
                   -e exps/ucf101-s3dg
#     Architecture: R(2+1)D Acc.=81.1%
python finetune.py -c config/finetune/ucf101_r2plus1d.jsonnet \
                   --mc exps/pretext-r2plus1d/model_best.pth.tar \
                   -e exps/ucf101-r2plus1d

# Dataset: HMDB51
#     Architecture: C3D Acc.=44.58%
python finetune.py -c config/finetune/hmdb51_c3d.jsonnet \
                   --mc exps/pretext-c3d/model_best.pth.tar \
                   -e exps/hmdb51-c3d
#     Architecture: ResNet-18 Acc.=41.83%
python finetune.py -c config/finetune/hmdb51_resnet18.jsonnet \
                   --mc exps/pretext-resnet18/model_best.pth.tar \
                   -e exps/hmdb51-resnet18
#     Architecture: S3D-G Acc.=59.6%
python finetune.py -c config/finetune/hmdb51_s3dg.jsonnet \
                   --mc exps/pretext-s3dg/model_best.pth.tar \
                   -e exps/hmdb51-s3dg
#     Architecture: R(2+1)D Acc.=44.6%
python finetune.py -c config/finetune/hmdb51_r2plus1d.jsonnet \
                   --mc exps/pretext-r2plus1d/model_best.pth.tar \
                   -e exps/hmdb51-r2plus1d

# Dataset: Something-something-v2
#     Architecture: C3D Acc.=47.76%
python finetune.py -c config/finetune/smth_smth_c3d.jsonnet \
                   --mc exps/pretext-c3d/model_best.pth.tar \
                   -e exps/smthv2-c3d
#     Architecture: ResNet-18 Acc.=44.02%
python finetune.py -c config/finetune/smth_smth_resnet18.jsonnet \
                   --mc exps/pretext-resnet18/model_best.pth.tar \
                   -e exps/smthv2-resnet18
#     Architecture: S3D-G Acc.=55.03%
python finetune.py -c config/finetune/smth_smth_s3dg.jsonnet \
                   --mc exps/pretext-s3dg/model_best.pth.tar \
                   -e exps/smthv2-s3dg
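All of the commands above follow one naming pattern (config file, pre-trained checkpoint, experiment directory), so they can be generated when scripting several runs. The helper below is hypothetical and not part of the repository, and the dataset keys are made up for illustration. No R(2+1)D command is listed for Something-Something-v2 above, so a matching config may not exist.

```python
import subprocess
from typing import Dict, Iterable, List, Tuple

# Hypothetical dataset keys -> (config-file prefix, experiment-dir prefix), as used in the commands above.
DATASETS: Dict[str, Tuple[str, str]] = {
    "ucf101": ("ucf101", "ucf101"),
    "hmdb51": ("hmdb51", "hmdb51"),
    "smth-smth-v2": ("smth_smth", "smthv2"),
}


def finetune_cmd(dataset: str, arch: str) -> List[str]:
    """Build the finetune.py command line for one (dataset, architecture) pair."""
    cfg_prefix, exp_prefix = DATASETS[dataset]
    return [
        "python", "finetune.py",
        "-c", f"config/finetune/{cfg_prefix}_{arch}.jsonnet",
        "--mc", f"exps/pretext-{arch}/model_best.pth.tar",  # checkpoint from pre-training
        "-e", f"exps/{exp_prefix}-{arch}",
    ]


def run_all(pairs: Iterable[Tuple[str, str]], dry_run: bool = True) -> None:
    """Print each command; execute them sequentially only when dry_run is False."""
    for dataset, arch in pairs:
        cmd = finetune_cmd(dataset, arch)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # inherits CUDA_VISIBLE_DEVICES from the environment
```

For example, `run_all([("ucf101", "s3dg"), ("hmdb51", "s3dg")], dry_run=False)` would run both S3D-G fine-tuning jobs one after the other.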

Results and Pre-trained Models

Architecture | Pre-training dataset | Pre-training epochs | Pre-trained model | Acc. on UCF101 (%) | Acc. on HMDB51 (%)
S3D-G | Kinetics-400 | 1000 | Download link | 93.7 | 64.7
S3D-G | Kinetics-400 | 200 | Download link | 89.9 | 59.6
R(2+1)D | Kinetics-400 | 200 | Download link | 81.1 | 44.6
ResNet-18 | Kinetics-400 | 200 | Download link | 74.3 | 41.8
C3D | Kinetics-400 | 200 | Download link | 76.7 | 44.6

Video Retrieval

The pre-trained model can also be used to retrieve videos relevant to a given query video:

export CUDA_VISIBLE_DEVICES=0 # use single GPU 
python retrieval.py -c config/retrieval/ucf101_resnet18.jsonnet \
                    --mc exps/pretext-resnet18/model_best.pth.tar \
                    -e exps/retrieval-resnet18    

Video retrieval results reported in our paper, where k is the number of retrieved videos:

Architecture | k=1 | k=5 | k=10 | k=20 | k=50
C3D | 36.0 | 56.7 | 66.5 | 76.3 | 87.7
ResNet-18 | 41.1 | 59.4 | 68.4 | 77.8 | 88.7
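In the common video retrieval protocol, features of test videos serve as queries against training videos, and a query counts as a hit at k if any of its k nearest neighbours shares its class. That protocol is assumed here, and retrieval.py may differ in details such as clip sampling or feature pooling. A minimal, dependency-free sketch of the metric, with hypothetical names:

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(
    query_feats: List[Sequence[float]],
    query_labels: List[int],
    gallery_feats: List[Sequence[float]],
    gallery_labels: List[int],
    ks: Sequence[int] = (1, 5, 10, 20, 50),
) -> Dict[int, float]:
    """Percentage of queries with at least one same-class video among the k most similar gallery videos."""
    hits = {k: 0 for k in ks}
    for feat, label in zip(query_feats, query_labels):
        sims = [cosine(feat, g) for g in gallery_feats]
        ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
        for k in ks:
            if any(gallery_labels[i] == label for i in ranked[:k]):
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(query_feats) for k in ks}
```

The score can only grow with k, which matches the increasing values across each row of the table.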

Visualization

We further visualize the region of interest (RoI) that contributes most to the similarity score using the class activation map (CAM) technique.
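Classic CAM weights each feature channel by a class's classifier weights and sums over channels. For a similarity score, one plausible variant replaces those weights with a reference embedding. The sketch below computes that map for one frame: a per-location dot product, clipped at zero and scaled so the peak is 1. This is an assumption for illustration, not necessarily what visualization.py computes. `similarity_cam` and `reference` are hypothetical names. For a video, the function would be applied to each temporal slice of the feature map.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def similarity_cam(feature_map: FeatureMap, reference: Sequence[float]) -> List[List[float]]:
    """Per-location dot product with `reference`, clipped at zero and scaled so the peak is 1."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    cam = [
        [max(0.0, sum(reference[c] * feature_map[c][i][j] for c in range(channels))) for j in range(width)]
        for i in range(height)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

The resulting map is usually upsampled to the input resolution and overlaid on the frame as a heat map.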

export CUDA_VISIBLE_DEVICES=0,1
python visualization.py -c config/pretrain/s3dg.jsonnet \
                        --load-model exps/pretext-s3dg/model_best.pth.tar \
                        -e exps/visual-s3dg \
                        -x '{batch_size: 1}'

The CAM visualization results are saved as PNG files.

Troubleshoot

  • DECORDError cannot find video stream with wanted index: -1

    Some videos in the Kinetics dataset do not contain a valid video stream, for unknown reasons. To filter them out, run python utils/verify_video.py PATH/TO/VIDEOS (ffmpeg must be installed), then copy the output to the blacklist config in config/dataset/kinetics{400,100}.libsonnet.
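The same kind of check can be sketched with ffprobe, which ships with ffmpeg. This is not utils/verify_video.py. The helper names are hypothetical, and the format the blacklist expects is not shown in this README, so adapt the reported paths accordingly.

```python
import subprocess
from typing import Callable, Iterable, List


def probe_cmd(path: str) -> List[str]:
    """ffprobe command that prints 'video' if the file has at least one video stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",  # only the first video stream
        "-show_entries", "stream=codec_type",
        "-of", "csv=p=0",  # bare value, no section name
        path,
    ]


def has_video_stream(ffprobe_stdout: str) -> bool:
    """True if ffprobe reported a video stream."""
    return "video" in ffprobe_stdout.split()


def run_ffprobe(path: str) -> str:
    """Run ffprobe on one file and return its standard output (typically empty on failure)."""
    result = subprocess.run(probe_cmd(path), capture_output=True, text=True)
    return result.stdout


def find_bad_videos(paths: Iterable[str], probe: Callable[[str], str] = run_ffprobe) -> List[str]:
    """Return the paths for which no video stream was found."""
    return [p for p in paths if not has_video_stream(probe(p))]
```

The `probe` parameter makes it easy to test the filtering logic without ffprobe installed.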

Citation

Please cite the following paper if you find RSPNet useful in your research:

@InProceedings{chen2020RSPNet,
author = {Peihao Chen and Deng Huang and Dongliang He and Xiang Long and Runhao Zeng and Shilei Wen and Mingkui Tan and Chuang Gan},
title = {RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning},
booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
year = {2021}
}

Contact

For any questions, please file an issue or contact:

Peihao Chen: [email protected]
Deng Huang: [email protected]
Comments
  • R(2+1)D-18 pre-trained model not fully reproducible

    Hi, I fine-tuned the provided pre-trained R(2+1)D model on UCF101 using the provided fine-tuning code, but it only reaches 76-77% accuracy. Can you confirm that the provided model is the correct one? I used the same setup as described in the README.

    opened by fmthoker 3
  • framework image

    Hello, thank you for your great work. It's a very smart idea!

    Can you explain the framework figure? I understand that the RSP task and the A-VID task are learned in one iteration, which I think means the anchor is the same. In the algorithm, K clips are simply sampled from videos in V\v+. However, in Fig. 2 of the paper, the features (green) of two clips from one video, a 1x clip and a 2x clip, go to the g_a head for contrastive learning. I think you want to show us a randomly selected speed... is that right? In the actual experiments, are there just the c_i, c_j and {c_n} (K) clips, not 2K?

    Thank you

    opened by youwantsy 2
  • Pre-trained S3D-G model based on ImageNet and Kinetics-400?

    Where can I download the S3D-G model pre-trained on ImageNet and Kinetics-400? Or could you upload it to this repository?

    opened by LiangSiyv 2
  • Question about computational resources

    Hi, thanks for your wonderful paper and code. I would like to know the computational resources used in your experiments: 1. Which GPUs, and how many, did you use? 2. How long did pre-training on K400 for 200 epochs take? 3. How long did fine-tuning on UCF101, HMDB51 and Something-V2 take, respectively? Looking forward to your reply. Thanks.

    opened by wjn922 2
  • 'No configuration setting found for key force_n_crop'

    I downloaded your S3D-G pre-trained model for my action recognition task on UCF101 but I keep getting this error:

    argument type: <class 'str'>
    Setting ulimit -n 8192
    world_size=1
    Using dist_url=tcp://127.0.0.1:36879
    Local Rank: 0
    2021-12-30 07:31:39,148|INFO |Args = Args(parser=None, config='config/finetune/ucf101_s3dg.jsonnet', ext_config=[], debug=False, experiment_dir=PosixPath('exps/ucf101-s3dg'), _run_dir=PosixPath('exps/ucf101-s3dg/run_2_20211230_073138'), load_checkpoint=None, load_model=None, validate=False, moco_checkpoint='exps/pretext-s3dg/model_best_s3dg_200epoch.pth.tar', seed=None, world_size=1, _continue=False, no_scale_lr=False)
    2021-12-30 07:31:39,149|INFO |cudnn.benchmark = True
    2021-12-30 07:31:39,278|INFO |Config = batch_size = 4 dataset { annotation_path = "data/UCF101/ucfTrainTestlist" fold = 1 mean = [ 0.485 0.456 0.406 ] name = "ucf101" num_classes = 101 root = "data/UCF101/UCF-101" std = [ 0.229 0.224 0.225 ] } final_validate { batch_size = 4 } log_interval = 10 method = "from-scratch" model { arch = "s3dg" } model_type = "multitask" num_epochs = 50 num_workers = 8 optimizer { dampening = 0 lr = 0.005 milestones = [ 50 100 150 ] momentum = 0.9 nesterov = false patience = 10 schedule = "cosine" weight_decay = 0.0001 } spatial_transforms { color_jitter { brightness = 0 contrast = 0 hue = 0 saturation = 0 } crop_area { max = 1 min = 0.25 } gray_scale = 0 size = 224 } temporal_transforms { frame_rate = 25 size = 64 strides = [ { stride = 1 weight = 1 } ] validate { final_n_crop = 10 n_crop = 1 stride = 1 } } validate { batch_size = 4 }
    2021-12-30 07:31:39,282|INFO |Using global get_model_class({'arch': 's3dg'})
    2021-12-30 07:31:39,283|INFO |Using MultiTask Wrapper
    2021-12-30 07:31:39,283|WARNING |<class 'moco.split_wrapper.MultiTaskWrapper'> using groups: 1
    2021-12-30 07:31:39,383|INFO |Found fc: fc with in_features: 1024
    2021-12-30 07:31:42,488|INFO |Building Dataset: VID: False, Split=train
    2021-12-30 07:31:42,488|INFO |Temporal transform type: clip
    Traceback (most recent call last):
      File "finetune.py", line 502, in <module>
        main()
      File "finetune.py", line 498, in main
        mp.spawn(main_worker, args=(args, dist_url,), nprocs=args.world_size)
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
        while not context.join():
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 119, in join
        raise Exception(msg)
    Exception:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
        fn(i, *args)
      File "/home/ubuntu/RSPNet/finetune.py", line 452, in main_worker
        engine = Engine(args, cfg, local_rank=local_rank)
      File "/home/ubuntu/RSPNet/finetune.py", line 171, in __init__
        self.train_loader = self.data_loader_factory.build(
      File "/home/ubuntu/RSPNet/datasets/classification/__init__.py", line 81, in build
        temporal_transform = self.get_temporal_transform(split)
      File "/home/ubuntu/RSPNet/datasets/classification/__init__.py", line 276, in get_temporal_transform
        if tt_cfg.get_bool("force_n_crop"):
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/pyhocon/config_tree.py", line 310, in get_bool
        string_value = self.get_string(key, default)
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/pyhocon/config_tree.py", line 221, in get_string
        value = self.get(key, default)
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/pyhocon/config_tree.py", line 209, in get
        return self._get(ConfigTree.parse_key(key), 0, default)
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/pyhocon/config_tree.py", line 151, in _get
        raise ConfigMissingException(u"No configuration setting found for key {key}".format(key='.'.join(key_path[:key_index + 1])))
    pyhocon.exceptions.ConfigMissingException: 'No configuration setting found for key force_n_crop'

    opened by aloma85 0