Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation

This repository is an official pytorch implementation of CSTP work:

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation, AAAI, 2022

Yujia Zhang, Lai-Man Po, Xuyuan Xu, Mengyang Liu, Yexin Wang, Weifeng Ou, Yuzhi Zhao, Wing-Yin Yu

Motivation

Considering the intermediate of contrastive learning, we propose a novel pretext task - spatio-temporal overlap rate prediction
Using playback rate prediciton and rotation prediction to further enhance temporal and spatial representation, respectively.

Requirements

Installation

Python3
pytorch1.1+
PIL
Decord
lmdb
msgpack
ffmpeg

Datasets

Please follow the instructions in DATASET.md for dataset preparation.

Pre-train/finetune/test in CSTP

Pre-train instruction

pre-train on Kinetics

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
--nproc_per_node=8 --master_port 11190 main_byol.py --dataset Kin400RepreLMDB --split 1 \
--n_classes 101 --batch_size 128 --sample_duration 16 \
--model_name r21d_byol --model_depth 1 --ft_begin_index 0 \
--lmdb_path "/dockerdata/yujiazhang/dataset/frame_kinetics_400_mmlab_1f_320_lmdb/lmdb_kin400.lmdb" \
--annotation_path "data_process/kin400_mmlab_labels" \
--result_path "results_kin400_r21d_bsor_rotaug_bs128_lr3e2_wd5e4_bvsvtvp0.1v1v1v1v1v1_mlp_proj_epoch300" \
--n_epochs 300 --learning_rate 0.09 --weight_decay 5e-4 \
--sample_size 112 --n_workers 6 --task loss_com --optimizer sgd --loss_weight 0.1 1 1 1 1

pre-train on UCF-101

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m torch.distributed.launch \
--nproc_per_node=6 --master_port 19110 main_byol.py --dataset UcfRepreBYOLSpPre --split 1 \
--n_classes 101 --batch_size 60 --sample_duration 16 \
--model_name r21d_byol --model_depth 1 --ft_begin_index 0 \
--frame_dir "/dockerdata/yujiazhang/dataset/UCF_101_1f_256" \
--annotation_path "data_process/UCF101_labels" \
--result_path "results_ucf_r21d_bsor_bs60_lr3e2_wd5e4_bvsvtvp0.1v1v1v1v1_twospd_mlp_proj" \
--n_epochs 300 --learning_rate 0.03 --weight_decay 5e-4 \
--sample_size 112 --n_workers 6 --task loss_com --optimizer sgd --loss_weight 0.1 1 1 1 1

Finetune instruction

Finetune on UCF-101

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 --master_port 16110 main_ft_mp.py --dataset UcfFineTune --split 1 \
--n_classes 101 --n_finetune_classes 101 --batch_size 60 --sample_duration 16 \
--model_name r21d_byol --model_depth 1 \
--frame_dir "/dockerdata/yujiazhang/dataset/UCF_101_1f_256" \
--annotation_path "data_process/UCF101_labels" \
--result_path "results_ucf_r21d_bsor_bs60_lr3e2_wd5e4_bvsvtvp0.1v1v1v1v1_twospd_mlp_proj" \
--pretrained_path "results_ucf_r21d_bsor_bs60_lr3e2_wd5e4_bvsvtvp0.1v1v1v1v1_twospd_mlp_proj/UcfRepreBYOLSpPre/loss_com/save_300.pth" \
--n_epochs 100 --learning_rate 0.025 --weight_decay 5e-4 \
--sample_size 112 --n_workers 6 --task "ft_all" --optimizer sgd --transform_mode "img" \
--pb_rate 4

Test instruction

CUDA_VISIBLE_DEVICES=0 python test.py --dataset UcfFineTune --split 1 \
--n_classes 101 --n_finetune_classes 101 \
--batch_size 1 --sample_duration 16 \
--model_name r21d_byol --model_depth 1 --ft_begin_index 5 \
--frame_dir "/dockerdata/yujiazhang/dataset/UCF_101_1f_256" \
--annotation_path "data_process/UCF101_labels" \
--result_path "results_ucf_r21d_bsor_bs60_lr3e2_wd5e4_bvsvtvp0.1v1v1v1v1_twospd_mlp_proj" \
--sample_size 112 --n_workers 6 --task "test" --pb_rate 4 --transform_mode "img_test" --t_ft_task "ft_all"

License

The majority of this work is licensed under CC-NC 4.0 International license.

Cite

If you find this repository useful in your own research, please consider citing:

@article{zhang2021contrastive,
  title={Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation},
  author={Zhang, Yujia and Po, Lai-Man and Xu, Xuyuan and Liu, Mengyang and Wang, Yexin and Ou, Weifeng and Zhao, Yuzhi and Yu, Wing-Yin},
  journal={arXiv preprint arXiv:2112.08913},
  year={2021}
}

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
__pycache__		__pycache__
data_process		data_process
figures		figures
loss		loss
models		models
scheduler		scheduler
script/r2p1d/kin400		script/r2p1d/kin400
utils_dir		utils_dir
README.md		README.md
main_byol.py		main_byol.py
main_ft_mp.py		main_ft_mp.py
opts.py		opts.py
test.py		test.py
utils.py		utils.py

KT27-A/CSTP

Folders and files

Latest commit

History

Repository files navigation

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation

Motivation

Requirements

Installation

Datasets

Pre-train/finetune/test in CSTP

Pre-train instruction

Finetune instruction

Test instruction

License

Cite

About

Resources

Stars

Watchers

Forks

Languages