[AAAI2022] Source code for our paper《Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning》

Last update: Oct 26, 2022

Overview

SSVC

The source code for the paper "Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning" (AAAI 2022).

Samples of the generated motion-preserved videos with threshold $\alpha=0.5$.

Requirements

python3
torch 1.1+
PIL
FrEIA==0.2 (Flow-based model)
lintel==1.0 (Decode mp4 videos on the fly)

Structure

backbone
data
- lists: train/val lists (.txt)
- augmentation.py: train/val data augmentation used during SSL pre-training
- vDataLoader.py: set the paths to your data lists here
model
- advflow: flow-based model
- classifier.py: linear classifier for downstream tasks
- infonce.py: combines S$^2$VC with MoCo
flow
- pre-trained flow-based model weights
utils
main_pretrain.py: entry point for self-supervised pre-training
main_eval.py: entry point for supervised fine-tuning and evaluation
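
For readers unfamiliar with the objective that infonce.py is named after, the standard InfoNCE loss for a single query can be sketched as below. This is a dependency-free illustration, not the repository's implementation: the function name `info_nce`, the plain-list inputs, and the temperature default are assumptions made for the example.

```python
import math


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query (hypothetical helper, not from the repo).

    query, positive: feature vectors as lists of floats.
    negatives: list of feature vectors (for example, keys from a MoCo-style queue).
    temperature: softmax temperature; 0.07 is only an illustrative default.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # The first logit belongs to the positive key, the rest to the negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))

    # -log( exp(pos) / sum(exp(all)) )
    return log_denominator - logits[0]
```

The loss shrinks as the query becomes more similar to its positive key than to the negatives, which is the behaviour a contrastive pre-training objective relies on.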

Self-supervised Pretrain

DDP

python -m torch.distributed.launch --nproc_per_node=1 --master_port 1234 main_pretrain.py --net r3d18 --img_dim 112 --seq_len 16 --aug_type 1 -t 0.5 -bsz 64 --gpu 0,1 --dataset XX

Single GPU

python main_pretrain.py --net r3d18 --img_dim 112 --seq_len 16 --aug_type 1 -t 0.5 -bsz 64 --gpu 0 --dataset XX

Evaluation

NN-Retrieval

python main_eval.py --retrieval --test SSL_Pt_Model_PTH --dataset XX --gpu X
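
As a rough picture of what a nearest-neighbour retrieval evaluation measures, the sketch below computes top-k recall with cosine similarity: a test clip counts as a hit if any of its k nearest training clips shares its label. It assumes features have already been extracted. The helper names `cosine` and `recall_at_k` are hypothetical, and the exact protocol used by `--retrieval` may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(test_feats, test_labels, train_feats, train_labels, k=1):
    """Fraction of test items whose k nearest training items contain their label."""
    hits = 0
    for feat, label in zip(test_feats, test_labels):
        # Rank all training items by similarity to this test item, best first.
        ranked = sorted(
            range(len(train_feats)),
            key=lambda i: cosine(feat, train_feats[i]),
            reverse=True,
        )
        if label in {train_labels[i] for i in ranked[:k]}:
            hits += 1
    return hits / len(test_feats)


if __name__ == "__main__":
    train_feats = [[1.0, 0.0], [0.0, 1.0]]
    train_labels = [0, 1]
    test_feats = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    test_labels = [0, 1, 1]
    print(recall_at_k(test_feats, test_labels, train_feats, train_labels, k=1))
```

For real feature sets a batched matrix product on the GPU would replace the Python loops; the pure-Python version is only meant to make the metric explicit.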

Finetune

# fine-tune the entire model
python main_eval.py --train_what ft --pretrain SSL_Pt_Model_PTH --dataset XX --gpu XX \
--net r3d18 --img_dim 224 --seq_len 32

# freeze the backbone, fine-tune only the last layer
python main_eval.py --train_what last --pretrain SSL_Pt_Model_PTH --dataset XX --gpu XX \
--net r3d18 --img_dim 224 --seq_len 32

Test

python main_eval.py --train_what XX --ten_crop --test Sup_Ft_Model_PTH --gpu X \
--dataset XX --net r3d18 --img_dim 224 --seq_len 32
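
Ten-crop testing conventionally means scoring the four corner crops and the center crop of each frame, plus their horizontal flips, and averaging the ten predictions. The sketch below illustrates that convention only. Whether `--ten_crop` follows it exactly is an assumption, and the helper names `ten_crop_boxes` and `average_scores` are hypothetical.

```python
def ten_crop_boxes(width, height, crop):
    """Return ten (left, top, flip) tuples: four corners and the center,
    first without and then with a horizontal flip."""
    if crop > width or crop > height:
        raise ValueError("crop size must not exceed the frame size")
    center_left = (width - crop) // 2
    center_top = (height - crop) // 2
    origins = [
        (0, 0),                            # top-left
        (width - crop, 0),                 # top-right
        (0, height - crop),                # bottom-left
        (width - crop, height - crop),     # bottom-right
        (center_left, center_top),         # center
    ]
    return [(left, top, flip) for flip in (False, True) for left, top in origins]


def average_scores(per_crop_scores):
    """Average class scores over crops; each row holds one crop's class scores."""
    count = len(per_crop_scores)
    return [sum(column) / count for column in zip(*per_crop_scores)]


if __name__ == "__main__":
    for box in ten_crop_boxes(256, 256, 224):
        print(box)
```

In practice each box would be applied to every frame of the clip, the model would be run on all ten views, and `average_scores` would combine the resulting class scores into one prediction.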
