[CVPR2021] The source code for our paper 《Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning》.

Related tags

Deep LearningBE
Overview

TBE

The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning" [arxiv] [code][Project Website]

image

Citation

@inproceedings{wang2021removing,
  title={Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning},
  author={Wang, Jinpeng and Gao, Yuting and Li, Ke and Lin, Yiqi and Ma, Andy J and Cheng, Hao and Peng, Pai and Ji, Rongrong and Sun, Xing},
  booktitle={CVPR},
  year={2021}
}

News

[2020.3.7] The first version of TBE are released!

0. Motivation

  • In camera-fixed situation, the static background in most frames remain similar in pixel-distribution.

  • We ask the model to be temporal sensitive rather than static sensitive.

  • We ask model to filter the additive Background Noise, which means to erasing background in each frame of the video.

Activation Map Visualization of BE

GIF

More hard example

2. Plug BE into any self-supervised learning method in two steps

The impementaion of BE is very simple, you can implement it in two lines by python:

rand_index = random.randint(t)
mixed_x[j] = (1-prob) * x + prob * x[rand_index]

Then, just need define a loss function like MSE:

loss = MSE(F(mixed_x),F(x))

2. Installation

Dataset Prepare

Please refer to [dataset.md] for details.

Requirements

  • Python3
  • pytorch1.1+
  • PIL
  • Intel (on the fly decode)
  • Skvideo.io
  • Matplotlib (gradient_check)

As Kinetics dataset is time-consuming for IO, we decode the avi/mpeg on the fly. Please refer to data/video_dataset.py for details.

3. Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51/Actor-HMDB51
      • hmdb51_sta: the train/val lists of HMDB51_STA
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of kinetics-400
      • diving48: the train/val lists of diving48
  • experiments
    • logs: experiments record in detials, include logs and trained models
    • gradientes:
    • visualization:
    • pretrained_model:
  • src
    • Contrastive
      • data: load data
      • loss: the loss evaluate in this paper
      • model: network architectures
      • scripts: train/eval scripts
      • augmentation: detail implementation of BE augmentation
      • utils
      • feature_extract.py: feature extractor given pretrained model
      • main.py: the main function of pretrain / finetune
      • trainer.py
      • option.py
      • pt.py: BE pretrain
      • ft.py: BE finetune
    • Pretext
      • main.py the main function of pretrain / finetune
      • loss: the loss include classification loss

4. Run

(1). Download dataset lists and pretrained model

A copy of both dataset lists is provided in anonymous. The Kinetics-pretrained models are provided in anonymous.

cd .. && mkdir datasets
mv [path_to_lists] to datasets
mkdir experiments && cd experiments
mkdir pretrained_models && logs
mv [path_to_pretrained_model] to ../experiments/pretrained_model

Download and extract frames of Actor-HMDB51.

wget -c  anonymous
unzip
python utils/data_process/gen_hmdb51_dir.py
python utils/data_process/gen_hmdb51_frames.py

(2). Network Architecture

The network is in the folder src/model/[].py

Method #logits_channel
C3D 512
R2P1D 2048
I3D 1024
R3D 2048

All the logits_channel are feed into a fc layer with 128-D output.

For simply, we divide the source into Contrastive and Pretext, "--method pt_and_ft" means pretrain and finetune in once.

Action Recognition

Random Initialization

For random initialization baseline. Just comment --weights in line 11 of ft.sh. Like below:

#!/usr/bin/env bash
python main.py \
--method ft --arch i3d \
--ft_train_list ../datasets/lists/diving48/diving48_v2_train_no_front.txt \
--ft_val_list ../datasets/lists/diving48/diving48_v2_test_no_front.txt \
--ft_root /data1/DataSet/Diving48/rgb_frames/ \
--ft_dataset diving48 --ft_mode rgb \
--ft_lr 0.001 --ft_lr_steps 10 20 25 30 35 40 --ft_epochs 45 --ft_batch_size 4 \
--ft_data_length 64 --ft_spatial_size 224 --ft_workers 4 --ft_stride 1 --ft_dropout 0.5 \
--ft_print-freq 100 --ft_fixed 0 # \
# --ft_weights ../experiments/kinetics_contrastive.pth

BE(Contrastive)

Kinetics
bash scripts/kinetics/pt_and_ft.sh
UCF101
bash scripts/ucf101/ucf101.sh
Diving48
bash scripts/Diving48/diving48.sh

For Triplet loss optimization and moco baseline, just modify --pt_method

BE (Triplet)

--pt_method be_triplet

BE(Pretext)

bash scripts/hmdb51/i3d_pt_and_ft_flip_cls.sh

or

bash scripts/hmdb51/c3d_pt_and_ft_flip.sh

Notice: More Training Options and ablation study can be find in scripts

Video Retrieve and other visualization

(1). Feature Extractor

As STCR can be easily extend to other video representation task, we offer the scripts to perform feature extract.

python feature_extractor.py

The feature will be saved as a single numpy file in the format [video_nums,features_dim] for further visualization.

(2). Reterival Evaluation

modify line60-line62 in reterival.py.

python reterival.py

Results

Action Recognition

Kinetics Pretrained (I3D)

Method UCF101 HMDB51 Diving48
Random Initialization 57.9 29.6 17.4
MoCo Baseline 70.4 36.3 47.9
BE 86.5 56.2 62.6

Video Retrieve (HMDB51-C3D)

Method @1 @5 @10 @20 @50
BE 10.2 27.6 40.5 56.2 76.6

More Visualization

T-SNE

please refer to utils/visualization/t_SNE_Visualization.py for details.

Confusion_Matrix

please refer to utils/visualization/confusion_matrix.py for details.

Acknowledgement

This work is partly based on UEL and MoCo.

License

The code are released under the CC-BY-NC 4.0 LICENSE.

Owner
Jinpeng Wang
Focus on Biometrics and Video Understanding, Self/Semi Supervised Learning.
Jinpeng Wang
Simple converter for deploying Stable-Baselines3 model to TFLite and/or Coral

Running SB3 developed agents on TFLite or Coral Introduction I've been using Stable-Baselines3 to train agents against some custom Gyms, some of which

Gary Briggs 16 Oct 11, 2022
Various operations like path tracking, counting, etc by using yolov5

Object-tracing-with-YOLOv5 Various operations like path tracking, counting, etc by using yolov5

Pawan Valluri 5 Nov 28, 2022
Improving the robustness and performance of biomedical NLP models through adversarial training

RobustBioNLP Improving the robustness and performance of biomedical NLP models through adversarial training In this repository you can find suppliment

Milad Moradi 3 Sep 20, 2022
A vanilla 3D face modeling on pose-invariant and multi-lightning image data

3D-Face-Modeling A vanilla 3D face modeling on pose-invariant and multi-lightning image data Table of Contents Background Install Usage Contributing B

Haochen Zhang 1 Mar 12, 2022
Network Compression via Central Filter

Network Compression via Central Filter Environments The code has been tested in the following environments: Python 3.8 PyTorch 1.8.1 cuda 10.2 torchsu

2 May 12, 2022
Examples of how to create colorful, annotated equations in Latex using Tikz.

The file "eqn_annotate.tex" is the main latex file. This repository provides four examples of annotated equations: [example_prob.tex] A simple one ins

SyNeRCyS Research Lab 3.2k Jan 05, 2023
202 Jan 06, 2023
🚩🚩🚩

My CTF Challenges 2021 AIS3 Pre-exam / MyFirstCTF Name Category Keywords Difficulty ⒸⓄⓋⒾⒹ-①⑨ (MyFirstCTF Only) Reverse Baby ★ Piano Reverse C#, .NET ★

6 Oct 28, 2021
π-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

π-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis Project Page | Paper | Data Eric Ryan Chan*, Marco Monteiro*, Pe

375 Dec 31, 2022
Pytorch implementation for "Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter".

Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter This is a pytorch-based implementation for paper Implicit Feature Alignme

wangtianwei 61 Nov 12, 2022
source code for https://arxiv.org/abs/2005.11248 "Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics"

Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics This work will be published in Nature Biomedical

International Business Machines 71 Nov 15, 2022
Code for the paper One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation, CVPR 2021.

One Thing One Click One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation (CVPR2021) Code for the paper One Thi

44 Dec 12, 2022
Minimalist Error collection Service compatible with Rollbar clients. Sentry or Rollbar alternative.

Minimalist Error collection Service Features Compatible with any Rollbar client(see https://docs.rollbar.com/docs). Just change the endpoint URL to yo

Haukur Rósinkranz 381 Nov 11, 2022
Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques This repository is derived from the NMTGMinor

Tu Anh Dinh 1 Sep 07, 2022
PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020).

Scaffold-Federated-Learning PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020). Environment numpy=

KI 30 Dec 29, 2022
Transfer Reinforcement Learning for Differing Action Spaces via Q-Network Representations

Transfer-Learning-in-Reinforcement-Learning Transfer Reinforcement Learning for Differing Action Spaces via Q-Network Representations Final Report Tra

Trung Hieu Tran 4 Oct 17, 2022
Code for the paper "Benchmarking and Analyzing Point Cloud Classification under Corruptions"

ModelNet-C Code for the paper "Benchmarking and Analyzing Point Cloud Classification under Corruptions". For the latest updates, see: sites.google.com

Jiawei Ren 45 Dec 28, 2022
Combine Tacotron2 and Hifi GAN to generate speech from text

EndToEndTextToSpeech Combine Tacotron2 and Hifi GAN to generate speech from text Download weights Hifi GAN - hifi_gan/checkpoint/ : pretrain 2.5M ste

Phạm Quốc Huy 1 Dec 18, 2021
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

Vowpal Wabbit 8.1k Jan 06, 2023
Predict stock movement with Machine Learning and Deep Learning algorithms

Project Overview Stock market movement prediction using LSTM Deep Neural Networks and machine learning algorithms Software and Library Requirements Th

Naz Delam 46 Sep 13, 2022