Discovering Dynamic Salient Regions with Spatio-Temporal Graph Neural Networks

Overview

Discovering Dynamic Salient Regions with Spatio-Temporal Graph Neural Networks

This is the official code for DyReg model inroduced in Discovering Dynamic Salient Regions with Spatio-Temporal Graph Neural Networks

Citation

Please use the following BibTeX to cite our work.

@incollection{duta2021dynamic_dyreg_gnn_neurips2021,
title = {Discovering Dynamic Salient Regions with Spatio-Temporal Graph
Neural Networks},
author = {Duta, Iulia and Nicolicioiu, Andrei and Leordeanu, Marius},
booktitle = {Advances in Neural Information Processing Systems 34},
year = {2021}
}

@article{duta2020dynamic_dyreg,
title = {Dynamic Regions Graph Neural Networks for Spatio-Temporal Reasoning},
author = {Duta, Iulia and Nicolicioiu, Andrei and Leordeanu, Marius},
journal = {NeurIPS 2020 Workshop on Object Representations for Learning and Reasoning},
year = {2020},
}

Requirements

The code was developed using:

- python 3.7
- matplotlib
- torch 1.7.1
- script
- pandas
- torchvision
- moviepy
- ffmpeg

Overview:

The repository contains the Pytorch implementation of the DyReg-GNN model. The model is defined and trained in the following files:

  • ops/dyreg.py - code for our DyReg module

  • ops/rstg.py - code for the Spatio-temporal GNN (RSTG) used to process the graph extracted using DyReg

  • create_model.py - two examples how to integrate the DyReg-GNN module inside an existing backbone

  • main_standard.py - code to train a model on Smt-Smt dataset

  • test_models.py - code for multi-clip evaluation

Scripts for preparing the data, training and testing the model:

Prepare dataset

For Something Something dataset:

  • the json files containing meta-data should be stored in ./data/smt-smt-V2/tsm_data
  • the zip files containing the videos should be stored in ./data/smt-smt-V2/

  1. To extract the videos from the zip files run:

cat 20bn-something-something-v2-?? | tar zx

  1. To extract the frames from videos run:

python tools/vid2img_sthv2.py

β†’ The videos will be stored in $FRAME_ROOT (default './data/smt-smt-V2/tmp_smt-smt-V2-frames')

πŸ’‘ If you already have the dataset as frames, place them under ./data/smt-smt-V2/smt-smt-V2-frames/, one folder for each video
πŸ’‘ πŸ’‘ If you need to change the path for datasets modify $ROOT_DATASET in dataset_config.py

  1. To generate the labels file in the required format please run:

python tools/gen_label_sthv2.py

β†’ The resulting txt files, for each split, will be stored in $DATA_UTILS_ROOT (default './data/smt-smt-V2/tsm_data/')

How to run the model

DyReg-GNN module can be simply inserted into any space-time model.

import torch
from torch.nn import functional as F
from ops.dyreg import DynamicGraph, dyregParams

class SpaceTimeModel(torch.nn.Module):
    def __init__(self):
        super(SpaceTimeModel, self).__init__()
        dyreg_params = dyregParams()
        dyregParams.offset_lstm_dim = 32
        self.dyreg = DynamicGraph(dyreg_params,
                    backbone_dim=32, node_dim=32, out_num_ch=32,
                    H=16, W=16, 
                    iH=16, iW=16,
                    project_i3d=False,
                    name='lalalal')


        self.fc = torch.nn.Linear(32, 10)

    def forward(self, x):
        dx = self.dyreg(x)
        # you can initialize the dyreg branch as identity function by normalisation, 
        #   as done in DynamicGraphWrapper found in ./ops/dyreg.py 
        x = x + dx
        # average over time and space: T, H, W
        x = x.mean(-1).mean(-1).mean(-2)
        x = self.fc(x)
        return x


B = 8
T = 10
C = 32
H = 16
W = 16
x = torch.ones(B,T,C,H,W)
st_model = SpaceTimeModel()
out = st_model(x)

For another example of how to integrate DyReg (DynamicGraph module) inside your model please look at create_model.py or run:

python create_model.py

Something-Something experiments

Training a model

To train a model on smt-smt v2 dataset please run

./start_main_standard.sh model_name

For default hyperparameters check opts.py. For example, place_graph flag controls how many DyReg-GNN modules to use and where to place them inside the backbone:

# for a model with 3 DyReg-GNN modules placed after layer 2-block 2, layer 3-block 4 and layer 4-block 1 of the backbone
--place_graph=layer2.2_layer3.4_layer4.1 
# for a model with 1 dyreg module placed after layer 3 block 4 of the backbone
--place_graph=layer3.4                   

Single clip evaluation

Train a model with the above script or download a pre-trained DyReg-GNN model from here and put the checkpoint in ./ckeckpoints/

To evaluate a model on smt-smt v2 dataset on a single 224 x 224 central crop, run:

./start_main_standard_test.sh model_name

The flag $RESUME_CKPT indicate the the checkpoint used for evaluation.

Multi clips evaluation

To evaluate a model in the multi-clips setup (3 spatials clips x 2 temporal samplings) on Smt-Smt v2 dataset please run

./evaluate_model.sh model_name

The flag $RESUME_CKPT indicate the the checkpoint used for evaluation.

TSM Baseline

This repository adds DyReg-GNN modules to a TSM backbone based on code from here.

Owner
Bitdefender Machine Learning
Machine Learning Research @ Bitdefender
Bitdefender Machine Learning
Axel - 3D printed robotic hands and they controll with Raspberry Pi and Arduino combo

Axel It's our graduation project about 3D printed robotic hands and they control

0 Feb 14, 2022
Implementation of Deep Deterministic Policy Gradiet Algorithm in Tensorflow

ddpg-aigym Deep Deterministic Policy Gradient Implementation of Deep Deterministic Policy Gradiet Algorithm (Lillicrap et al.arXiv:1509.02971.) in Ten

Steven Spielberg P 247 Dec 07, 2022
Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ)

Real2CAD-3DV Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ) Group Member: Yue Pan, Yuanwen Yue, Bingxin Ke, Yujie He

24 Jun 22, 2022
Realtime Face Anti Spoofing with Face Detector based on Deep Learning using Tensorflow/Keras and OpenCV

Realtime Face Anti-Spoofing Detection πŸ€– Realtime Face Anti Spoofing Detection with Face Detector to detect real and fake faces Please star this repo

Prem Kumar 86 Aug 03, 2022
NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

5 Nov 03, 2022
Official implementation of EfficientPose

EfficientPose This is the official implementation of EfficientPose. We based our work on the Keras EfficientDet implementation xuannianz/EfficientDet

2 May 17, 2022
Lightweight library to build and train neural networks in Theano

Lasagne Lasagne is a lightweight library to build and train neural networks in Theano. Its main features are: Supports feed-forward networks such as C

Lasagne 3.8k Dec 29, 2022
Geometric Algebra package for JAX

JAXGA - JAX Geometric Algebra GitHub | Docs JAXGA is a Geometric Algebra package on top of JAX. It can handle high dimensional algebras by storing onl

Robin Kahlow 36 Dec 22, 2022
This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

Sergi Caelles 828 Jan 05, 2023
Efficient Training of Visual Transformers with Small Datasets

Official codes for "Efficient Training of Visual Transformers with Small Datasets", NerIPS 2021.

Yahui Liu 112 Dec 25, 2022
This repository contains a CBIR system that uses swin transformer to extract image's feature.

Swin-transformer based CBIR This repository contains a CBIR(content-based image retrieval) system. Here we use Swin-transformer to extract query image

JsHou 12 Nov 17, 2022
Keras implementations of Generative Adversarial Networks.

This repository has gone stale as I unfortunately do not have the time to maintain it anymore. If you would like to continue the development of it as

Erik Linder-NorΓ©n 8.9k Jan 04, 2023
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.

CV Backbones including GhostNet, TinyNet, TNT (Transformer in Transformer) developed by Huawei Noah's Ark Lab. GhostNet Code TinyNet Code TNT Code Pyr

HUAWEI Noah's Ark Lab 3k Jan 08, 2023
MAUS: A Dataset for Mental Workload Assessment Using Wearable Sensor - Baseline system

MAUS: A Dataset for Mental Workload Assessment Using Wearable Sensor - Baseline system Getting started To start working on this assignment, you should

2 Aug 06, 2022
Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Propose-Reduce VIS This repo contains the official implementation for the paper: Video Instance Segmentation with a Propose-Reduce Paradigm Huaijia Li

DV Lab 39 Nov 23, 2022
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

ActNN : Activation Compressed Training This is the official project repository for ActNN: Reducing Training Memory Footprint via 2-Bit Activation Comp

UC Berkeley RISE 178 Jan 05, 2023
NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size

NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size Xuanyi Dong, Lu Liu, Katarzyna Musial, Bogdan Gabrys in IEEE Transactions o

D-X-Y 137 Dec 20, 2022
Keepsake is a Python library that uploads files and metadata (like hyperparameters) to Amazon S3 or Google Cloud Storage

Keepsake Version control for machine learning. Keepsake is a Python library that uploads files and metadata (like hyperparameters) to Amazon S3 or Goo

Replicate 1.6k Dec 29, 2022
An implementation of Equivariant e2 convolutional kernals into a convolutional self attention network, applied to radio astronomy data.

EquivariantSelfAttention An implementation of Equivariant e2 convolutional kernals into a convolutional self attention network, applied to radio astro

2 Nov 09, 2021
Cards Against Humanity AI

cah-ai This is a Cards Against Humanity AI implemented using a pre-trained Semantic Search model. How it works A player is described by a combination

Alex Nichol 2 Aug 22, 2022