Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Last update: Dec 07, 2022

Relational Self-Attention: What's Missing in Attention for Video Understanding

This repository is the official implementation of "Relational Self-Attention: What's Missing in Attention for Video Understanding" by Manjin Kim*, Heeseung Kwon*, Chunyu Wang, Suha Kwak, and Minsu Cho (*equal contribution).

Requirements

Python: 3.7.9
PyTorch: 1.6.0
TorchVision: 0.2.1
CUDA: 10.1
Conda environment: environment.yml

To install requirements:

    conda env create -f environment.yml
    conda activate rsa

Dataset Preparation

Download the Something-Something v1 & v2 (SSv1 & SSv2) datasets and extract the RGB frames. Download URLs: SSv1, SSv2
Create txt files that define the training and validation splits. Each line in a txt file is formatted as [video_path] [#frames] [class_label]. See the txt files in the ./data directory for examples.
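The split-file format described above can be sketched in Python as follows. This is only an illustration, not the repository's own tooling: the function names, the one-directory-per-video layout, the .jpg extension, and the use of the directory name as [video_path] are all assumptions. Check the files in ./data for the exact path convention the loaders expect.

```python
import os


def make_split_lines(frames_root, labels):
    """Build lines of the form '[video_path] [#frames] [class_label]'.

    frames_root: directory with one sub-directory of extracted frames per
                 video (assumed layout).
    labels: dict mapping each video directory name to its integer class
            label (hypothetical input).
    """
    lines = []
    for video in sorted(labels):
        video_dir = os.path.join(frames_root, video)
        # Count extracted frames; the .jpg extension is an assumption.
        num_frames = sum(
            1 for name in os.listdir(video_dir) if name.lower().endswith(".jpg")
        )
        lines.append("{} {} {}".format(video, num_frames, labels[video]))
    return lines


def parse_split_line(line):
    """Split one line back into (video_path, num_frames, class_label)."""
    # rsplit keeps any spaces that might occur inside the path itself.
    video_path, num_frames, class_label = line.strip().rsplit(" ", 2)
    return video_path, int(num_frames), int(class_label)
```

Writing the result with "\n".join(lines) to a file such as train.txt (a placeholder name) gives one video per line.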

Training

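To train RSANet-R50 on SSv1 or SSv2 as in the paper, run the script for the target dataset. The examples pass what appears to be an experiment name followed by the number of frames: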

    # For SSv1
    ./scripts/train_Something_v1.sh
    # example: ./scripts/train_Something_v1.sh RSA_R50_SSV1_16frames 16

    # For SSv2
    ./scripts/train_Something_v2.sh
    # example: ./scripts/train_Something_v2.sh RSA_R50_SSV2_16frames 16

Evaluation

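To evaluate RSANet-R50 on SSv1 or SSv2 as in the paper, run the script for the target dataset. The examples pass what appears to be an experiment name, a checkpoint file, and the number of frames: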

    # For SSv1
    ./scripts/test_Something_v1.sh
    # example: ./scripts/test_Something_v1.sh RSA_R50_SSV1_16frames resnet_rgb_model_best.pth.tar 16

    # For SSv2
    ./scripts/test_Something_v2.sh
    # example: ./scripts/test_Something_v2.sh RSA_R50_SSV2_16frames resnet_rgb_model_best.pth.tar 16

Results

Our model achieves the following performance on Something-Something-V1 and Something-Something-V2:

model	dataset	frames	top-1 / top-5	logs	checkpoints
RSANet-R50	SSV1	16	54.0 % / 81.1 %	[log]	[checkpoint]
RSANet-R50	SSV2	16	66.0 % / 89.9 %	[log]	[checkpoint]
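
For readers unfamiliar with the metric, the top-1 / top-5 column can be read through the standard top-k accuracy definition: a video counts as correct if its ground-truth class is among the k highest-scoring predictions. The sketch below uses plain Python lists and a hypothetical function name. It illustrates the definition only and is not the evaluation code from this repository.

```python
def topk_accuracy(scores, targets, ks=(1, 5)):
    """Return top-k accuracy in percent for each k in ks.

    scores: list of per-class score lists, one list per video.
    targets: list of ground-truth class indices, one per video.
    """
    hits = {k: 0 for k in ks}
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        for k in ks:
            if target in ranked[:k]:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(targets) for k in ks}
```

With real model outputs, scores would hold one score vector per validation video, and the returned values for k=1 and k=5 correspond to the two numbers in each table row.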

Owner

mandos
