Implementation of TimeSformer, a pure attention-based solution for video classification

Last update: Jan 03, 2023

Overview

TimeSformer - Pytorch

Implementation of TimeSformer, a pure and simple attention-based solution for reaching state-of-the-art (SOTA) results on video classification. This repository houses only the best-performing variant, 'Divided Space-Time Attention', which is nothing more than attention along the time axis followed by attention over the spatial positions of each frame.
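To make the time-then-space ordering concrete, here is a minimal sketch in plain PyTorch. It is not the repository's code: the helper names `attend` and `divided_space_time` are hypothetical, and the sketch assumes single-head attention without learned projections, layer norms, or a classification token, purely to show how the token axis is regrouped for each step.

```python
import torch

def attend(x):
    # Plain single-head self-attention with q = k = v = x (no learned projections).
    scale = x.shape[-1] ** -0.5
    weights = torch.softmax(x @ x.transpose(-1, -2) * scale, dim=-1)
    return weights @ x

def divided_space_time(x, frames, patches):
    # x: (batch, frames * patches, dim), tokens ordered frame by frame.
    b, _, d = x.shape
    x = x.reshape(b, frames, patches, d)

    # Time attention: each patch location attends over the frames at that location.
    t = x.permute(0, 2, 1, 3).reshape(b * patches, frames, d)
    t = attend(t).reshape(b, patches, frames, d).permute(0, 2, 1, 3)
    x = x + t  # residual connection

    # Space attention: each frame attends over its own patches.
    s = x.reshape(b * frames, patches, d)
    s = attend(s).reshape(b, frames, patches, d)
    x = x + s  # residual connection
    return x.reshape(b, frames * patches, d)

tokens = torch.randn(2, 8 * 196, 64)  # 8 frames, 14 x 14 patches per frame
out = divided_space_time(tokens, frames=8, patches=196)
print(out.shape)
```

Because each step attends over only one axis, the attention matrices are 8 x 8 (time) and 196 x 196 (space) rather than 1568 x 1568 for joint attention over all tokens.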

Install

$ pip install timesformer-pytorch

Usage

import torch
from timesformer_pytorch import TimeSformer

model = TimeSformer(
    dim = 512,
    image_size = 224,
    patch_size = 16,
    num_frames = 8,
    num_classes = 10,
    depth = 12,
    heads = 8,
    dim_head = 64,
    attn_dropout = 0.1,
    ff_dropout = 0.1
)

video = torch.randn(2, 8, 3, 224, 224) # (batch x frames x channels x height x width)
pred = model(video) # (2, 10)

Citations

@misc{bertasius2021spacetime,
    title   = {Is Space-Time Attention All You Need for Video Understanding?}, 
    author  = {Gedas Bertasius and Heng Wang and Lorenzo Torresani},
    year    = {2021},
    eprint  = {2102.05095},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Comments

How to deal with varying length video? Thanks

Dear all, I am wondering whether TimeSformer can handle videos of different lengths. Is it possible to use a mask, as in the original Transformer? Any ideas would be appreciated. Thanks a lot.
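One common way to batch clips of different lengths is to pad them to a fixed number of frames and keep a boolean mask of the real frames. The sketch below shows only that preprocessing step; `pad_clips` is a hypothetical helper, and whether the installed version of the model accepts such a mask is an assumption to verify against its forward signature.

```python
import torch

def pad_clips(clips, num_frames):
    # clips: list of tensors shaped (frames_i, channels, height, width)
    batch, masks = [], []
    for clip in clips:
        clip = clip[:num_frames]                  # truncate clips that are too long
        n = clip.shape[0]
        pad = clip.new_zeros((num_frames - n, *clip.shape[1:]))
        batch.append(torch.cat([clip, pad], dim=0))
        mask = torch.zeros(num_frames, dtype=torch.bool)
        mask[:n] = True                           # True marks real (non-padded) frames
        masks.append(mask)
    return torch.stack(batch), torch.stack(masks)

clips = [torch.randn(5, 3, 32, 32), torch.randn(8, 3, 32, 32), torch.randn(11, 3, 32, 32)]
video, frame_mask = pad_clips(clips, num_frames=8)
print(video.shape, frame_mask.sum(dim=1))
```

The mask has shape (batch, frames); an attention layer that supports masking would use it to ignore the padded frames in the time-attention step.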

opened by junyongyou 2

fix runtime error in SpaceTime Attention

There is a shape mismatch error in Attention. When we splice out the classification token from the first token of each sequence in q, k and v, the shape becomes (batch_size * num_heads, num_frames * num_patches - 1, head_dim). Then we try to reshape the tensor by taking out a factor of num_frames or num_patches (depending on whether it is space or time attention) from dimension 1. That doesn't work because we subtracted out the classification token.

I found that performing the rearrange operation before splicing the token fixes the issue.

I recreate the problem and illustrate the solution in this notebook: https://colab.research.google.com/drive/1lHFcn_vgSDJNSqxHy7rtqhMVxe0nUCMS?usp=sharing.

By the way, thank you to @lucidrains; all of your implementations on attention-based models are helping me more than you know.

opened by adam-mehdi 1

Update timesformer_pytorch.py

fixing issue for scaling

File "/home/aarti9/.local/lib/python3.6/site-packages/timesformer_pytorch/timesformer_pytorch.py", line 82, in forward
    q *= self.scale

RuntimeError: Output 0 of ViewBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views is forbidden. You should replace the inplace operation by an out-of-place one.
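The error message asks for an out-of-place operation instead of `q *= self.scale`. The following standalone sketch reproduces the situation under assumptions: the tensor names and sizes are made up, and `chunk` stands in for whatever produced the views in the library code.

```python
import torch

scale = 64 ** -0.5
x = torch.randn(2, 4, 3 * 64, requires_grad=True)
qkv = x * 1.0                     # non-leaf tensor, as a projection output would be
q, k, v = qkv.chunk(3, dim=-1)    # chunk returns several views of qkv

# q *= scale                      # in-place update of a multi-output view (the failing pattern)
q = q * scale                     # out-of-place: creates a new tensor that autograd can track

q.sum().backward()
print(x.grad.shape)
```

The out-of-place form costs one extra allocation for `q` but leaves the shared `qkv` storage untouched, which is what autograd requires here.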

opened by aarti9 0

Fine-tune with new datasets

Thank you so much for your great effort. I can run predictions using the given .py files, but I couldn't find a train.py file. How can I fine-tune the network on a new dataset? Where should I define the samples of the new dataset?
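Since the model is a regular PyTorch module, a standard training loop is one way to fine-tune it. The sketch below is an assumption-laden outline, not a script from this repository: `TinyVideoClassifier` is a hypothetical stand-in so the example runs on its own, the data is random, and the optimizer settings are placeholders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in model; swap in a TimeSformer instance built as in the Usage section.
class TinyVideoClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.head = nn.Linear(3, num_classes)

    def forward(self, video):                          # (batch, frames, channels, height, width)
        return self.head(video.mean(dim=(1, 3, 4)))    # average over frames and pixels

# Placeholder data: replace with a Dataset that yields (clip, label) pairs.
videos = torch.randn(16, 8, 3, 32, 32)
labels = torch.randint(0, 10, (16,))
loader = DataLoader(TensorDataset(videos, labels), batch_size=4, shuffle=True)

model = TinyVideoClassifier(num_classes=10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(2):
    for clips, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(clips), targets)        # logits: (batch, num_classes)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```

The new dataset enters only through the `Dataset` passed to the `DataLoader`; each item needs to be a clip tensor in (frames, channels, height, width) layout plus an integer class label.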

opened by Jeba-create 0

problem in timesformer_pytorch.py

Starting from line 182:

video = rearrange(video, 'b f c (h p1) (w p2) -> b (f h w) (p1 p2 c)', p1 = p, p2 = p)

I think this should be:

video = rearrange(video, 'b f c (hp p1) (wp p2) -> b (f hp wp) (p1 p2 c)', p1 = p, p2 = p)
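For reference, axis names inside an einops pattern are only labels, so the two patterns above should describe the same operation. The sketch below spells that operation out with plain `reshape` and `permute`, assuming the sizes from the Usage section (224 x 224 frames, 16 x 16 patches); the variable names are illustrative.

```python
import torch

b, f, c, size, p = 2, 8, 3, 224, 16
h = w = size // p                          # 14 patches per side

video = torch.randn(b, f, c, size, size)

# Split height and width into (patch index, offset inside the patch).
x = video.reshape(b, f, c, h, p, w, p)
# Reorder the axes to b, f, h, w, p1, p2, c and flatten into tokens.
x = x.permute(0, 1, 3, 5, 4, 6, 2).reshape(b, f * h * w, p * p * c)

print(x.shape)   # torch.Size([2, 1568, 768])
```

Each of the 8 x 14 x 14 = 1568 tokens holds one flattened 16 x 16 x 3 patch, which is the layout the pattern `b (f h w) (p1 p2 c)` describes.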

opened by Weizhongjin 2

Imagenet Pretrained Weights

Thanks for the work! In their paper they say: "For all our experiments, we adopt the “Base” ViT model architecture (Dosovitskiy et al., 2020) pretrained on ImageNet."

I know that you said the official weights trained on Kinetics and similar datasets have not been officially released yet. However, I am not interested in those; what I need are the initial weights of the network based only on ViT ImageNet pretraining, so that I can train this implementation starting from them. From what it looks like, you don't have ImageNet-pretrained weights for this implementation, do you?

opened by RaivoKoot 5

Releases (0.4.1)

0.4.1 (Aug 25, 2021)
0.4.0 (Aug 16, 2021)
0.3.3 (Jul 4, 2021)
0.3.2 (Apr 26, 2021)
0.3.1 (Apr 25, 2021)
0.2.1 (Apr 21, 2021)
0.1.1 (Mar 23, 2021)
0.1.0 (Mar 21, 2021)
0.0.5 (Mar 18, 2021)
0.0.4 (Feb 11, 2021)
0.0.3 (Feb 11, 2021)
0.0.2 (Feb 11, 2021)
0.0.1a (Feb 11, 2021)

Owner

Phil Wang

Working with Attention. It's all we need.
