This is official implementaion of paper "Token Shift Transformer for Video Classification".

Overview

TokShift-Transformer

This is official implementaion of paper "Token Shift Transformer for Video Classification". We achieve SOTA performance 80.40% on Kinetics-400 val. Paper link

Updates

July 11, 2021

  • Release this V1 version (the version used in paper) to public.
  • we are preparing a V2 version which include the following modifications, will release within 1 week:
  1. Directly decode video mp4 file during training/evaluation
  2. Change to adopt standarlize timm code-base.
  3. Performances are further improved than reported in paper version (average +0.5).

April 22, 2021

  • Add Train/Test guidline and Data perpariation

April 16, 2021

  • Publish TokShift Transformer for video content understanding

Model Zoo and Baselines

architecture backbone pretrain Res & Frames GFLOPs x views top1 config
ViT (Video) Base16 ImgNet21k 224 & 8 134.7 x 30 76.02 link k400_vit_8x32_224.yml
TokShift Base-16 ImgNet21k 224 & 8 134.7 x 30 77.28 link k400_tokshift_div4_8x32_base_224.yml
TokShift (MR) Base16 ImgNet21k 256 & 8 175.8 x 30 77.68 link k400_tokshift_div4_8x32_base_256.yml
TokShift (HR) Base16 ImgNet21k 384 & 8 394.7 x 30 78.14 link k400_tokshift_div4_8x32_base_384.yml
TokShift Base16 ImgNet21k 224 & 16 268.5 x 30 78.18 link k400_tokshift_div4_16x32_base_224.yml
TokShift-Large (HR) Large16 ImgNet21k 384 & 8 1397.6 x 30 79.83 link k400_tokshift_div4_8x32_large_384.yml
TokShift-Large (HR) Large16 ImgNet21k 384 & 12 2096.4 x 30 80.40 link k400_tokshift_div4_12x32_large_384.yml

Below is trainig log, we use 3 views evaluation (instead of 30 views) during validation for time-saving.

Installation

  • PyTorch >= 1.7, torchvision
  • tensorboardx

Quick Start

Train

  1. Download ImageNet-22k pretrained weights from Base16 and Large16.
  2. Prepare Kinetics-400 dataset organized in the following structure, trainValTest
k400
|_ frames331_train
|  |_ [category name 0]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |   |_ img_00001.jpg
|  |  |   |_ img_00002.jpg
|  |  |   |_ ...
|  |  |_ ...
|  |
|  |_ [category name 1]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |   |_ img_00001.jpg
|  |  |   |_ img_00002.jpg
|  |  |   |_ ...
|  |  |_ ...
|  |_ ...
|
|_ frames331_val
|  |_ [category name 0]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |   |_ img_00001.jpg
|  |  |   |_ img_00002.jpg
|  |  |   |_ ...
|  |  |_ ...
|  |
|  |_ [category name 1]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |   |_ img_00001.jpg
|  |  |   |_ img_00002.jpg
|  |  |   |_ ...
|  |  |_ ...
|  |_ ...
|
|_ trainValTest
   |_ train.txt
   |_ val.txt
  1. Using train-script (train.sh) to train k400
#!/usr/bin/env python
import os

cmd = "python -u main_ddp_shift_v3.py \
		--multiprocessing-distributed --world-size 1 --rank 0 \
		--dist-ur tcp://127.0.0.1:23677 \
		--tune_from pretrain/ViT-L_16_Img21.npz \
		--cfg config/custom/kinetics400/k400_tokshift_div4_12x32_large_384.yml"
os.system(cmd)

Test

Using test.sh (test.sh) to evaluate k400

#!/usr/bin/env python
import os
cmd = "python -u main_ddp_shift_v3.py \
        --multiprocessing-distributed --world-size 1 --rank 0 \
        --dist-ur tcp://127.0.0.1:23677 \
        --evaluate \
        --resume model_zoo/ViT-B_16_k400_dense_cls400_segs8x32_e18_lr0.1_B21_VAL224/best_vit_B8x32x224_k400.pth \
        --cfg config/custom/kinetics400/k400_vit_8x32_224.yml"
os.system(cmd)

Contributors

VideoNet is written and maintained by Dr. Hao Zhang and Dr. Yanbin Hao.

Citing

If you find TokShift-xfmr is useful in your research, please use the following BibTeX entry for citation.

@article{tokshift2021,
  title={Token Shift Transformer for Video Classification},
  author={Hao Zhang, Yanbin Hao, Chong-Wah Ngo},
  journal={ACM Multimedia 2021},
}

Acknowledgement

Thanks for the following Github projects:

Owner
VideoNet
VideoNet
Human Dynamics from Monocular Video with Dynamic Camera Movements

Human Dynamics from Monocular Video with Dynamic Camera Movements Ri Yu, Hwangpil Park and Jehee Lee Seoul National University ACM Transactions on Gra

215 Jan 01, 2023
Code to replicate the key results from Exploring the Limits of Out-of-Distribution Detection

Exploring the Limits of Out-of-Distribution Detection In this repository we're collecting replications for the key experiments in the Exploring the Li

Stanislav Fort 35 Jan 03, 2023
A Self-Supervised Contrastive Learning Framework for Aspect Detection

AspDecSSCL A Self-Supervised Contrastive Learning Framework for Aspect Detection This repository is a pytorch implementation for the following AAAI'21

Tian Shi 30 Dec 28, 2022
DA2Lite is an automated model compression toolkit for PyTorch.

DA2Lite (Deep Architecture to Lite) is a toolkit to compress and accelerate deep network models. ⭐ Star us on GitHub — it helps!! Frameworks & Librari

Sinhan Kang 7 Mar 22, 2022
The code of "Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer".

Code data_preprocess.py: preprocess data for Dependent-T5. parameters.py: define parameters of Dependent-T5. train_tools.py: traning and evaluation co

1 Apr 21, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Jan 04, 2023
Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech synthesis (ICASSP 2021) Paper | Demo Block diagram of FCL-taco2, where the decode

Disong Wang 39 Sep 28, 2022
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

SAT: 2D Semantics Assisted Training for 3D Visual Grounding SAT: 2D Semantics Assisted Training for 3D Visual Grounding by Zhengyuan Yang, Songyang Zh

Zhengyuan Yang 22 Nov 30, 2022
CMP 414/765 course repository for Spring 2022 semester

CMP414/765: Artificial Intelligence Spring2021 This is the GitHub repository for course CMP 414/765: Artificial Intelligence taught at The City Univer

ch00226855 4 May 16, 2022
A port of muP to JAX/Haiku

MUP for Haiku This is a (very preliminary) port of Yang and Hu et al.'s μP repo to Haiku and JAX. It's not feature complete, and I'm very open to sugg

18 Dec 30, 2022
Learning Skeletal Articulations with Neural Blend Shapes

This repository provides an end-to-end library for automatic character rigging and blend shapes generation as well as a visualization tool. It is based on our work Learning Skeletal Articulations wit

Peizhuo 504 Dec 30, 2022
A Tensorfflow implementation of Attend, Infer, Repeat

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models This is an unofficial Tensorflow implementation of Attend, Infear, Repeat (AIR)

Adam Kosiorek 82 May 27, 2022
Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix Tuning Files: . ├── gpt2 # Code for GPT2 style autoregressive LM │ ├── train_e2e.py # high-level script

530 Jan 04, 2023
Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)

Active Learning for Deep Object Detection via Probabilistic Modeling This repository is the official PyTorch implementation of Active Learning for Dee

NVIDIA Research Projects 130 Jan 06, 2023
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Results results on COCO val Backbone Method Lr Schd PQ Config Download

155 Dec 20, 2022
Pytorch implementation of Zero-DCE++

Zero-DCE++ You can find more details here: https://li-chongyi.github.io/Proj_Zero-DCE++.html. You can find the details of our CVPR version: https://li

Chongyi Li 157 Dec 23, 2022
code and models for "Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation"

Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation This repository contains code and models for the method described in: Golnaz

55 Jun 18, 2022
An implementation of Deep Graph Infomax (DGI) in PyTorch

DGI Deep Graph Infomax (Veličković et al., ICLR 2019): https://arxiv.org/abs/1809.10341 Overview Here we provide an implementation of Deep Graph Infom

Petar Veličković 491 Jan 03, 2023
Vehicle direction identification consists of three module detection , tracking and direction recognization.

Vehicle-direction-identification Vehicle direction identification consists of three module detection , tracking and direction recognization. Algorithm

5 Nov 15, 2022
A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation

Aboleth A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation [1] with stochastic gradient variational Bayes

Gradient Institute 127 Dec 12, 2022