Code release for ICCV 2021 paper "Anticipative Video Transformer"

Anticipative Video Transformer

Ranked first in the Action Anticipation task of the CVPR 2021 EPIC-Kitchens Challenge! (entry: AVT-FB-UT)

[project page] [paper]

If this code helps with your work, please cite:

R. Girdhar and K. Grauman. Anticipative Video Transformer. IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

@inproceedings{girdhar2021anticipative,
    title = {{Anticipative Video Transformer}},
    author = {Girdhar, Rohit and Grauman, Kristen},
    booktitle = {ICCV},
    year = 2021
}

Installation

The code was tested on an Ubuntu 20.04 cluster, with each server consisting of 8 V100 16GB GPUs.

First clone the repo and set up the required packages in a conda environment. You might need to make minor modifications here if some packages are no longer available. In most cases they should be replaceable by more recent versions.

$ git clone --recursive git@github.com:facebookresearch/AVT.git
$ conda env create -f env.yaml python=3.7.7
$ conda activate avt

Set up RULSTM codebase

If you plan to use EPIC-Kitchens datasets, you might need the train/test splits and evaluation code from RULSTM. This is also needed if you want to extract RULSTM predictions for test submissions.

$ cd external
$ git clone git@github.com:fpv-iplab/rulstm.git; cd rulstm
$ git checkout 57842b27d6264318be2cb0beb9e2f8c2819ad9bc
$ cd ../..

Datasets

The code expects the data in the DATA/ folder. You can also symlink it to a different folder on a faster/larger drive. It should contain the following folders:

  1. videos/ which will contain raw videos
  2. external/ which will contain pre-extracted features from prior work
  3. extracted_features/ which will contain other extracted features
  4. pretrained/ which contains pretrained models, e.g., from TIMM

The paths to these datasets are set in files like conf/dataset/epic_kitchens100/common.yaml so you can also update the paths there instead.
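The folder layout listed above can be created and checked with a few lines of Python. This is a minimal sketch, not part of the AVT codebase: the helper names `make_data_layout` and `missing_subdirs` are hypothetical, and only the four top-level folder names are taken from the list above.

```python
import tempfile
from pathlib import Path

# Top-level folders named in the list above.
DATA_SUBDIRS = ("videos", "external", "extracted_features", "pretrained")


def make_data_layout(root):
    """Create the expected sub-folders under `root` (hypothetical helper)."""
    root = Path(root)
    created = []
    for name in DATA_SUBDIRS:
        sub = root / name
        # parents=True also creates `root` itself if it does not exist yet.
        sub.mkdir(parents=True, exist_ok=True)
        created.append(sub)
    return created


def missing_subdirs(root):
    """Return the names of expected sub-folders that are absent under `root`."""
    root = Path(root)
    return [name for name in DATA_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of touching a real DATA/.
    with tempfile.TemporaryDirectory() as tmp:
        data_root = Path(tmp) / "DATA"
        print("missing before:", missing_subdirs(data_root))
        make_data_layout(data_root)
        print("missing after:", missing_subdirs(data_root))
```

If DATA/ is a symlink to another drive, the same check applies, since `Path.is_dir()` follows symlinks.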

EPIC-Kitchens

To train only the AVT-h on top of pre-extracted features, you can download the features from RULSTM into DATA/external/rulstm/RULSTM/data_full for EK55 and DATA/external/rulstm/RULSTM/ek100_data_full for EK100. If you plan to train models on features extracted from an irCSN-152 model finetuned from IG65M features, you can download our pre-extracted features from here into DATA/extracted_features/ek100/ig65m_ftEk100_logits_10fps1s/rgb/ or here into DATA/extracted_features/ek55/ig65m_ftEk55train_logits_25fps/rgb/.

To train AVT end-to-end, you need to download the raw videos from EPIC-Kitchens. They can be organized as you wish, but this is how my folders are organized (since I first downloaded EK55 and then the remaining new videos for EK100):

DATA
├── videos
│   ├── EpicKitchens
│   │   └── videos_ht256px
│   │       ├── train
│   │       │   ├── P01
│   │       │   │   ├── P01_01.MP4
│   │       │   │   ├── P01_03.MP4
│   │       │   │   ├── ...
│   │       └── test
│   │           ├── P01
│   │           │   ├── P01_11.MP4
│   │           │   ├── P01_12.MP4
│   │           │   ├── ...
│   │           ...
│   ├── EpicKitchens100
│   │   └── videos_extension_ht256px
│   │       ├── P01
│   │       │   ├── P01_101.MP4
│   │       │   ├── P01_102.MP4
│   │       │   ├── ...
│   │       ...
│   ├── EGTEA/101020/videos/
│   │   ├── OP01-R01-PastaSalad.mp4
│   │   ...
│   └── 50Salads/rgb/
│       ├── rgb-01-1.avi
│       ...
├── external
│   └── rulstm
│       └── RULSTM
│           ├── egtea
│           │   ├── TSN-C_3_egtea_action_CE_flow_model_best_fcfull_hd
│           │   ...
│           ├── data_full  # (EK55)
│           │   ├── rgb
│           │   ├── obj
│           │   └── flow
│           └── ek100_data_full
│               ├── rgb
│               ├── obj
│               └── flow
└── extracted_features
    ├── ek100
    │   └── ig65m_ftEk100_logits_10fps1s
    │       └── rgb
    └── ek55
        └── ig65m_ftEk55train_logits_25fps
            └── rgb

If you use a different organization, you will need to edit the train/val dataset files, such as conf/dataset/epic_kitchens100/anticipation_train.yaml. Sometimes the values are overridden in the TXT config files, so you might need to change them there too. The root property takes a list of folders where the videos can be found, and it will search through all of them in order for a given video. Note that we resized the EPIC videos to 256px height for faster processing; you can use the sample_scripts/resize_epic_256px.sh script to do the same.
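The in-order search over the root folders can be pictured as follows. This is only an illustration of the lookup behavior described above, not the repository's actual data loader; the function name `find_video` and its arguments are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_video(roots: Iterable[str], rel_path: str) -> Optional[Path]:
    """Return the first existing `root / rel_path`, trying roots in order."""
    for root in roots:
        candidate = Path(root) / rel_path
        if candidate.is_file():
            return candidate
    # The video was not found under any of the given roots.
    return None


if __name__ == "__main__":
    # Build two throwaway roots that mimic the EK55 and EK100-extension folders.
    with tempfile.TemporaryDirectory() as tmp:
        ek55 = Path(tmp) / "EpicKitchens" / "videos_ht256px" / "train"
        ek100 = Path(tmp) / "EpicKitchens100" / "videos_extension_ht256px"
        (ek55 / "P01").mkdir(parents=True)
        (ek100 / "P01").mkdir(parents=True)
        (ek100 / "P01" / "P01_101.MP4").touch()

        # P01_101.MP4 is absent from the first root, so the second one is used.
        print(find_video([str(ek55), str(ek100)], "P01/P01_101.MP4"))
```

Because the first match wins, the order of the folders matters when the same file name exists under more than one root.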

Please see docs/DATASETS.md for setting up other datasets.

Training and evaluating models

If you want to train AVT models, you will need pre-trained models from timm. Our experiments use the following models:

$ mkdir DATA/pretrained/TIMM/
$ wget https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_patch16_224_in21k-e5005f0a.pth -O DATA/pretrained/TIMM/jx_vit_base_patch16_224_in21k-e5005f0a.pth
$ wget https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth -O DATA/pretrained/TIMM/jx_vit_base_p16_224-80ecf9dd.pth

The code uses hydra 1.0 for configuration, with the submitit plugin for launching jobs via SLURM. We provide a launch.py script that is a wrapper around the training scripts and can run jobs locally or launch distributed jobs. The configuration overrides for a specific experiment are defined by a TXT file. You can run a config by:

$ python launch.py -c expts/01_ek100_avt.txt

where expts/01_ek100_avt.txt can be replaced by any TXT config file.

By default, the launcher will launch the job to a SLURM cluster. However, you can run it locally using one of the following options:

  1. -g to run locally in debug mode with 1 GPU and 0 workers. This allows you to place pdb.set_trace() in the code to debug interactively.
  2. -l to run locally using as many GPUs as are available on the local machine.

This will run the training, which will run validation every few epochs. You can also run only testing using the -t flag.
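When scripting several runs, the launcher command line can be assembled programmatically. The sketch below only builds the argument list from the flags described above (-c, -g, -l, -t); the helper `build_launch_cmd` is hypothetical, and it assumes that -t can be combined with -g or -l, which the text above does not state explicitly.

```python
from typing import List

# Flags documented above; "slurm" is the default and needs no extra flag.
MODE_FLAGS = {"slurm": None, "debug": "-g", "local": "-l"}


def build_launch_cmd(config: str, mode: str = "slurm",
                     test_only: bool = False) -> List[str]:
    """Build an argument list for launch.py (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = ["python", "launch.py", "-c", config]
    flag = MODE_FLAGS[mode]
    if flag is not None:
        cmd.append(flag)
    if test_only:
        # Assumption: -t may be added alongside the mode flag.
        cmd.append("-t")
    return cmd


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(" ".join(build_launch_cmd("expts/01_ek100_avt.txt")))
    print(" ".join(build_launch_cmd("expts/01_ek100_avt.txt", mode="debug")))
    print(" ".join(build_launch_cmd("expts/01_ek100_avt.txt", mode="local",
                                    test_only=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root if you want to actually start the job.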

The outputs will be stored in OUTPUTS/<path to config>. This includes tensorboard files that you can use to visualize the training progress.

Model Zoo

EPIC-Kitchens-100

| Backbone | Head | Class-mean Recall@5 (Actions) | Config | Model |
|---|---|---|---|---|
| AVT-b (IN21K) | AVT-h | 14.9 | expts/01_ek100_avt.txt | link |
| TSN (RGB) | AVT-h | 13.6 | expts/02_ek100_avt_tsn.txt | link |
| TSN (Obj) | AVT-h | 8.7 | expts/03_ek100_avt_tsn_obj.txt | link |
| irCSN152 (IG65M) | AVT-h | 12.8 | expts/04_ek100_avt_ig65m.txt | link |

Late fusing predictions

For comparison to methods that use multiple modalities, you can late fuse predictions from multiple models using functions from notebooks/utils.py. For example, to compute the late fused performance reported in Table 3 (val) as AVT+ (obtains 15.9 recall@5 for actions):

from notebooks.utils import *
CFG_FILES = [
    ('expts/01_ek100_avt.txt', 0),
    ('expts/03_ek100_avt_tsn_obj.txt', 0),
]
WTS = [2.5, 0.5]
print_accuracies_epic(get_epic_marginalize_late_fuse(CFG_FILES, weights=WTS)[0])
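As a rough illustration of what weighted late fusion means, the sketch below sums per-class scores from several models with one weight per model and then ranks the classes. It is not the implementation in notebooks/utils.py (whose internals are not reproduced here); the names `late_fuse` and `top_k` and the toy scores are made up for this example, and only the weights 2.5 and 0.5 come from the snippet above.

```python
from typing import List, Sequence


def late_fuse(scores_per_model: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-class scores, one score vector per model."""
    if len(scores_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    num_classes = len(scores_per_model[0])
    fused = [0.0] * num_classes
    for scores, weight in zip(scores_per_model, weights):
        if len(scores) != num_classes:
            raise ValueError("all models must score the same classes")
        for i, score in enumerate(scores):
            fused[i] += weight * score
    return fused


def top_k(scores: Sequence[float], k: int = 5) -> List[int]:
    """Indices of the k highest-scoring classes, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Toy scores over 3 classes from two hypothetical models.
    model_a = [0.1, 0.6, 0.3]
    model_b = [0.5, 0.1, 0.4]
    fused = late_fuse([model_a, model_b], weights=[2.5, 0.5])
    print(top_k(fused, k=2))  # class 1 first, then class 2
```

A larger weight lets the stronger model dominate the ranking while the weaker one still breaks near-ties, which is the usual motivation for unequal fusion weights.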

Please see docs/MODELS.md for test submission and models on other datasets.

License

This codebase is released under the license terms specified in the LICENSE file. Any imported libraries, datasets, or other code follow the license terms set by their respective authors.

Acknowledgements

The codebase was built on top of facebookresearch/VMZ. Many thanks to Antonino Furnari, Fadime Sener and Miao Liu for help with prior work.
