Code release for ICCV 2021 paper "Anticipative Video Transformer"

Related tags

Deep LearningAVT
Overview

Anticipative Video Transformer

Ranked first in the Action Anticipation task of the CVPR 2021 EPIC-Kitchens Challenge! (entry: AVT-FB-UT)

PWC
PWC
PWC
PWC

[project page] [paper]

If this code helps with your work, please cite:

R. Girdhar and K. Grauman. Anticipative Video Transformer. IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

@inproceedings{girdhar2021anticipative,
    title = {{Anticipative Video Transformer}},
    author = {Girdhar, Rohit and Grauman, Kristen},
    booktitle = {ICCV},
    year = 2021
}

Installation

The code was tested on a Ubuntu 20.04 cluster with each server consisting of 8 V100 16GB GPUs.

First clone the repo and set up the required packages in a conda environment. You might need to make minor modifications here if some packages are no longer available. In most cases they should be replaceable by more recent versions.

$ git clone --recursive [email protected]:facebookresearch/AVT.git
$ conda env create -f env.yaml python=3.7.7
$ conda activate avt

Set up RULSTM codebase

If you plan to use EPIC-Kitchens datasets, you might need the train/test splits and evaluation code from RULSTM. This is also needed if you want to extract RULSTM predictions for test submissions.

$ cd external
$ git clone [email protected]:fpv-iplab/rulstm.git; cd rulstm
$ git checkout 57842b27d6264318be2cb0beb9e2f8c2819ad9bc
$ cd ../..

Datasets

The code expects the data in the DATA/ folder. You can also symlink it to a different folder on a faster/larger drive. Inside it will contain following folders:

  1. videos/ which will contain raw videos
  2. external/ which will contain pre-extracted features from prior work
  3. extracted_features/ which will contain other extracted features
  4. pretrained/ which contains pretrained models, eg from TIMM

The paths to these datasets are set in files like conf/dataset/epic_kitchens100/common.yaml so you can also update the paths there instead.

EPIC-Kitchens

To train only the AVT-h on top of pre-extracted features, you can download the features from RULSTM into DATA/external/rulstm/RULSTM/data_full for EK55 and DATA/external/rulstm/RULSTM/ek100_data_full for EK100. If you plan to train models on features extracted from a irCSN-152 model finetuned from IG65M features, you can download our pre-extracted features from here into DATA/extracted_features/ek100/ig65m_ftEk100_logits_10fps1s/rgb/ or here into DATA/extracted_features/ek55/ig65m_ftEk55train_logits_25fps/rgb/.

To train AVT end-to-end, you need to download the raw videos from EPIC-Kitchens. They can be organized as you wish, but this is how my folders are organized (since I first downloaded EK55 and then the remaining new videos for EK100):

DATA
├── videos
│   ├── EpicKitchens
│   │   └── videos_ht256px
│   │       ├── train
│   │       │   ├── P01
│   │       │   │   ├── P01_01.MP4
│   │       │   │   ├── P01_03.MP4
│   │       │   │   ├── ...
│   │       └── test
│   │           ├── P01
│   │           │   ├── P01_11.MP4
│   │           │   ├── P01_12.MP4
│   │           │   ├── ...
│   │           ...
│   ├── EpicKitchens100
│   │   └── videos_extension_ht256px
│   │       ├── P01
│   │       │   ├── P01_101.MP4
│   │       │   ├── P01_102.MP4
│   │       │   ├── ...
│   │       ...
│   ├── EGTEA/101020/videos/
│   │   ├── OP01-R01-PastaSalad.mp4
│   │   ...
│   └── 50Salads/rgb/
│       ├── rgb-01-1.avi
│       ...
├── external
│   └── rulstm
│       └── RULSTM
│           ├── egtea
│           │   ├── TSN-C_3_egtea_action_CE_flow_model_best_fcfull_hd
│           │   ...
│           ├── data_full  # (EK55)
│           │   ├── rgb
│           │   ├── obj
│           │   └── flow
│           └── ek100_data_full
│               ├── rgb
│               ├── obj
│               └── flow
└── extracted_features
    ├── ek100
    │   └── ig65m_ftEk100_logits_10fps1s
    │       └── rgb
    └── ek55
        └── ig65m_ftEk55train_logits_25fps
            └── rgb

If you use a different organization, you would need to edit the train/val dataset files, such as conf/dataset/epic_kitchens100/anticipation_train.yaml. Sometimes the values are overriden in the TXT config files, so might need to change there too. The root property takes a list of folders where the videos can be found, and it will search through all of them in order for a given video. Note that we resized the EPIC videos to 256px height for faster processing; you can use sample_scripts/resize_epic_256px.sh script for the same.

Please see docs/DATASETS.md for setting up other datasets.

Training and evaluating models

If you want to train AVT models, you would need pre-trained models from timm. We have experiments that use the following models:

$ mkdir DATA/pretrained/TIMM/
$ wget https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_patch16_224_in21k-e5005f0a.pth -O DATA/pretrained/TIMM/jx_vit_base_patch16_224_in21k-e5005f0a.pth
$ wget https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth -O DATA/pretrained/TIMM/jx_vit_base_p16_224-80ecf9dd.pth

The code uses hydra 1.0 for configuration with submitit plugin for jobs via SLURM. We provide a launch.py script that is a wrapper around the training scripts and can run jobs locally or launch distributed jobs. The configuration overrides for a specific experiment is defined by a TXT file. You can run a config by:

$ python launch.py -c expts/01_ek100_avt.txt

where expts/01_ek100_avt.txt can be replaced by any TXT config file.

By default, the launcher will launch the job to a SLURM cluster. However, you can run it locally using one of the following options:

  1. -g to run locally in debug mode with 1 GPU and 0 workers. Will allow you to place pdb.set_trace() to debug interactively.
  2. -l to run locally using as many GPUs on the local machine.

This will run the training, which will run validation every few epochs. You can also only run testing using the -t flag.

The outputs will be stored in OUTPUTS/<path to config>. This would include tensorboard files that you can use to visualize the training progress.

Model Zoo

EPIC-Kitchens-100

Backbone Head Class-mean
[email protected] (Actions)
Config Model
AVT-b (IN21K) AVT-h 14.9 expts/01_ek100_avt.txt link
TSN (RGB) AVT-h 13.6 expts/02_ek100_avt_tsn.txt link
TSN (Obj) AVT-h 8.7 expts/03_ek100_avt_tsn_obj.txt link
irCSN152 (IG65M) AVT-h 12.8 expts/04_ek100_avt_ig65m.txt link

Late fusing predictions

For comparison to methods that use multiple modalities, you can late fuse predictions from multiple models using functions from notebooks/utils.py. For example, to compute the late fused performance reported in Table 3 (val) as AVT+ (obtains 15.9 [email protected] for actions):

from notebooks.utils import *
CFG_FILES = [
    ('expts/01_ek100_avt.txt', 0),
    ('expts/03_ek100_avt_tsn_obj.txt', 0),
]
WTS = [2.5, 0.5]
print_accuracies_epic(get_epic_marginalize_late_fuse(CFG_FILES, weights=WTS)[0])

Please see docs/MODELS.md for test submission and models on other datasets.

License

This codebase is released under the license terms specified in the LICENSE file. Any imported libraries, datasets or other code follows the license terms set by respective authors.

Acknowledgements

The codebase was built on top of facebookresearch/VMZ. Many thanks to Antonino Furnari, Fadime Sener and Miao Liu for help with prior work.

Owner
Facebook Research
Facebook Research
Subgraph Based Learning of Contextual Embedding

SLiCE Self-Supervised Learning of Contextual Embeddings for Link Prediction in Heterogeneous Networks Dataset details: We use four public benchmark da

Pacific Northwest National Laboratory 27 Dec 01, 2022
Large scale and asynchronous Hyperparameter Optimization at your fingertip.

Syne Tune This package provides state-of-the-art distributed hyperparameter optimizers (HPO) where trials can be evaluated with several backend option

Amazon Web Services - Labs 236 Jan 01, 2023
Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Philipp Erler 329 Jan 06, 2023
An Unbiased Learning To Rank Algorithms (ULTRA) toolbox

Unbiased Learning to Rank Algorithms (ULTRA) This is an Unbiased Learning To Rank Algorithms (ULTRA) toolbox, which provides a codebase for experiment

back 3 Nov 18, 2022
PyTorch Personal Trainer: My framework for deep learning experiments

Alex's PyTorch Personal Trainer (ptpt) (name subject to change) This repository contains my personal lightweight framework for deep learning projects

Alex McKinney 8 Jul 14, 2022
Safe Model-Based Reinforcement Learning using Robust Control Barrier Functions

README Repository containing the code for the paper "Safe Model-Based Reinforcement Learning using Robust Control Barrier Functions". Specifically, an

Yousef Emam 13 Nov 24, 2022
RL-GAN: Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation

RL-GAN: Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation RL-GAN is an official implementation of the paper: T

42 Nov 10, 2022
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

StackGAN Pytorch implementation Inception score evaluation StackGAN-v2-pytorch Tensorflow implementation for reproducing main results in the paper Sta

Han Zhang 1.8k Dec 21, 2022
Betafold - AlphaFold with tunings

BetaFold We (hegelab.org) craeted this standalone AlphaFold (AlphaFold-Multimer,

2 Aug 11, 2022
Implementation of Bottleneck Transformer in Pytorch

Bottleneck Transformer - Pytorch Implementation of Bottleneck Transformer, SotA visual recognition model with convolution + attention that outperforms

Phil Wang 621 Jan 06, 2023
A python library to build Model Trees with Linear Models at the leaves.

A python library to build Model Trees with Linear Models at the leaves.

Marco Cerliani 212 Dec 30, 2022
Production First and Production Ready End-to-End Speech Recognition Toolkit

WeNet 中文版 Discussions | Docs | Papers | Runtime (x86) | Runtime (android) | Pretrained Models We share neural Net together. The main motivation of WeN

2.7k Jan 04, 2023
PyTorch implementation of Convolutional Neural Fabrics http://arxiv.org/abs/1606.02492

PyTorch implementation of Convolutional Neural Fabrics arxiv:1606.02492 There are some minor differences: The raw image is first convolved, to obtain

Anuvabh Dutt 25 Dec 22, 2021
Semantic Segmentation of images using PixelLib with help of Pascalvoc dataset trained with Deeplabv3+ framework.

CARscan- Approach 1 - Segmentation of images by detecting contours. It failed because in images with elements along with cars were also getting detect

Padmanabha Banerjee 5 Jul 29, 2021
Black-Box-Tuning - Black-Box Tuning for Language-Model-as-a-Service

Black-Box-Tuning Source code for paper "Black-Box Tuning for Language-Model-as-a

Tianxiang Sun 149 Jan 04, 2023
Create animations for the optimization trajectory of neural nets

Animating the Optimization Trajectory of Neural Nets loss-landscape-anim lets you create animated optimization path in a 2D slice of the loss landscap

Logan Yang 81 Dec 25, 2022
FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment

FaceQgen FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment This repository is based on the paper: "FaceQgen: Semi-Supervised D

Javier Hernandez-Ortega 3 Aug 04, 2022
EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers

EntityQuestions This repository contains the EntityQuestions dataset as well as code to evaluate retrieval results from the the paper Simple Entity-ce

Princeton Natural Language Processing 119 Sep 28, 2022
Sibur challange 2021 competition - 6 place

sibur challange 2021 Решение на 6 место: https://sibur.ai-community.com/competitions/5/tasks/13 Скор 1.4066/1.4159 public/private. Архитектура - однос

Ivan 5 Jan 11, 2022
Strongly local p-norm-cut algorithms for semi-supervised learning and local graph clustering

Strongly local p-norm-cut algorithms for semi-supervised learning and local graph clustering

Meng Liu 2 Jul 19, 2022