A JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short.

Related tags

Deep Learningbrave
Overview

BraVe

This is a JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short.

The model provided in this package was implemented based on the internal model that was used to compute results for the accompanying paper. It achieves comparable results on the evaluation tasks when evaluated side-by-side. Not all details are guaranteed to be identical though, and some results may differ from those given in the paper. In particular, this implementation does not provide the option to train with optical flow.

We provide a selection of pretrained checkpoints in the table below, which can directly be evaluated against HMDB 51 with the evaluation tools this package. These are exactly the checkpoints that were used to provide the numbers in the accompanying paper, and were not trained with the exact trainer given in this package. For details on training a model with this package, please see the end of this readme.

In the table below, the different configurations are represented by using e.g. V/A for video (narrow view) to audio (broad view), or V/F for a narrow view containing video, and a broad view containing optical flow.

The backbone in each case is TSMResnet, with a given width multiplier (please see the accompanying paper for further details). For all of the given numbers below, the SVM regularization constant used is 0.0001. For HMDB 51, the average is given in brackets, followed by the top-1 percentages for each of the splits.

Views Architecture HMDB51 UCF-101 K600 Trained with this package Checkpoint
V/AF TSM (1X) (69.2%) 71.307%, 68.497%, 67.843% 92.9% 69.2% download
V/AF TSM (2X) (69.9%) 72.157%, 68.432%, 69.02% 93.2% 70.2% download
V/A TSM (1X) (69.4%) 70.131%, 68.889%, 69.085% 93.0% 70.6% download
V/VVV TSM (1X) (65.4%) 66.797%, 63.856%, 65.425% 92.6% 70.8% download

Reproducing results from the paper

This package provides everything needed to evaluate the above checkpoints against HMDB 51. It supports Python 3.7 and above.

To get started, we recommend using a clean virtualenv. You may then install the brave package directly from GitHub using,

pip install git+https://github.com/deepmind/brave.git

A pre-processed version of the HMDB 51 dataset can be downloaded using the following command. It requires that both ffmpeg and unrar are available. The following will download the dataset to /tmp/hmdb51/, but any other location would also work.

  python -m brave.download_hmdb --output_dir /tmp/hmdb51/

To evaluate a checkpoint downloaded from the above table, the following may be used. The dataset shards arguments should be set to match the paths used above.

  python -m brave.evaluate_video_embeddings \
    --checkpoint_path <path/to/downloaded/checkpoint>.npy \
    --train_dataset_shards '/tmp/hmdb51/split_1/train/*' \
    --test_dataset_shards '/tmp/hmdb51/split_1/test/*' \
    --svm_regularization 0.0001 \
    --batch_size 8

Note that any of the three splits can be evaluated by changing the dataset split paths. To run this efficiently using a GPU, it is also necessary to install the correct version of jaxlib. To install jaxlib with support for cuda 10.1 on linux, the following install should be sufficient, though other precompiled packages may be found through the JAX documentation.

  pip install https://storage.googleapis.com/jax-releases/cuda101/jaxlib-0.1.69+cuda101-cp39-none-manylinux2010_x86_64.whl

Depending on the available GPU memory available, the batch_size parameter may be tuned to obtain better performance, or to reduce the required GPU memory.

Training a network

This package may also be used to train a model from scratch using jaxline. In order to try this, first ensure the configuration is set appropriately by modifying brave/config.py. At minimum, it would also be necessary to choose an appropriate global batch size (by default, the setting of 512 is likely too large for any single-machine training setup). In addition, a value must be set for dataset_shards. This should contain the paths of the tfrecord files containing the serialized training data.

For details on checkpointing and distributing computation, see the jaxline documentation.

Similarly to above, it is necessary to install the correct jaxlib package to enable training on a GPU.

The training may now be launched using,

  python -m brave.experiment --config=brave/config.py

Training datasets

This model is able to read data stored in the format specified by DMVR. For an example of writing training data in the correct format see the code in dataset/fixtures.py, which is used to write the test fixtures used in the tests for this package.

Running the tests

After checking out this code locally, you may run the package tests using

  pip install -e .
  pytest brave

We recommend doing this from a clean virtual environment.

Citing this work

If you use this code (or any derived code), data or these models in your work, please cite the relevant accompanying paper.

@misc{recasens2021broaden,
      title={Broaden Your Views for Self-Supervised Video Learning},
      author={Adrià Recasens and Pauline Luc and Jean-Baptiste Alayrac and Luyu Wang and Ross Hemsley and Florian Strub and Corentin Tallec and Mateusz Malinowski and Viorica Patraucean and Florent Altché and Michal Valko and Jean-Bastien Grill and Aäron van den Oord and Andrew Zisserman},
      year={2021},
      eprint={2103.16559},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Disclaimer

This is not an official Google product

Owner
DeepMind
DeepMind
Repository of the paper Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models at ML4AD @ NeurIPS 2021.

Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models Code and supplementary materials Repository of the p

Daniel Bogdoll 4 Jul 13, 2022
kapre: Keras Audio Preprocessors

Kapre Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time. Tested on Python 3.6 and 3.7 Why Kapre? vs. Pre-co

Keunwoo Choi 867 Dec 29, 2022
Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019) We propose Disentangled Audio-Visual System (DAVS) to ad

Hang_Zhou 750 Dec 23, 2022
Very Deep Convolutional Networks for Large-Scale Image Recognition

pytorch-vgg Some scripts to convert the VGG-16 and VGG-19 models [1] from Caffe to PyTorch. The converted models can be used with the PyTorch model zo

Justin Johnson 217 Dec 05, 2022
NBEATSx: Neural basis expansion analysis with exogenous variables

NBEATSx: Neural basis expansion analysis with exogenous variables We extend the NBEATS model to incorporate exogenous factors. The resulting method, c

Cristian Challu 100 Dec 31, 2022
TorchXRayVision: A library of chest X-ray datasets and models.

torchxrayvision A library for chest X-ray datasets and models. Including pre-trained models. ( 🎬 promo video about the project) Motivation: While the

Machine Learning and Medicine Lab 575 Jan 08, 2023
Code for generating the figures in the paper "Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Linearly Classified Under All Possible Views?"

Code for running simulations for the paper "Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Lin

Matthew Farrell 1 Nov 22, 2022
Source code for the ACL-IJCNLP 2021 paper entitled "T-DNA: Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation" by Shizhe Diao et al.

T-DNA Source code for the ACL-IJCNLP 2021 paper entitled Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adapta

shizhediao 17 Dec 22, 2022
HAR-stacked-residual-bidir-LSTMs - Deep stacked residual bidirectional LSTMs for HAR

HAR-stacked-residual-bidir-LSTM The project is based on this repository which is presented as a tutorial. It consists of Human Activity Recognition (H

Guillaume Chevalier 287 Dec 27, 2022
Unofficial PyTorch implementation of "RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving" (ECCV 2020)

RTM3D-PyTorch The PyTorch Implementation of the paper: RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving (ECCV 2020

Nguyen Mau Dzung 271 Nov 29, 2022
Robust, modular and efficient implementation of advanced Hamiltonian Monte Carlo algorithms

AdvancedHMC.jl AdvancedHMC.jl provides a robust, modular and efficient implementation of advanced HMC algorithms. An illustrative example for Advanced

The Turing Language 167 Jan 01, 2023
3ds-Ghidra-Scripts - Ghidra scripts to help with 3ds reverse engineering

3ds Ghidra Scripts These are ghidra scripts to help with 3ds reverse engineering

Zak 7 May 23, 2022
Full body anonymization - Realistic Full-Body Anonymization with Surface-Guided GANs

Realistic Full-Body Anonymization with Surface-Guided GANs This is the official

Håkon Hukkelås 30 Nov 18, 2022
Official repository for the CVPR 2021 paper "Learning Feature Aggregation for Deep 3D Morphable Models"

Deep3DMM Official repository for the CVPR 2021 paper Learning Feature Aggregation for Deep 3D Morphable Models. Requirements This code is tested on Py

38 Dec 27, 2022
Library extending Jupyter notebooks to integrate with Apache TinkerPop and RDF SPARQL.

Graph Notebook: easily query and visualize graphs The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Us

Amazon Web Services 501 Dec 28, 2022
Crosslingual Segmental Language Model

Crosslingual Segmental Language Model This repository contains the code from Multilingual unsupervised sequence segmentation transfers to extremely lo

C.M. Downey 1 Jun 13, 2022
The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection Updates | Introduction | Results | Usage | Citation |

33 Jan 05, 2023
Memory efficient transducer loss computation

Introduction This project implements the optimization techniques proposed in Improving RNN Transducer Modeling for End-to-End Speech Recognition to re

Fangjun Kuang 51 Nov 25, 2022
Predicts an answer in yes or no.

Oui-ou-non-prediction Predicts an answer in 'yes' or 'no'. It is based on the game 'effeuiller la marguerite' in which the person plucks flower petals

Ananya Gupta 1 Jan 15, 2022
CCNet: Criss-Cross Attention for Semantic Segmentation (TPAMI 2020 & ICCV 2019).

CCNet: Criss-Cross Attention for Semantic Segmentation Paper Links: Our most recent TPAMI version with improvements and extensions (Earlier ICCV versi

Zilong Huang 1.3k Dec 27, 2022