Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"

Overview

merlot_reserve

Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"

MERLOT Reserve (in submission) is a model for learning joint representations of vision, language, and sound from YouTube. The learned model can be used in a zero-shot or finetuned setting, where it does well on tasks like VCR and TVQA.

Visit our project page at rowanzellers.com/merlotreserve or read the full paper to learn more.

What's here

We are releasing the following:

  • JAX code, and model checkpoints, for the MERLOT model
  • Code for pretraining the model
  • Code for finetuning the model on VCR and TVQA
  • Code for doing zero-shot inference with the model

Environment and setup

There are two different ways to run MERLOT Reserve:

  • Pretraining on videos You'll need a TPU Pod VM for this. This step shouldn't be necessary for most people, as we have released model checkpoints.
  • Finetuning on VCR or TVQA I've done this on a TPU v3-8 VM. This should be possible on GPU(s), but I haven't tested this on such hardware.
  • Zero-shot inference I've ran this on a GPU (even an older, Titan X from 2016 works.)

Installation on a GPU Machine

Install Cuda 11.4 (I used this link) and CUDNN 8.2. You might have to add something like this to your PATH:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64

Create the environment:

conda create --name mreserve python=3.8 && conda activate mreserve
conda install -y python=3.8 tqdm numpy pyyaml scipy ipython cython typing h5py pandas matplotlib

# Install jax
pip install jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_releases.html
# If doing this on TPUs instead of locally...
# pip install "jax[tpu]>=0.2.18" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

# This is needed sometimes https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp
pip uninstall numpy
pip install numpy==1.19.5

pip install -r requirements.txt

You can then try out the interactive script at demo/demo_video.py. It will handle downloading the model checkpoint for you.

Installation on a Cloud TPU VM

See the instructions in pretrain/ to set up your environment on a TPU v3-8 VM.

Checkpoints

These should get auto-downloaded if you use PretrainedMerlotReserve in mreserve/modeling.py. All are flax checkpoint files:

# pretrained checkpoints
gs://merlotreserve/ckpts/base
gs://merlotreserve/ckpts/base_resadapt
gs://merlotreserve/ckpts/large
gs://merlotreserve/ckpts/large_resadapt

# finetuned checkpoints
gs://merlotreserve/vcr_ckpts/vcr_finetune_base
gs://merlotreserve/vcr_ckpts/vcr_finetune_large

gs://merlotreserve/tvqa_ckpts/tvqa_finetune_base
gs://merlotreserve/tvqa_ckpts/tvqa_finetune_large

# TVQA Data
gs://merlotreserve/finetune_data/tvqa/

# VCR data
gs://merlotreserve/finetune_data/vcr/
Owner
Rowan Zellers
Rowan Zellers
Tutorial for the PERFECTING FACTORY 5.0 WITH EDGE-POWERED AI workshop

Workshop Advantech Jetson Nano This tutorial has been designed for the PERFECTING FACTORY 5.0 WITH EDGE-POWERED AI workshop in collaboration with Adva

Edge Impulse 18 Nov 22, 2022
The Curious Layperson: Fine-Grained Image Recognition without Expert Labels (BMVC 2021)

The Curious Layperson: Fine-Grained Image Recognition without Expert Labels Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi Code

Subhabrata Choudhury 18 Dec 27, 2022
This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

Sergi Caelles 828 Jan 05, 2023
Implementation for paper MLP-Mixer: An all-MLP Architecture for Vision

MLP Mixer Implementation for paper MLP-Mixer: An all-MLP Architecture for Vision. Give us a star if you like this repo. Author: Github: bangoc123 Emai

Ngoc Nguyen Ba 86 Dec 10, 2022
Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

free-duolingo-plus duolingo account creator that uses your invite code to get yo

1 Jan 06, 2022
Includes PyTorch -> Keras model porting code for ConvNeXt family of models with fine-tuning and inference notebooks.

ConvNeXt-TF This repository provides TensorFlow / Keras implementations of different ConvNeXt [1] variants. It also provides the TensorFlow / Keras mo

Sayak Paul 87 Dec 06, 2022
Contains code for the paper "Vision Transformers are Robust Learners".

Vision Transformers are Robust Learners This repository contains the code for the paper Vision Transformers are Robust Learners by Sayak Paul* and Pin

Sayak Paul 103 Jan 05, 2023
Heart Arrhythmia Classification

This program takes and input of an ECG in European Data Format (EDF) and outputs the classification for heartbeats into normal vs different types of arrhythmia . It uses a deep learning model for cla

4 Nov 02, 2022
Official implementation of the paper ``Unifying Nonlocal Blocks for Neural Networks'' (ICCV'21)

Spectral Nonlocal Block Overview Official implementation of the paper: Unifying Nonlocal Blocks for Neural Networks (ICCV'21) Spectral View of Nonloca

91 Dec 14, 2022
Official implementation of "Watermarking Images in Self-Supervised Latent-Spaces"

🔍 Watermarking Images in Self-Supervised Latent-Spaces PyTorch implementation and pretrained models for the paper. For details, see Watermarking Imag

Meta Research 32 Dec 13, 2022
Self-supervised Product Quantization for Deep Unsupervised Image Retrieval - ICCV2021

Self-supervised Product Quantization for Deep Unsupervised Image Retrieval Pytorch implementation of SPQ Accepted to ICCV 2021 - paper Young Kyun Jang

Young Kyun Jang 71 Dec 27, 2022
This repository contain code on Novelty-Driven Binary Particle Swarm Optimisation for Truss Optimisation Problems.

This repository contain code on Novelty-Driven Binary Particle Swarm Optimisation for Truss Optimisation Problems. The main directory include the code

0 Dec 23, 2021
Accelerating BERT Inference for Sequence Labeling via Early-Exit

Sequence-Labeling-Early-Exit Code for ACL 2021 paper: Accelerating BERT Inference for Sequence Labeling via Early-Exit Requirement: Please refer to re

李孝男 23 Oct 14, 2022
Painting app using Python machine learning and vision technology.

AI Painting App We are making an app that will track our hand and helps us to draw from that. We will be using the advance knowledge of Machine Learni

Badsha Laskar 3 Oct 03, 2022
Supervised 3D Pre-training on Large-scale 2D Natural Image Datasets for 3D Medical Image Analysis

Introduction This is an implementation of our paper Supervised 3D Pre-training on Large-scale 2D Natural Image Datasets for 3D Medical Image Analysis.

24 Dec 06, 2022
Self-Adaptable Point Processes with Nonparametric Time Decays

NPPDecay This is our implementation for the paper Self-Adaptable Point Processes with Nonparametric Time Decays, by Zhimeng Pan, Zheng Wang, Jeff M. P

zpan 2 Sep 24, 2022
Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021.

Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021. Figure 1: In the process of motion capture (mocap), some joints or even the whole human

Shinny cui 3 Oct 31, 2022
ICNet and PSPNet-50 in Tensorflow for real-time semantic segmentation

Real-Time Semantic Segmentation in TensorFlow Perform pixel-wise semantic segmentation on high-resolution images in real-time with Image Cascade Netwo

Oles Andrienko 219 Nov 21, 2022
Implementation of TimeSformer, a pure attention-based solution for video classification

TimeSformer - Pytorch Implementation of TimeSformer, a pure and simple attention-based solution for reaching SOTA on video classification.

Phil Wang 602 Jan 03, 2023
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

OCTIS : Optimizing and Comparing Topic Models is Simple! OCTIS (Optimizing and Comparing Topic models Is Simple) aims at training, analyzing and compa

MIND 478 Jan 01, 2023