Multi-query Video Retrieval

This repository contains the code for the paper:

@misc{wang2022multiquery,
      title={Multi-query Video Retrieval}, 
      author={Zeyu Wang and Yu Wu and Karthik Narasimhan and Olga Russakovsky},
      year={2022},
      eprint={2201.03639},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Data Preparation

Download the raw videos for MSR-VTT, MSVD, and VATEX, and put them into the data/{dataset}/raw_videos folder.
Then run the script data/extract_frames.sh to extract frames from the raw videos.

The resulting data folder is structured as follows:

├── data
    ├── msrvtt
        ├── msrvtt_train.json
        ├── msrvtt_test.json
        ├── msrvtt_test_varying_query_sample_1-20.json
        ├── raw_videos
            ├── video0.mp4
            ├── ...
        ├── extracted_frames
            ├── video0.mp4
                ├── 0.jpg
                ├── ...
            ├── ...
    ├── msvd
        ├── ...
    ├── vatex
        ├── ...
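
Before training, it can help to confirm that the layout above is in place. The following is a minimal Python sketch and is not part of this repository. The helper name check_dataset_layout is hypothetical. It assumes that every dataset folder follows the msrvtt naming shown above ({dataset}_train.json, {dataset}_test.json, and one frame folder per video, named after the video file). Some datasets may store videos with an extension other than .mp4, so the extension is a parameter.

```python
from pathlib import Path


def check_dataset_layout(root, dataset, video_ext=".mp4"):
    """Return a list of problems found under root/dataset; an empty list means OK.

    Hypothetical helper: assumes every dataset mirrors the msrvtt layout
    shown in this README (annotation JSONs, raw_videos/, extracted_frames/).
    """
    base = Path(root) / dataset
    problems = []

    # Assumed annotation file names, following msrvtt_train.json / msrvtt_test.json.
    for split in ("train", "test"):
        ann = base / f"{dataset}_{split}.json"
        if not ann.is_file():
            problems.append(f"missing annotation: {ann}")

    raw_dir = base / "raw_videos"
    frames_dir = base / "extracted_frames"
    for folder in (raw_dir, frames_dir):
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")

    if raw_dir.is_dir() and frames_dir.is_dir():
        for video in sorted(raw_dir.glob(f"*{video_ext}")):
            # Frames for raw_videos/video0.mp4 are expected as
            # extracted_frames/video0.mp4/0.jpg, 1.jpg, ...
            frame_dir = frames_dir / video.name
            if not frame_dir.is_dir() or not any(frame_dir.glob("*.jpg")):
                problems.append(f"no extracted frames for: {video.name}")
    return problems
```

For example, calling check_dataset_layout("data", name) for each name in ("msrvtt", "msvd", "vatex") and printing the returned problems gives a quick pre-flight check before running data/extract_frames.sh again or starting training.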

For the Frozen model, download the pretrained checkpoint provided by the original authors here, and put it into the record/pretrained folder.

Training

Run command: python train.py -c configs/{config_path}

Evaluation

Run command: python evaluate.py -c configs/{config_path}
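
To train and then evaluate with the same config in one step, a small driver can wrap the two commands above. This is a minimal sketch, not a script shipped with the repository. The function names build_command and train_then_evaluate are hypothetical, and the only flag used is -c, exactly as in the commands above; any other options train.py or evaluate.py may accept are not assumed here.

```python
import subprocess
import sys


def build_command(script, config_name):
    """Build the command line for train.py or evaluate.py with a config under configs/."""
    if script not in ("train.py", "evaluate.py"):
        raise ValueError(f"unexpected script: {script}")
    # Mirrors: python {script} -c configs/{config_path}
    return [sys.executable, script, "-c", f"configs/{config_name}"]


def train_then_evaluate(config_name):
    """Run training, then evaluation, stopping if either command fails."""
    for script in ("train.py", "evaluate.py"):
        subprocess.run(build_command(script, config_name), check=True)
```

Using sys.executable runs both scripts with the same Python interpreter as the driver, and check=True raises CalledProcessError if training fails, so evaluation is never started on a failed run.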

Acknowledgements

The structure of this repository is based on https://github.com/victoresque/pytorch-template. Some of the code is adapted from https://github.com/m-bain/frozen-in-time and https://github.com/ArrowLuo/CLIP4Clip.


Owner

Princeton Visual AI Lab
