Diverse Object-Scene Compositions For Zero-Shot Action Recognition

This repository contains the source code for zero-shot action recognition using object-scene compositions.

This repository includes:

object and scene predictions for UCF-101, UCF-Sports, and J-HMDB
script to retrieve object and scene predictions for Kinetics
scripts to obtain word and sentence embeddings for all datasets used and for object-scene compositions
script to obtain action predictions from any given action dataset, given the object and scene predictions and the respective action labels

Software used

python 3.8.8
pytorch 1.7.1
numpy 1.19.2
fasttext 0.9.2
sentence-transformers 1.2.0
scikit-learn 0.24.1

Downloading the object and scene predictions for Kinetics

While the action labels and video annotations for Kinetics are already present in the repo, the object and scene predictions need to be retrieved using:

bash kineticsdownload.sh

Obtaining word and sentence embeddings for all datasets

To compute the word and sentence embeddings for all the video and image datasets, run:

python getfasttextembs.py; python getbertembs.py

This will additionally compute the embeddings for all object-scene compositions and the similarities between all action labels and object-scene compositions.
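
As a rough illustration of what a similarity between action labels and object-scene compositions can look like, the sketch below computes cosine similarities between toy embedding vectors. It is not taken from getfasttextembs.py or getbertembs.py: the function names, the 3-dimensional vectors, and the choice of cosine similarity are all assumptions made for this example, and plain Python lists stand in for the arrays a real embedding model would return.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(action_embs, composition_embs):
    """Return one row per action label and one column per composition."""
    return [
        [cosine_similarity(a, c) for c in composition_embs]
        for a in action_embs
    ]


# Hypothetical toy embeddings; real ones would come from fastText or a sentence encoder.
action_labels = ["surfing", "skiing"]
action_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

compositions = ["surfboard + beach", "ski + mountain"]
composition_embs = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]

sims = similarity_matrix(action_embs, composition_embs)
for label, row in zip(action_labels, sims):
    print(label, row)
```

Cosine similarity ignores vector length, so the differently scaled toy vectors above still match their aligned action labels exactly.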

Using the main script

The main script can be run using the default arguments as follows:

python zero-shot-actions.py

There are several flags that can be used. Descriptions for these can be shown by running:

python zero-shot-actions.py --help

Lastly, a helper script to compute results for different datasets and for different flag values is available:

python make_results.py
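
To give an idea of how object and scene predictions, together with label similarities, could be turned into an action prediction without any action training data, here is a minimal sketch. It is an illustration under stated assumptions, not the logic of zero-shot-actions.py: the function names, the top_k parameter, the product of object and scene probabilities as a composition score, and all numbers are hypothetical.

```python
def action_scores(obj_probs, scene_probs, sims, top_k=2):
    """Score each action using its top_k most similar object-scene compositions.

    obj_probs:   dict mapping object name to predicted probability for one video
    scene_probs: dict mapping scene name to predicted probability for one video
    sims:        dict mapping action name to {(object, scene): similarity}
    """
    scores = {}
    for action, comp_sims in sims.items():
        # Keep only the compositions that are semantically closest to the action.
        top = sorted(comp_sims, key=comp_sims.get, reverse=True)[:top_k]
        # Assumption: a composition's score is the product of its two probabilities.
        scores[action] = sum(
            comp_sims[(obj, scene)] * obj_probs[obj] * scene_probs[scene]
            for obj, scene in top
        )
    return scores


def predict_action(obj_probs, scene_probs, sims, top_k=2):
    """Return the action with the highest score."""
    scores = action_scores(obj_probs, scene_probs, sims, top_k)
    return max(scores, key=scores.get)


# Hypothetical predictions for a single video.
obj_probs = {"surfboard": 0.8, "ski": 0.2}
scene_probs = {"beach": 0.9, "mountain": 0.1}

# Hypothetical similarities between action labels and compositions.
sims = {
    "surfing": {
        ("surfboard", "beach"): 0.9, ("surfboard", "mountain"): 0.4,
        ("ski", "beach"): 0.3, ("ski", "mountain"): 0.1,
    },
    "skiing": {
        ("ski", "mountain"): 0.9, ("surfboard", "mountain"): 0.3,
        ("ski", "beach"): 0.2, ("surfboard", "beach"): 0.1,
    },
}

prediction = predict_action(obj_probs, scene_probs, sims)
print(prediction)
```

Restricting each action to its top_k compositions keeps weakly related objects and scenes from adding noise to the score; the actual selection strategy and its flags are described by the --help output of the main script.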
