Syntax-Aware Action Targeting for Video Captioning

Last update: Oct 13, 2022

Code for SAAT from "Syntax-Aware Action Targeting for Video Captioning" (Accepted to CVPR 2020). The implementation is based on "Consensus-based Sequence Training for Video Captioning".

Dependencies

Python 3.6
PyTorch 1.1
CUDA 10.0
Microsoft COCO Caption Evaluation
CIDEr

(Check out the coco-caption and cider projects into your working directory)
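Before running anything, it may help to confirm that the local environment roughly matches the versions listed above. The following is a minimal sketch, not part of the repository: the function name check_environment and the shape of its report are hypothetical, and it only reads version strings without validating compatibility.

```python
import importlib
import sys


def check_environment(expected_python=(3, 6), modules=("torch",)):
    """Collect interpreter and module versions (hypothetical helper)."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        # True only when the major/minor version equals the README's listed version.
        "python_matches": sys.version_info[:2] == tuple(expected_python),
    }
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            report[name] = None  # module is not installed
            continue
        report[name] = getattr(mod, "__version__", "unknown")
        if name == "torch":
            # torch.version.cuda is the CUDA version PyTorch was built with (None for CPU builds).
            report["cuda"] = getattr(getattr(mod, "version", None), "cuda", None)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("{}: {}".format(key, value))
```

A value of None for torch means the package could not be imported. Newer Python or PyTorch versions may or may not work with this code base; the sketch only reports what is installed.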

Data

Data can be downloaded here (1.6GB). This folder contains:

input/msrvtt: annotated captions (note that val_videodatainfo.json is a symbolic link to train_videodatainfo.json)
output/feature: extracted features of IRv2, C3D and Category embeddings
output/metadata: preprocessed annotations
output/model_svo/xe: model file and generated captions on test videos. The reported result (CIDEr 49.1 for XE training) can be reproduced with the model provided in this folder.
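After extracting the download, the folder layout described above can be checked with a short script. This is a sketch under assumptions: the helper name check_data_layout is hypothetical, and the extraction root is whatever directory you unpacked the archive into.

```python
from pathlib import Path

# Sub-folders listed in this README, relative to the extracted data root.
EXPECTED_DIRS = [
    "input/msrvtt",
    "output/feature",
    "output/metadata",
    "output/model_svo/xe",
]


def check_data_layout(root):
    """Report missing folders and whether the val annotation file is a symlink."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    val_file = root / "input/msrvtt/val_videodatainfo.json"
    return {"missing": missing, "val_is_symlink": val_file.is_symlink()}


if __name__ == "__main__":
    # "." is a placeholder; pass the directory the archive was extracted into.
    print(check_data_layout("."))
```

If val_is_symlink is False after extraction, the archive tool may have dropped the symbolic link, in which case val_videodatainfo.json would need to be re-linked to train_videodatainfo.json.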

Test

make -f SpecifiedMakefile test [options]

Please refer to the Makefile (and the opts_svo.py file) for the set of available train/test options. For example, to reproduce the reported result:

make -f Makefile_msrvtt_svo test GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG LAMBDA=20

Train

To train the model using XE loss:

make -f Makefile_msrvtt_svo train GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCH=100 LAMBDA=20

To change the input features, modify the FEATS variable in the commands above.
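When running several configurations, the long make invocations above can be assembled programmatically so that only the changed variables need to be spelled out. The sketch below is illustrative only: make_command and as_shell are hypothetical helpers, the defaults are copied from the commands in this README, and the feature subset "irv2 c3d" is an assumed example rather than a configuration the authors report.

```python
import shlex

# Defaults copied from the test/train commands shown in this README.
BASE_OPTIONS = {
    "GID": "0",
    "EXP_NAME": "xe",
    "FEATS": "irv2 c3d category",
    "BFEATS": "roi_feat roi_box",
    "USE_RL": "0",
    "CST": "0",
    "USE_MIXER": "0",
    "SCB_CAPTIONS": "0",
    "LOGLEVEL": "DEBUG",
    "LAMBDA": "20",
}


def make_command(target, makefile="Makefile_msrvtt_svo", **overrides):
    """Build the argument list for `make -f <makefile> <target> VAR=value ...`."""
    options = dict(BASE_OPTIONS)
    options.update({key: str(value) for key, value in overrides.items()})
    args = ["make", "-f", makefile, target]
    args += ["{}={}".format(key, value) for key, value in options.items()]
    return args


def as_shell(args):
    """Render the argument list as a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Training run with a hypothetical feature subset.
    print(as_shell(make_command("train", MAX_EPOCH=100, FEATS="irv2 c3d")))
```

The argument list returned by make_command can be passed directly to subprocess.run(args, check=True), which avoids shell quoting issues with values that contain spaces, such as FEATS and BFEATS.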

Citation

@InProceedings{Zheng_2020_CVPR,
author = {Zheng, Qi and Wang, Chaoyue and Tao, Dacheng},
title = {Syntax-Aware Action Targeting for Video Captioning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Acknowledgements

PyTorch implementation of CST
PyTorch implementation of SCST
