Syntax-Aware Action Targeting for Video Captioning

Last update: Oct 13, 2022


Code for SAAT, the method from "Syntax-Aware Action Targeting for Video Captioning" (accepted to CVPR 2020). The implementation builds on the code for "Consensus-based Sequence Training for Video Captioning" (CST).

Dependencies

Python 3.6
PyTorch 1.1
CUDA 10.0
Microsoft COCO Caption Evaluation
CIDEr

(Check out the coco-caption and cider projects into your working directory)

Data

Data can be downloaded here (1.6GB). This folder contains:

input/msrvtt: annotated captions (note that val_videodatainfo.json is a symbolic link to train_videodatainfo.json)
output/feature: extracted features of IRv2, C3D and Category embeddings
output/metadata: preprocessed annotations
output/model_svo/xe: model file and generated captions on test videos. The model in this folder reproduces the reported result (CIDEr 49.1 for XE training).
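
Before running anything, it can help to confirm that the download unpacked into the layout listed above. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that val_videodatainfo.json sits directly inside input/msrvtt next to train_videodatainfo.json.

```python
import os

# Sub-folders named in the Data section of this README.
EXPECTED_DIRS = [
    "input/msrvtt",
    "output/feature",
    "output/metadata",
    "output/model_svo/xe",
]


def missing_dirs(root):
    """Return the expected sub-folders that do not exist under `root`."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]


def ensure_val_link(root):
    """Create val_videodatainfo.json as a symlink to the train file if it is absent.

    Assumption: both files live directly in input/msrvtt.
    """
    link = os.path.join(root, "input", "msrvtt", "val_videodatainfo.json")
    if not os.path.lexists(link):
        # Relative target, so the link still resolves if the data folder is moved.
        os.symlink("train_videodatainfo.json", link)
    return link
```

Calling `missing_dirs("path/to/data")` with the folder you unpacked returns an empty list when all four sub-folders are present. `ensure_val_link` is only needed if the archive was extracted with a tool that drops symbolic links.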

Test

make -f SpecifiedMakefile test [options]

Please refer to the Makefile (and the opts_svo.py file) for the set of available train/test options. For example, to reproduce the reported result:

make -f Makefile_msrvtt_svo test GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG LAMBDA=20

Train

To train the model with the cross-entropy (XE) loss:

make -f Makefile_msrvtt_svo train GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCH=100 LAMBDA=20

To change the input features, modify the FEATS variable in the commands above.
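
When trying several feature combinations, retyping the full command is error-prone. The sketch below assembles the XE training command from the option values shown above and overrides only what changes. The helper names are hypothetical, and `FEATS="irv2 c3d"` is an illustrative variation; whether the Makefile accepts that particular subset is an assumption to check against opts_svo.py.

```python
import shlex

# Option values copied from the XE training command in this README.
BASE_OPTIONS = {
    "GID": "0",
    "EXP_NAME": "xe",
    "FEATS": "irv2 c3d category",
    "BFEATS": "roi_feat roi_box",
    "USE_RL": "0",
    "CST": "0",
    "USE_MIXER": "0",
    "SCB_CAPTIONS": "0",
    "LOGLEVEL": "DEBUG",
    "MAX_EPOCH": "100",
    "LAMBDA": "20",
}


def build_train_command(**overrides):
    """Return the make invocation as an argument list, with selected options overridden."""
    options = dict(BASE_OPTIONS)
    options.update(overrides)
    cmd = ["make", "-f", "Makefile_msrvtt_svo", "train"]
    # Each NAME=value pair is one argument, so values with spaces need no extra quoting here.
    cmd += ["{}={}".format(name, value) for name, value in options.items()]
    return cmd


def as_shell(cmd):
    """Quote the argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Illustrative variation (assumption): train without the category embedding.
print(as_shell(build_train_command(FEATS="irv2 c3d")))
```

The argument list returned by `build_train_command` can also be passed directly to `subprocess.run` from the repository root, which avoids shell quoting altogether.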

Citation

@InProceedings{Zheng_2020_CVPR,
author = {Zheng, Qi and Wang, Chaoyue and Tao, Dacheng},
title = {Syntax-Aware Action Targeting for Video Captioning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Acknowledgements

PyTorch implementation of CST
PyTorch implementation of SCST
