Syntax-Aware Action Targeting for Video Captioning

Related tags

Deep LearningSAAT
Overview

Syntax-Aware Action Targeting for Video Captioning

Code for SAAT from "Syntax-Aware Action Targeting for Video Captioning" (Accepted to CVPR 2020). The implementation is based on "Consensus-based Sequence Training for Video Captioning".

Dependencies

(Check out the coco-caption and cider projects into your working directory)

Data

Data can be downloaded here (1.6GB). This folder contains:

  • input/msrvtt: annotatated captions (note that val_videodatainfo.json is a symbolic link to train_videodatainfo.json)
  • output/feature: extracted features of IRv2, C3D and Category embeddings
  • output/metadata: preprocessed annotations
  • output/model_svo/xe: model file and generated captions on test videos, the reported result can be reproduced by the model provided in this folder (CIDEr 49.1 for XE training)

Test

make -f SpecifiedMakefile test [options]

Please refer to the Makefile (and opts_svo.py file) for the set of available train/test options. For example, to reproduce the reported result

make -f Makefile_msrvtt_svo test GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG LAMBDA=20

Train

To train the model using XE loss

make -f Makefile_msrvtt_svo train GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCH=100 LAMBDA=20

If you want to change the input features, modify the FEATS variable in above commands.

Citation

@InProceedings{Zheng_2020_CVPR,
author = {Zheng, Qi and Wang, Chaoyue and Tao, Dacheng},
title = {Syntax-Aware Action Targeting for Video Captioning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Acknowledgements

  • Pytorch implementation of CST
  • PyTorch implementation of SCST
PyTorch experiments with the Zalando fashion-mnist dataset

zalando-pytorch PyTorch experiments with the Zalando fashion-mnist dataset Project Organization ├── LICENSE ├── Makefile - Makefile with co

Federico Baldassarre 31 Sep 25, 2021
This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search"

InvariantAncestrySearch This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search

Phillip Bredahl Mogensen 0 Feb 02, 2022
Official code for article "Expression is enough: Improving traffic signal control with advanced traffic state representation"

1 Introduction Official code for article "Expression is enough: Improving traffic signal control with advanced traffic state representation". The code s

Liang Zhang 10 Dec 10, 2022
FaceOcc: A Diverse, High-quality Face Occlusion Dataset for Human Face Extraction

FaceExtraction FaceOcc: A Diverse, High-quality Face Occlusion Dataset for Human Face Extraction Occlusions often occur in face images in the wild, tr

16 Dec 14, 2022
This repository contains the code used for Predicting Patient Outcomes with Graph Representation Learning (https://arxiv.org/abs/2101.03940).

Predicting Patient Outcomes with Graph Representation Learning This repository contains the code used for Predicting Patient Outcomes with Graph Repre

Emma Rocheteau 76 Dec 22, 2022
[CVPR 2022] Official Pytorch code for OW-DETR: Open-world Detection Transformer

OW-DETR: Open-world Detection Transformer (CVPR 2022) [Paper] Akshita Gupta*, Sanath Narayan*, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Sh

Akshita Gupta 127 Dec 27, 2022
Multivariate Time Series Transformer, public version

Multivariate Time Series Transformer Framework This code corresponds to the paper: George Zerveas et al. A Transformer-based Framework for Multivariat

363 Jan 03, 2023
The Ludii general game system, developed as part of the ERC-funded Digital Ludeme Project.

The Ludii General Game System Ludii is a general game system being developed as part of the ERC-funded Digital Ludeme Project (DLP). This repository h

Digital Ludeme Project 50 Jan 04, 2023
Unsupervised Feature Ranking via Attribute Networks.

FRANe Unsupervised Feature Ranking via Attribute Networks (FRANe) converts a dataset into a network (graph) with nodes that correspond to the features

7 Sep 29, 2022
I explore rock vs. mine prediction using a SONAR dataset

I explore rock vs. mine prediction using a SONAR dataset. Using a Logistic Regression Model for my prediction algorithm, I intend on predicting what an object is based on supervised learning.

Jeff Shen 1 Jan 11, 2022
This repo provides the base code for pytorch-lightning and weight and biases simultaneous integration.

Write your model faster with pytorch-lightning-wadb-code-backbone This repository provides the base code for pytorch-lightning and weight and biases s

9 Mar 29, 2022
rastrainer is a QGIS plugin to training remote sensing semantic segmentation model based on PaddlePaddle.

rastrainer rastrainer is a QGIS plugin to training remote sensing semantic segmentation model based on PaddlePaddle. UI TODO Init UI. Add Block. Add l

deepbands 5 Mar 04, 2022
Accelerated deep learning R&D

Accelerated deep learning R&D PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and

Catalyst-Team 3.1k Jan 06, 2023
Source codes for the paper "Local Additivity Based Data Augmentation for Semi-supervised NER"

LADA This repo contains codes for the following paper: Jiaao Chen*, Zhenghui Wang*, Ran Tian, Zichao Yang, Diyi Yang: Local Additivity Based Data Augm

GT-SALT 36 Dec 02, 2022
Dashboard for the COVID19 spread

COVID-19 Data Explorer App A streamlit Dashboard for the COVID-19 spread. The app is live at: [https://covid19.cwerner.ai]. New data is queried from G

Christian Werner 22 Sep 29, 2022
Pretraining on Dynamic Graph Neural Networks

Pretraining on Dynamic Graph Neural Networks Our article is PT-DGNN and the code is modified based on GPT-GNN Requirements python 3.6 Ubuntu 18.04.5 L

7 Dec 17, 2022
An All-MLP solution for Vision, from Google AI

MLP Mixer - Pytorch An All-MLP solution for Vision, from Google AI, in Pytorch. No convolutions nor attention needed! Yannic Kilcher video Install $ p

Phil Wang 784 Jan 06, 2023
Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"

Zero-Shot Information Extraction as a Unified Text-to-Triple Translation Source code repo for paper Zero-Shot Information Extraction as a Unified Text

cgraywang 88 Dec 31, 2022
Make a surveillance camera from your raspberry pi!

rpi-surveillance Make a surveillance camera from your Raspberry Pi 4! The surveillance is built as following: the camera records 10 seconds video and

Vladyslav 62 Feb 03, 2022
Use graph-based analysis to re-classify stocks and to improve Markowitz portfolio optimization

Dynamic Stock Industrial Classification Use graph-based analysis to re-classify stocks and experiment different re-classification methodologies to imp

Sheng Yang 10 Dec 05, 2022