Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

Overview

WSDEC

This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos.

Description

Repo directories

  • ./: global config files, training, evaluating scripts;
  • ./data: data dictionary;
  • ./model: our final models used to reproduce the results;
  • ./runs: the default output dictionary used to store our trained model and result files;
  • ./scripts: helper scripts;
  • ./third_party: third party dependency include the official evaluation scripts;
  • ./utils: helper functions;
  • ./train_script: all training scripts;
  • ./eval_script: all evalulating scripts.

Dependency

  • Python 2.7
  • CUDA 9.0(note: you will encounter a bug saying segmentation fault(core dump) if you run our code with CUDA 8.0)
    • But it seems that the bug still exists. See issue
  • [Pytorch 0.3.1](note: 0.3.1 is not compatible with newer version)
  • numpy, hdf5 and other necessary packages(no special requirement)

Usage for reproduction

Before we start

Before the training and testing, we should make sure the data, third party data are prepared, here is the one-by-one steps to make everything prepared.

1. Clone our repo and submodules

git clone --recursive https://github.com/XgDuan/WSDEC

2. Download all the data

  • Download the official C3D features, you can either download the data from the website or from our onedrive cloud.

    • Download from the official website; (Note, after you download the C3D features, you can either place it in the data folder and rename it as anet_v1.3.c3d.hdf5, or create a soft link in the data dictionary as ln -s YOURC3DFeature data/anet_v1.3.c3d.hdf5)
  • Download the dense video captioning data from the official website; (Similar to the C3D feature, you are supposed to place the download data in the data folder and rename it as densecap)

  • Download the data for the official evaluation scripts densevid_eval;

    • run the command sh download.sh scripts in the folder PREFIX/WSDEC/third_party/densevid_eval;
  • [Good News]: we write a shell script for you to download the data, just run the following command:

    cd data
    sh download.sh
    

3. Generate the dictionary for the caption model

python scripts/caption_preprocess.py

Training

There are two steps for model training: pretrain a not so bad caption model; and the second step, train the final/baseline model.

Our pretrained captioning model is trained.

python train_script/train_cg_pretrain.py

train our final model

python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias MODEL_NAME

train baselines

  1. train the baseline model without classification loss.
python train_script/train_baseline_regressor.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias MODEL_NAME
  1. train the baseline model without regression branch.
python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --regressor_scale 0 --alias MODEL_NAME

About the arguments

All the arguments we use can be found in the corresponding training scripts. You can also use your own argumnets if you like to do so. But please mind, some arguments are discarded(This is our own reimplementation of our paper, the first version codes are too dirty that no one would like to use it.)

Testing

Testing is easier than training. Firstly, in the process of training, our scripts will call the densevid_eval in a subprocess every time after we run the eval function. From these results, you can have a general grasp about the final performance by just have a look at the eval_results.txt scripts. Secondly, after some epochs, you can run the evaluation scripts:

  1. evaluate the full model or no_regression model:
python eval_script/evaluate.py --checkpoint YOUR_TRAINED_MODEL.ckp
  1. evaluate the no_classification model:
python eval_script/evaluate_baseline_regressor.py --checkpoint YOUR_TRAINED_MODEL.ckp
  1. evaluate the pretrained model with random temporal segment:
python eval_script/evaluate_pretrain.py --checkpoint YOUR_PRETRAIN_CAPTION_MODEL.ckp

Other usages

Besides reproduce our work, there are at least two interesting things you can do with our codes.

Train a supervised sentence localization model

To know what is sentence localization, you can have a look at our paper ABLR. Note that our work at a matter of fact provides an unsupervised solution towards sentence localization, we introduce the usage for the supervised model here. We have written the trainer, you can just run the following command and have a cup of coffee:

python train_script/train_sl.py

Train a supervised video event caption generation model

If you have read our paper, you would find that event captioning is the dual task of the aforementioned sentence localization task. To train such a model, just run the following command:

python train_script/train_cg.py

BUGS

You may encounter a cuda internal bug that says Segmentation fault(core dumped) during training if you are using cuda 8.0. If such things happen, try upgrading your cuda to 9.0.

other

We will add more description about how to use our code. Please feel free to contact us if you have any questions or suggestions.

Trained model and results

Links for our trained model

You can download our pretrained model for evaluation or further usage from our onedrive, which includes a pretrained caption generator(cg_pretrain.ckp), a baseline model without classification loss(baseline_noclass.ckp), a baseline model without regression branch(baseline_noregress.ckp), and our final model(final_model.ckp).

Cite the paper and give us star ⭐️

If you find our paper or code useful, please cite our paper using the following bibtex:

@incollection{NIPS2018_7569,
title = {Weakly Supervised Dense Event Captioning in Videos},
author = {Duan, Xuguang and Huang, Wenbing and Gan, Chuang and Wang, Jingdong and Zhu, Wenwu and Huang, Junzhou},
booktitle = {Advances in Neural Information Processing Systems 31},
editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
pages = {3062--3072},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7569-weakly-supervised-dense-event-captioning-in-videos.pdf}
}
Owner
Melon(Xuguang Duan)
Lick the screen
Melon(Xuguang Duan)
Analysis code and Latex source of the manuscript describing the conditional permutation test of confounding bias in predictive modelling.

Git repositoty of the manuscript entitled Statistical quantification of confounding bias in predictive modelling by Tamas Spisak The manuscript descri

PNI - Predictive Neuroimaging Lab, University Hospital Essen, Germany 0 Nov 22, 2021
OpenDelta - An Open-Source Framework for Paramter Efficient Tuning.

OpenDelta is a toolkit for parameter efficient methods (we dub it as delta tuning), by which users could flexibly assign (or add) a small amount parameters to update while keeping the most paramters

THUNLP 386 Dec 26, 2022
Educational 2D SLAM implementation based on ICP and Pose Graph

slam-playground Educational 2D SLAM implementation based on ICP and Pose Graph How to use: Use keyboard arrow keys to navigate robot. Press 'r' to vie

Kirill 19 Dec 17, 2022
Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.

Multi-Task Framework for Cross-Lingual Abstractive Summarization (MCLAS) The code for ACL2021 paper Cross-Lingual Abstractive Summarization with Limit

Yu Bai 43 Nov 07, 2022
PyTorch implementation of InstaGAN: Instance-aware Image-to-Image Translation

InstaGAN: Instance-aware Image-to-Image Translation Warning: This repo contains a model which has potential ethical concerns. Remark that the task of

Sangwoo Mo 827 Dec 29, 2022
LightningFSL: Pytorch-Lightning implementations of Few-Shot Learning models.

LightningFSL: Few-Shot Learning with Pytorch-Lightning In this repo, a number of pytorch-lightning implementations of FSL algorithms are provided, inc

Xu Luo 76 Dec 11, 2022
The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.

Interscript The Interscript dataset contains interactive user feedback on a T5-11B model generated scripts. Dataset data.json contains the data in an

AI2 8 Dec 01, 2022
Build tensorflow keras model pipelines in a single line of code. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.

deep_autoviml Build keras pipelines and models in a single line of code! Table of Contents Motivation How it works Technology Install Usage API Image

AutoViz and Auto_ViML 102 Dec 17, 2022
PyTorch implementations of the paper: "Learning Independent Instance Maps for Crowd Localization"

IIM - Crowd Localization This repo is the official implementation of paper: Learning Independent Instance Maps for Crowd Localization. The code is dev

tao han 91 Nov 10, 2022
Language model Prompt And Query Archive

LPAQA: Language model Prompt And Query Archive This repository contains data and code for the paper How Can We Know What Language Models Know? Install

127 Dec 20, 2022
Unsupervised Pre-training for Person Re-identification (LUPerson)

LUPerson Unsupervised Pre-training for Person Re-identification (LUPerson). The repository is for our CVPR2021 paper Unsupervised Pre-training for Per

143 Dec 24, 2022
Code for the paper "Graph Attention Tracking". (CVPR2021)

SiamGAT 1. Environment setup This code has been tested on Ubuntu 16.04, Python 3.5, Pytorch 1.2.0, CUDA 9.0. Please install related libraries before r

122 Dec 24, 2022
A font family with a great monospaced variant for programmers.

Fantasque Sans Mono A programming font, designed with functionality in mind, and with some wibbly-wobbly handwriting-like fuzziness that makes it unas

Jany Belluz 6.3k Jan 08, 2023
Dcf-game-infrastructure-public - Contains all the components necessary to run a DC finals (attack-defense CTF) game from OOO

dcf-game-infrastructure All the components necessary to run a game of the OOO DC

Order of the Overflow 46 Sep 13, 2022
Data and analysis code for an MS on SK VOC genomes phenotyping/neutralisation assays

Description Summary of phylogenomic methods and analyses used in "Immunogenicity of convalescent and vaccinated sera against clinical isolates of ance

Finlay Maguire 1 Jan 06, 2022
ThunderSVM: A Fast SVM Library on GPUs and CPUs

What's new We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. add scikit-learn interface, see here Overview The miss

Xtra Computing Group 1.4k Dec 22, 2022
EdMIPS: Rethinking Differentiable Search for Mixed-Precision Neural Networks

EdMIPS is an efficient algorithm to search the optimal mixed-precision neural network directly without proxy task on ImageNet given computation budgets. It can be applied to many popular network arch

Zhaowei Cai 47 Dec 30, 2022
Differentiable Simulation of Soft Multi-body Systems

Differentiable Simulation of Soft Multi-body Systems Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, Ming C. Lin [Paper] [Code] Updates The C++ backend s

YilingQiao 26 Dec 23, 2022
Texture mapping with variational auto-encoders

vae-textures This is an experiment with using variational autoencoders (VAEs) to perform mesh parameterization. This was also my first project using J

Alex Nichol 41 May 24, 2022
A GUI for Face Recognition, based upon Docker, Tkinter, GPU and a camera device.

Face Recognition GUI This repository is a GUI version of Face Recognition by Adam Geitgey, where e.g. Docker and Tkinter are utilized. All the materia

Kasper Henriksen 6 Dec 05, 2022