Official PyTorch implementation of the AAAI 2021 paper Semantic Grouping Network for Video Captioning

Last update: Nov 25, 2022


Overview

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, and Chang D. Yoo. AAAI 2021. [arxiv]

Environment

Ubuntu 16.04
CUDA 9.2
cuDNN 7.4.2
Java 8
Python 2.7.12
- PyTorch 1.1.0
- Other python packages specified in requirements.txt

Usage

1. Setup

$ pip install -r requirements.txt

2. Prepare Data

Download the GloVe embeddings from here and place them at data/Embeddings/GloVe/GloVe_300.json.
Extract features from the datasets and place them at data/ /features/ .hdf5.

e.g. ResNet101 features of the MSVD dataset will be located at data/MSVD/features/ResNet101.hdf5.

I referred to this repo to extract the ResNet101 features, and to this repo to extract the 3D-ResNext101 features.
Split the features into train, val, and test sets by running the following commands.
```
$ python -m split.MSVD
$ python -m split.MSR-VTT
```
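As a rough illustration of what such a split involves, the sketch below partitions an ordered list of video IDs into train, val, and test subsets by size. The function name `split_ids` and its parameters are hypothetical; the repository's `split` modules operate on the HDF5 feature files and may use fixed, dataset-specific splits rather than this logic.

```python
def split_ids(video_ids, n_train, n_val):
    """Partition an ordered list of video IDs into train/val/test lists.

    Hypothetical helper for illustration only; it is not part of the repository.
    The first n_train IDs go to train, the next n_val to val, the rest to test.
    """
    if n_train < 0 or n_val < 0:
        raise ValueError("split sizes must be non-negative")
    if n_train + n_val > len(video_ids):
        raise ValueError("n_train + n_val exceeds the number of videos")
    ids = list(video_ids)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


if __name__ == "__main__":
    # Toy IDs; real IDs would come from the keys of the feature file.
    ids = ["vid{}".format(i) for i in range(10)]
    splits = split_ids(ids, n_train=6, n_val=2)
    for name in ("train", "val", "test"):
        print(name, len(splits[name]))
```

Keeping the three subsets disjoint and derived from a fixed ordering makes the split reproducible across runs.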

Alternatively, you can skip steps 2-3 above and download the files below.

MSVD
- ResNet-101 [train] [val] [test]
- 3D-ResNext-101 [train] [val] [test]
MSR-VTT
- ResNet-101 [train] [val] [test]
- 3D-ResNext-101 [train] [val] [test]

3. Prepare The Code for Evaluation

Clone the evaluation code from the official coco-evaluation repo.

$ git clone https://github.com/tylin/coco-caption.git
$ mv coco-caption/pycocoevalcap .
$ rm -rf coco-caption
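After moving the directory, it can be worth confirming that the metric packages ended up where the code expects them. The sketch below checks for the `bleu`, `cider`, `meteor`, and `rouge` sub-directories that ship inside `pycocoevalcap` in the coco-caption repository. The helper name `missing_metric_packages` is hypothetical, and which metrics this repository actually imports is an assumption.

```python
import os

# Metric sub-packages shipped inside pycocoevalcap in the coco-caption repo.
EXPECTED_SUBPACKAGES = ("bleu", "cider", "meteor", "rouge")


def missing_metric_packages(root="pycocoevalcap"):
    """Return the expected metric sub-directories that are absent under root.

    Hypothetical sanity check for illustration; not part of the repository.
    """
    return [name for name in EXPECTED_SUBPACKAGES
            if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    missing = missing_metric_packages()
    if missing:
        print("Missing under pycocoevalcap/: " + ", ".join(missing))
    else:
        print("pycocoevalcap/ looks complete")
```

The METEOR scorer and the tokenizer in `pycocoevalcap` call out to bundled Java archives, which is why a Java runtime appears in the environment list.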

4. Extract Negative Videos

$ python extract_negative_videos.py

Alternatively, you can skip this step, since the output files are already uploaded at data/ /metadata/neg_vids_ .json.

5. Train

$ python train.py

You can change some hyperparameters by modifying config.py.

Pretrained Models - SGN(R101+RN)

*Disclaimer: The models above do not have the same weights as the models used in the paper (I retrained them because I lost the originals).

6. Evaluate

$ python evaluate.py --ckpt_fpath

License

The source code in this repository is released under the MIT License.


Owner

Hobin Ryu
