Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Last update: Oct 18, 2022

Overview

Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Model Pipeline

Usage

Environment Settings

We use the PyTorch framework.

Python version: 3.7.0
PyTorch version: 1.4.0
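
A quick way to confirm that the interpreter and PyTorch match the versions listed above is a small check script. This is only a minimal sketch and is not part of the repository: the function name `check_environment` is hypothetical, and it reports mismatches rather than enforcing them, since other versions may also work.

```python
import importlib.util
import sys

# Versions stated in this README.
EXPECTED_PYTHON = (3, 7)
EXPECTED_TORCH = "1.4.0"


def check_environment():
    """Return a list of warning strings; an empty list means everything matches."""
    warnings = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        warnings.append(
            "Python {}.{} found, README lists {}.{}".format(
                sys.version_info[0], sys.version_info[1], *EXPECTED_PYTHON
            )
        )
    if importlib.util.find_spec("torch") is None:
        warnings.append("PyTorch is not installed")
    else:
        import torch

        # torch.__version__ may carry a build suffix such as "1.4.0+cu100".
        installed = torch.__version__.split("+")[0]
        if installed != EXPECTED_TORCH:
            warnings.append(
                "PyTorch {} found, README lists {}".format(installed, EXPECTED_TORCH)
            )
    return warnings


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the README."]:
        print(line)
```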

Get Code

Clone the repository:

git clone https://github.com/zmzhang2000/MIGCN.git
cd MIGCN

Data Preparation

Charades-STA

Download the preprocessed annotations and features of Charades-STA with I3D features.
Save them in data/charades.

ActivityNet

Download the preprocessed annotations of ActivityNet.
Download the C3D features of ActivityNet.
Process the C3D features with process_activitynet_c3d() in data/preprocess/preprocess.py.
Save them in data/activitynet.

Pre-trained Models

Download the checkpoints of Charades-STA and ActivityNet.
Save them in checkpoints.
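
The download steps above leave three directories that the later commands rely on: data/charades, data/activitynet, and checkpoints. The sketch below checks that layout before a run. `missing_dirs` is a hypothetical helper, not a function from the repository, and it only tests that each directory exists and is non-empty, because the exact file names are not listed here.

```python
from pathlib import Path

# Directories named in this README, relative to the repository root.
EXPECTED_DIRS = ("data/charades", "data/activitynet", "checkpoints")


def missing_dirs(root="."):
    """Return the expected directories that are absent or empty under root."""
    missing = []
    for rel in EXPECTED_DIRS:
        path = Path(root) / rel
        # A directory counts as present only if it holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_dirs():
        print("missing or empty:", rel)
```

If only one dataset is needed, the other dataset's directory can be ignored in the output.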

Data Generation

We also provide the full procedure for generating the MIGCN data from the raw sources.

The raw data sources are listed in data/raw_data/download.sh.
The preprocessing code is in data/preprocess.

Training

Train MIGCN on Charades-STA with I3D features:

python main.py --dataset charades --feature i3d

Train MIGCN on ActivityNet with C3D features:

python main.py --dataset activitynet --feature c3d

Testing

Test MIGCN on Charades-STA with I3D features:

python main.py --dataset charades --feature i3d --test --model_load_path checkpoints/$MODEL_CHECKPOINT

Test MIGCN on ActivityNet with C3D features:

python main.py --dataset activitynet --feature c3d --test --model_load_path checkpoints/$MODEL_CHECKPOINT
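
When the training and testing commands above are launched from a script, the argument list can be assembled programmatically. The following is a sketch under stated assumptions: `build_command` is a hypothetical helper, it uses only the flags shown in this README (`--dataset`, `--feature`, `--test`, `--model_load_path`), and it covers only the two dataset/feature pairs shown here, even though main.py may accept others.

```python
import sys

# Dataset/feature pairs shown in this README; other combinations are not covered.
FEATURE_FOR_DATASET = {"charades": "i3d", "activitynet": "c3d"}


def build_command(dataset, checkpoint=None):
    """Build the main.py argument list; pass a checkpoint file name to test."""
    if dataset not in FEATURE_FOR_DATASET:
        raise ValueError("unknown dataset: {!r}".format(dataset))
    cmd = [
        sys.executable, "main.py",
        "--dataset", dataset,
        "--feature", FEATURE_FOR_DATASET[dataset],
    ]
    if checkpoint is not None:
        # Testing loads a saved model from the checkpoints directory.
        cmd += ["--test", "--model_load_path", "checkpoints/" + checkpoint]
    return cmd
```

Passing the returned list to `subprocess.run(..., check=True)` from the repository root would start the run; with no checkpoint argument the command trains, and with one it tests.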

Other Hyper-parameters

List other hyper-parameters by:

python main.py -h

Reference

Please cite the following paper if MIGCN is helpful for your research:

@ARTICLE{9547801,
  author={Zhang, Zongmeng and Han, Xianjing and Song, Xuemeng and Yan, Yan and Nie, Liqiang},
  journal={IEEE Transactions on Image Processing}, 
  title={Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos}, 
  year={2021},
  volume={30},
  number={},
  pages={8265-8277},
  doi={10.1109/TIP.2021.3113791}}

Owner

Zongmeng Zhang
