A PyTorch Implementation of PGL-SUM from "Combining Global and Local Attention with Positional Encoding for Video Summarization", Proc. IEEE ISM 2021

Related tags

Deep LearningPGL-SUM
Overview

PGL-SUM: Combining Global and Local Attention with Positional Encoding for Video Summarization

PyTorch Implementation of PGL-SUM

  • From "PGL-SUM: Combining Global and Local Attention with Positional Encoding for Video Summarization", Proc. IEEE ISM 2021.
  • Written by Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris and Ioannis Patras.
  • This software can be used for training a deep learning architecture which estimates frames' importance after modeling their dependencies with the help of global and local multi-head attention mechanisms that integrate a positional encoding component. Training is performed in a supervised manner based on ground-truth data (human-generated video summaries). After being trained on a collection of videos, the PGL-SUM model is capable of producing representative summaries for unseen videos, according to a user-specified time-budget about the summary duration.

Main dependencies

Developed, checked and verified on an Ubuntu 20.04.3 PC with a NVIDIA TITAN Xp GPU. Main packages required:

Python PyTorch CUDA Version cuDNN Version TensorBoard TensorFlow NumPy H5py
3.8(.8) 1.7.1 11.0 8005 2.4.1 2.3.0 1.20.2 2.10.0

Data

Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available within the data folder. The GoogleNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao and the h5 files were obtained from Kaiyang Zhou. These files have the following structure:

/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /gtscore                  1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
    /user_summary             2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_frame_per_seg          1D-array with shape (num_segments), indicates number of frames in each segment
    /n_frames                 number of frames in original video
    /picks                    positions of subsampled frames in original video
    /n_steps                  number of subsampled frames
    /gtsummary                1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
    /video_name (optional)    original video name, only available for SumMe dataset

Original videos and annotations for each dataset are also available in the dataset providers' webpages:

Training

To train the model using one of the aforementioned datasets and for a number of randomly created splits of the dataset (where in each split 80% of the data is used for training and 20% for testing) use the corresponding JSON file that is included in the data/splits directory. This file contains the 5 randomly-generated splits that were utilized in our experiments.

For training the model using a single split, run:

python model/main.py --split_index N --n_epochs E --batch_size B --video_type 'dataset_name'

where, N refers to the index of the used data split, E refers to the number of training epochs, B refers to the batch size, and dataset_name refers to the name of the used dataset.

Alternatively, to train the model for all 5 splits, use the run_summe_splits.sh and/or run_tvsum_splits.sh script and do the following:

chmod +x model/run_summe_splits.sh    # Makes the script executable.
chmod +x model/run_tvsum_splits.sh    # Makes the script executable.
./model/run_summe_splits.sh           # Runs the script. 
./model/run_tvsum_splits.sh           # Runs the script.  

Please note that after each training epoch the algorithm performs an evaluation step, using the trained model to compute the importance scores for the frames of each video of the test set. These scores are then used by the provided evaluation scripts to assess the overall performance of the model (in F-Score).

The progress of the training can be monitored via the TensorBoard platform and by:

  • opening a command line (cmd) and running: tensorboard --logdir=/path/to/log-directory --host=localhost
  • opening a browser and pasting the returned URL from cmd.

Configurations

Setup for the training process:

  • In data_loader.py, specify the path to the h5 file of the used dataset, and the path to the JSON file containing data about the utilized data splits.
  • In configs.py, define the directory where the analysis results will be saved to.

Arguments in configs.py:

Parameter name Description Default Value Options
--mode Mode for the configuration. 'train' 'train', 'test'
--verbose Print or not training messages. 'false' 'true', 'false'
--video_type Used dataset for training the model. 'SumMe' 'SumMe', 'TVSum'
--input_size Size of the input feature vectors. 1024 int > 0
--seed Chosen number for generating reproducible random numbers. 12345 None, int
--fusion Type of the used approach for feature fusion. 'add' None, 'add', 'mult', 'avg', 'max'
--n_segments Number of video segments; equal to the number of local attention mechanisms. 4 None, int ≥ 2
--pos_enc Type of the applied positional encoding. 'absolute' None, 'absolute', 'relative'
--heads Number of heads of the global attention mechanism. 8 int > 0
--n_epochs Number of training epochs. 200 int > 0
--batch_size Size of the training batch, 20 for 'SumMe' and 40 for 'TVSum'. 20 0 < int ≤ len(Dataset)
--clip Gradient norm clipping parameter. 5 float
--lr Value of the adopted learning rate. 5e-5 float
--l2_req Value of the regularization factor. 1e-5 float
--split_index Index of the utilized data split. 0 0 ≤ int ≤ 4
--init_type Weight initialization method. 'xavier' None, 'xavier', 'normal', 'kaiming', 'orthogonal'
--init_gain Scaling factor for the initialization methods. None None, float

Model Selection and Evaluation

The utilized model selection criterion relies on the post-processing of the calculated losses over the training epochs and enables the selection of a well-trained model by indicating the training epoch. To evaluate the trained models of the architecture and automatically select a well-trained model, define the dataset_path in compute_fscores.py and run evaluate_exp.sh. To run this file, specify:

  • base_path/exp$exp_num: the path to the folder where the analysis results are stored,
  • $dataset: the dataset being used, and
  • $eval_method: the used approach for computing the overall F-Score after comparing the generated summary with all the available user summaries (i.e., 'max' for SumMe and 'avg' for TVSum).
sh evaluation/evaluate_exp.sh $exp_num $dataset $eval_method

For further details about the adopted structure of directories in our implementation, please check line #6 and line #11 of evaluate_exp.sh.

Trained models and Inference

We have released the trained models for our main experiments -namely Table III and Table IV- of our ISM 2021 paper. The inference.py script, lets you evaluate the -reported- trained models, for our 5 randomly-created data splits. Firstly, download the trained models, with the following script:

sudo apt-get install unzip wget
wget "https://zenodo.org/record/5635735/files/pretrained_models.zip?download=1" -O pretrained_models.zip
unzip pretrained_models.zip -d inference
rm -f pretrained_models.zip

Then, specify the PATHs for the model, the split_file and the dataset in use. Finally, run the script with the following syntax

python inference/inference.py --table ID --dataset 'dataset_name'

where, ID refers to the id of the reported table, and dataset_name refers to the name of the used dataset.

License

Copyright (c) 2021, Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris, Ioannis Patras / CERTH-ITI. All rights reserved. This code is provided for academic, non-commercial use only. Redistribution and use in source and binary forms, with or without modification, are permitted for academic non-commercial use provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation provided with the distribution.

This software is provided by the authors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the authors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

Acknowledgement

This work was supported by the EU Horizon 2020 programme under grant agreement H2020-832921 MIRROR, and by EPSRC under grant No. EP/R026424/1.
Owner
Evlampios Apostolidis
Evlampios Apostolidis
RRxIO - Robust Radar Visual/Thermal Inertial Odometry: Robust and accurate state estimation even in challenging visual conditions.

RRxIO - Robust Radar Visual/Thermal Inertial Odometry RRxIO offers robust and accurate state estimation even in challenging visual conditions. RRxIO c

Christopher Doer 64 Dec 29, 2022
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 141 Dec 30, 2022
An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners

An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners This is a coarse version for MAE, only make the pretrain model, the fine

FlyEgle 214 Dec 29, 2022
SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging.

SweiNet SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging. SweiNet takes as in

Felix Jin 3 Mar 31, 2022
Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees This repository is being continuously updated, please stay tuned! Any code con

86 Dec 28, 2022
YoloAll is a collection of yolo all versions. you you use YoloAll to test yolov3/yolov5/yolox/yolo_fastest

官方讨论群 QQ群:552703875 微信群:15158106211(先加作者微信,再邀请入群) YoloAll项目简介 YoloAll是一个将当前主流Yolo版本集成到同一个UI界面下的推理预测工具。可以迅速切换不同的yolo版本,并且可以针对图片,视频,摄像头码流进行实时推理,可以很方便,直观

DL-Practise 244 Jan 01, 2023
GazeScroller - Using Facial Movements to perform Hands-free Gesture on the system

GazeScroller Using Facial Movements to perform Hands-free Gesture on the system

2 Jan 05, 2022
I-BERT: Integer-only BERT Quantization

I-BERT: Integer-only BERT Quantization HuggingFace Implementation I-BERT is also available in the master branch of HuggingFace! Visit the following li

Sehoon Kim 139 Dec 27, 2022
Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning.

xTune Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning. Environment DockerFile: dancingsoul/pytorch:xTune Install the f

Bo Zheng 42 Dec 09, 2022
Bayesian regularization for functional graphical models.

BayesFGM Paper: Jiajing Niu, Andrew Brown. Bayesian regularization for functional graphical models. Requirements R version 3.6.3 and up Python 3.6 and

0 Oct 07, 2021
An educational tool to introduce AI planning concepts using mobile manipulator robots.

JEDAI Explains Decision-Making AI Virtual Machine Image The recommended way of using JEDAI is to use pre-configured Virtual Machine image that is avai

Autonomous Agents and Intelligent Robots 13 Nov 15, 2022
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition Paper: MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition accepted fo

64 Dec 18, 2022
Official implementation of SIGIR'2021 paper: "Sequential Recommendation with Graph Neural Networks".

SURGE: Sequential Recommendation with Graph Neural Networks This is our TensorFlow implementation for the paper: Sequential Recommendation with Graph

FIB LAB, Tsinghua University 53 Dec 26, 2022
Preprocessed Datasets for our Multimodal NER paper

Unified Multimodal Transformer (UMT) for Multimodal Named Entity Recognition (MNER) Two MNER Datasets and Codes for our ACL'2020 paper: Improving Mult

76 Dec 21, 2022
A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing.

AnimeGAN A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing. Randomly Generated Images The images are

Jie Lei 雷杰 1.2k Jan 03, 2023
ICCV2021: Code for 'Spatial Uncertainty-Aware Semi-Supervised Crowd Counting'

ICCV2021: Code for 'Spatial Uncertainty-Aware Semi-Supervised Crowd Counting'

Yanda Meng 14 May 13, 2022
AdvStyle - Official PyTorch Implementation

AdvStyle - Official PyTorch Implementation Paper | Supp Discovering Interpretable Latent Space Directions of GANs Beyond Binary Attributes. Huiting Ya

Beryl 37 Oct 21, 2022
PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

SelfReg PyTorch official implementation of Self-supervised Contrastive Regularization for Domain Generalization (SelfReg, https://arxiv.org/abs/2104.0

64 Dec 16, 2022
Code for "Universal inference meets random projections: a scalable test for log-concavity"

How to use this repository This repository contains code to replicate the results of "Universal inference meets random projections: a scalable test fo

Robin Dunn 0 Nov 21, 2021
A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

張致強 14 Dec 02, 2022