Generic Event Boundary Detection: A Benchmark for Event Segmentation


We release our data annotations & baseline code for detecting generic event boundaries in video.

Links: [Arxiv] [LOVEU Challenge]

Contributors: Mike Zheng Shou, Stan Lei, Deepti Ghadiyaram, Weiyao Wang, Matt Feiszli.

Overview

This repo has the following structure:

./
│   LICENSE
│   README.md
│   INSTRUCTIONS.md
│
├───BdyDet
│   ├───k400
│   │       detect_event_boundary.py
│   │       run_multiprocess_detect_event_boundary.py
│   │
│   └───TAPOS
│           detect_event_boundary.py
│           run_multiprocess_detect_event_boundary.py
│
├───Challenge_eval_Code
│       eval.py
│       README.md
│
├───data
│   ├───export
│   │       prepare_hmdb_release.ipynb
│   │       prepare_k400_release.ipynb
│   │
│   ├───exp_k400
│   │   │   classInd.txt
│   │   │   val_set.csv
│   │   │
│   │   ├───detect_seg
│   │   └───pred_err
│   └───exp_TAPOS
│       │   train_set.csv
│       │   val_set.csv
│       │
│       ├───detect_seg
│       └───pred_err
├───eval
│       eval_GEBD_k400.ipynb
│       eval_GEBD_TAPOS.ipynb
│
├───PA_DPC
│   │   LICENSE
│   │   README.md
│   │
│   ├───asset
│   │       arch.png
│   │
│   ├───backbone
│   │       convrnn.py
│   │       resnet_2d.py
│   │       resnet_2d3d.py
│   │       select_backbone.py
│   │
│   ├───dpc
│   │       dataset_3d_infer_pred_error.py
│   │       main_infer_pred_error.py
│   │       model_3d.py
│   │
│   └───utils
│           augmentation.py
│           utils.py
│
└───PC
    │   PC_test.py
    │   PC_train.py
    │   README.md
    │
    ├───DataAssets
    ├───datasets
    │       augmentation.py
    │       MultiFDataset.py
    │
    ├───modeling
    │       resnetGEBD.py
    │
    ├───run
    │       pc_k400_dist.sh
    │       pc_tapos_dist.sh
    │
    └───utils
            augmentation.py
            checkpoint_saver.py
            clip_grad.py
            cuda.py
            getter.py
            helper.py
            log.py
            metric.py
            model_ema.py
            optim_factory.py
            sampler.py
            scheduler.py

Note that we release the code on GitHub. Annotations are available on Google Drive. Run the code yourself to generate the output files. Refer to INSTRUCTIONS.md for preparing data and generating submission files.

  • data/:

    • export/ folder stores temporal boundary annotations of our Kinetics-GEBD and HMDB-GEBD datasets; download our raw annotations and put them under this folder.
    • exp_k400/ and exp_TAPOS/ store intermediate experimental data and final results.
  • Challenge_eval_Code/: code for evaluation in LOVEU Challenge Track 1.

  • BdyDet/: code for detecting boundary positions based on the predictability sequence.

  • eval/: code for evaluating the performance of boundary detection.

  • PA_DPC/: code for computing the predictability sequence using various methods.

  • PC/: code for the supervised baseline on GEBD.

data/

  • In data/export/:

    • *_raw_annotation.pkl stores the raw annotations; download raw annotations here.
    • we further filter out videos that receive fewer than 3 annotations and conduct pre-processing, e.g. merging very close boundaries; we use the notebook prepare_*_release.ipynb, and the output is stored in *_mr345_min_change_duration0.3.pkl (a loading sketch follows this list).
  • Some fields in *_raw_annotation.pkl:

    • fps: frames per second.
    • video_duration: video duration in seconds.
    • f1_consis: a list of consistency scores; each corresponds to one annotator's score as compared to the other annotators.
    • substages_timestamps: a list of annotations, one per annotator; each annotator's annotation is again a list of boundaries. Times are in seconds; each boundary's 'label' has the format A: B.
      • If A is “ShotChangeGradualRange”, B could be “Change due to Cut”, “Change from/to slow motion”, “Change from/to fast motion”, or “N/A”.
      • If A is “ShotChangeImmediateTimestamp”, B could be “Change due to Pan”, “Change due to Zoom”, “Change due to Fade/Dissolve/Gradual”, “Multiple”, or “N/A”.
      • If A is “EventChange (Timestamp/Range)”, B could be “Change of Color”, “Change of Actor/Subject”, “Change of Object Being Interacted”, “Change of Action”, “Multiple”, or “N/A”.
  • Some fields in *_mr345_min_change_duration0.3.pkl:

    • substages_myframeidx: the term 'substage' follows the convention in TAPOS to refer to a boundary position; here we store each boundary's frame index, which starts at 0.
  • In data/exp_k400/ and data/exp_TAPOS/:

    • pred_err/ stores the output of PA_DPC/, i.e. the predictability sequence;
    • detect_seg/ stores the output of BdyDet/, i.e. the detected boundary positions.
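
A quick way to sanity-check the annotation files is to load them and print the fields described above. This is an illustrative sketch: the exact file names depend on the dataset and split you downloaded, and the assumption that each pickle maps video ids to per-video dicts is ours.

    import pickle

    # Assumed layout: each pickle maps a video id to a per-video annotation dict.
    with open('data/export/k400_raw_annotation.pkl', 'rb') as f:
        raw = pickle.load(f)

    vid, ann = next(iter(raw.items()))
    print(vid, ann['fps'], ann['video_duration'])    # fps and duration in seconds
    print(ann['f1_consis'])                          # one consistency score per annotator
    print(ann['substages_timestamps'])               # per-annotator lists of boundaries

    # The processed file exposes 0-based boundary frame indices instead.
    with open('data/export/k400_mr345_val_min_change_duration0.3.pkl', 'rb') as f:
        processed = pickle.load(f)
    print(next(iter(processed.values()))['substages_myframeidx'])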

Challenge_eval_Code/

  • We use eval.py for evaluation in our competition.

  • Although one can use the frame index and the total number of frames to measure the Rel.Dis., as we implemented in eval/, you should represent the detected boundaries with timestamps in seconds (a serialization sketch follows this example). For example:

    {
      '6Tz5xfnFl4c': [5.9, 9.4],       # boundaries detected at 5.9s, 9.4s of this video
      'zJki61RMxcg': [0.6, 1.5, 2.7],  # boundaries detected at 0.6s, 1.5s, 2.7s of this video
      ...
    }
  • Refer to this file to generate GT files from raw annotations.
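
As a sketch, such a prediction file can be assembled and serialized as below. Treating pickle as the container is an assumption on our part; Challenge_eval_Code/README.md is authoritative for the expected submission format.

    import pickle

    # Map each video id to its detected boundary timestamps, in seconds.
    submission = {
        '6Tz5xfnFl4c': [5.9, 9.4],
        'zJki61RMxcg': [0.6, 1.5, 2.7],
    }

    # Serialization format is assumed here; check Challenge_eval_Code/README.md.
    with open('submission.pkl', 'wb') as f:
        pickle.dump(submission, f)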

BdyDet/

Change to directory ./BdyDet/k400 or ./BdyDet/TAPOS and run the following command, which will launch multiple processes of detect_event_boundary.py.

(Note: set the number of processes in detect_event_boundary.py according to your server.)

python run_multiprocess_detect_event_boundary.py
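
Conceptually, the detection step is peak-picking on the predictability sequence. The sketch below is illustrative only, not the exact algorithm in detect_event_boundary.py, and pred_err.npy is a hypothetical file holding one prediction-error value per temporal position.

    import numpy as np
    from scipy.signal import find_peaks

    # Hypothetical per-position prediction-error sequence produced by PA_DPC.
    pred_err = np.load('pred_err.npy')

    # High prediction error suggests an event boundary; keep prominent local maxima.
    peaks, _ = find_peaks(pred_err, prominence=pred_err.std())
    print(peaks)  # indices into the (downsampled) temporal positions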

PA_DPC/

Our implementation is based on the [DPC] framework. Please refer to their README or website to learn about installation and usage. Below, we only explain how to run our scripts and what our modifications are.

  • Modifications at a glance

    • main_infer_pred_error.py runs in inference mode only, to assess predictability over time.
    • dataset_3d_infer_pred_error.py contains data loaders for two datasets, i.e. videos from Kinetics and truncated instances from TAPOS.
    • model_3d.py adds several classes for feature extraction, e.g. taking the ResNet feature before the pooling layer, so that the feature difference can be computed directly from an ImageNet-pretrained model, in contrast to the predictive model in DPC.
  • How to run?

    Change to directory ./PA_DPC/dpc/.

    • Kinetics-GEBD:
    python main_infer_pred_error.py --gpu 0 --model resnet50-beforepool --num_seq 2 --pred_step 1 --seq_len 5 --dataset k400 --batch_size 160 --img_dim 224 --mode val --ds 3 --pred_task featdiff_rgb
    • TAPOS:
    python main_infer_pred_error.py --gpu 0 --model resnet50-beforepool --num_seq 2 --pred_step 1 --seq_len 5 --dataset tapos_instances --batch_size 160 --img_dim 224 --mode val --ds 3 --pred_task featdiff_rgb
    • Notes:

      • ds=3 means we downsample the video frames by a factor of 3 to reduce computation.

      • seq_len=5 means 5 frames for each step (a concept used in DPC).

      • num_seq=2 and pred_step=1 so that, at a given time position t, we use 1 step before and 1 step after to assess the predictability at t; the predictability at t is thus the feature difference between the average feature of the 5 sampled frames before t and the average feature of the 5 sampled frames after t (see the sketch after these notes).

      • We slide this model over time to obtain the predictability at different temporal positions. More details can be found in dataset_3d_infer_pred_error.py. Note that window_lists.pkl stores the indices of the frames to be sampled for every window; since it takes a long time to compute (this could be optimized in the future), we store it once and simply load the pre-computed value in later runs on the same dataset during experimental exploration.
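
As a minimal sketch of this feature-difference scheme (function and variable names are ours, not the repo's): given per-frame backbone features, the predictability score at t is the distance between the mean feature of the seq_len frames before t and the mean feature of the seq_len frames after t.

    import torch

    def predictability(feats: torch.Tensor, seq_len: int = 5) -> torch.Tensor:
        # feats: (T, C) per-frame features from an ImageNet-pretrained backbone.
        scores = []
        for t in range(seq_len, feats.shape[0] - seq_len):
            before = feats[t - seq_len:t].mean(dim=0)  # mean feature before t
            after = feats[t:t + seq_len].mean(dim=0)   # mean feature after t
            # A large feature difference means low predictability, i.e. a likely boundary.
            scores.append(torch.norm(after - before, p=2))
        return torch.stack(scores)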

PC/

PC is a supervised baseline for the GEBD task. In the PC framework, for each frame f in a video, we take the T frames preceding f and the T frames succeeding f as input, and build a binary classifier to predict whether f is a boundary or background.
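
The sketch below illustrates this setup; it is a simplified stand-in, not the actual model in PC/modeling/resnetGEBD.py, and T and the feature dimension are assumptions. Per-frame features of the 2T context frames around f are flattened and fed to a binary boundary/background head.

    import torch
    import torch.nn as nn

    T = 8  # hypothetical context size

    class BoundaryClassifier(nn.Module):
        def __init__(self, feat_dim: int = 2048):
            super().__init__()
            # Classify the 2T context frames around f as boundary vs. background.
            self.head = nn.Linear(2 * T * feat_dim, 2)

        def forward(self, window_feats: torch.Tensor) -> torch.Tensor:
            # window_feats: (batch, 2T, feat_dim) features of the frames around f.
            return self.head(window_feats.flatten(1))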

  • Get Started

    • Check PC/datasets/MultiFDataset.py to generate GT files for training. Note that you should prepare k400_mr345_*SPLIT*_min_change_duration0.3.pkl for Kinetics-GEBD, and TAPOS_*SPLIT*_anno.pkl (organized in the same way as k400_mr345_*SPLIT*_min_change_duration0.3.pkl for convenience) for TAPOS, before running our code.
    • Change PATH_TO in our code to your data/frames path as needed.
  • How to run?

    Change directory to ./PC.

    • Train on Kinetics-GEBD:

      CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 PC_train.py \
      --dataset kinetics_multiframes \
      --train-split train \
      --val-split val \
      --num-classes 2 \
      --batch-size 32 \
      --n-sample-classes 2 \
      --n-samples 16 \
      --lr 0.01 \
      --warmup-epochs 0 \
      --epochs 30 \
      --decay-epochs 10 \
      --model multiframes_resnet \
      --pin-memory \
      --balance-batch \
      --sync-bn \
      --amp \
      --native-amp \
      --eval-metric loss \
      --log-interval 50
    • Train on TAPOS:

      CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 PC_train.py \
      --dataset tapos_multiframes \
      --train-split train \
      --val-split val \
      --num-classes 2 \
      --batch-size 32 \
      --n-sample-classes 2 \
      --n-samples 16 \
      --lr 0.01 \
      --warmup-epochs 0 \
      --epochs 30 \
      --decay-epochs 10 \
      --model multiframes_resnet \
      --pin-memory \
      --balance-batch \
      --sync-bn \
      --amp \
      --native-amp \
      --eval-metric loss \
      --log-interval 50 
    • Generate scores sequence on Kinetics-GEBD Validation Set:

      CUDA_VISIBLE_DEVICES=0 python PC_test.py \
      --dataset kinetics_multiframes \
      --val-split val \
      --resume path_to/checkpoint
    • Generate scores sequence on TAPOS (a score-to-timestamp sketch follows this list):

      CUDA_VISIBLE_DEVICES=0 python PC_test.py \
      --dataset tapos_multiframes \
      --val-split val \
      --resume path_to/checkpoint
  • Models
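
To turn the score sequences produced by PC_test.py into challenge-format timestamps, something like the following rough sketch works; the file layout, threshold, and fps handling are assumptions, not the repo's code.

    import numpy as np
    from scipy.signal import find_peaks

    scores = np.load('scores_6Tz5xfnFl4c.npy')  # hypothetical per-frame boundary scores
    fps = 29.97                                 # read the true value from the annotation pkl

    # Keep local maxima above a chosen threshold and convert frame indices to seconds.
    peaks, _ = find_peaks(scores, height=0.5)
    timestamps = [round(float(p) / fps, 2) for p in peaks]
    print(timestamps)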

Misc

  • Download the datasets from Kinetics-400 and TAPOS. (Note that some videos can no longer be downloaded from YouTube; you can proceed with those that are available.)

  • Note that for TAPOS, you first need to cut out each action instance yourself; you can then use our code to process each instance's video separately.

  • Extract frames of videos in the dataset.

  • To reproduce PA_DPC, generate your own data/exp_*/val_set.csv, which should contain the path to each video's frame folder and the number of frames in that video (see the sketch below).
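
A minimal sketch for generating such a CSV; the exact schema (column order, header, frame-name pattern) is an assumption here, so check the data loaders in PA_DPC/dpc/ before using it.

    import csv
    import os

    frames_root = 'PATH_TO/frames'  # one sub-folder of extracted frames per video

    with open('data/exp_k400/val_set.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for name in sorted(os.listdir(frames_root)):
            folder = os.path.join(frames_root, name)
            if os.path.isdir(folder):
                n_frames = sum(1 for p in os.listdir(folder) if p.endswith('.jpg'))
                writer.writerow([folder, n_frames])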

Q&A

For any questions, feel free to create an issue or email Mike ([email protected]) and Stan ([email protected]). Thank you for helping us improve our data & code.
