Video-Captioning - A machine learning project that generates captions for video frames, describing the relationships between the objects in the video

Last update: Jan 23, 2022

Video-Captioning

A machine learning project that generates captions for video frames, describing the relationships between the objects in the video.

Approach

Our framework uses a sequence-to-sequence model to predict visual relationships in video: the input is a sequence of video frames, and the output is a relation triplet <object1 - relationship - object2> that describes the video. In other words, we extend the sequence-to-sequence modelling approach so that it takes a sequence of video frames as input.

Figure: Bidirectional LSTM layer (coloured red) encodes visual feature inputs, and the LSTM layer (coloured green) decodes the features into a sequence of words.
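
The encoder-decoder layout described in the figure could be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual model: the function name `build_triplet_model`, the layer sizes, the idea that frames arrive as pre-extracted feature vectors, and the use of a single shared vocabulary for objects and relationships are all hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_triplet_model(num_frames, feature_dim, vocab_size, hidden_units=64):
    """Hypothetical sketch: BiLSTM encoder over frame features, LSTM decoder
    that emits three tokens (object1, relationship, object2)."""
    # One feature vector per video frame (assumed to be pre-extracted).
    frames = keras.Input(shape=(num_frames, feature_dim))

    # Encoder: a bidirectional LSTM summarises the frame sequence into one vector.
    encoded = layers.Bidirectional(layers.LSTM(hidden_units))(frames)

    # Feed the summary to the decoder once per output token (3 per triplet).
    repeated = layers.RepeatVector(3)(encoded)

    # Decoder: an LSTM produces one hidden state per output position.
    decoded = layers.LSTM(hidden_units, return_sequences=True)(repeated)

    # Per-position softmax over the (assumed shared) label vocabulary.
    probs = layers.TimeDistributed(
        layers.Dense(vocab_size, activation="softmax")
    )(decoded)
    return keras.Model(inputs=frames, outputs=probs)


# Toy sizes, chosen only to show the shapes involved.
model = build_triplet_model(num_frames=4, feature_dim=8, vocab_size=10)
batch = np.zeros((2, 4, 8), dtype="float32")
predictions = model.predict(batch, verbose=0)  # shape: (2, 3, 10)
triplet_ids = predictions.argmax(axis=-1)      # shape: (2, 3)
```

Taking the argmax at each of the three output positions yields one id per triplet slot, which can then be mapped back to a label.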

Results

Python Dependencies

pandas
Keras
TensorFlow
NumPy
albumentations
Pillow

Procedure

Training

For training the model, run the script train.py.

  python train.py

To train on your own dataset, save your data in a directory (see the data folder for the expected format) and update the following JSON files:

object1_object2.json: It contains a dictionary of objects, with object labels as keys and ids as values.
relationship.json: It contains a dictionary of relationships, with relationship labels as keys and ids as values.
training_annotations.json: It contains a dictionary for each video in the training data, with video ids as keys and a list of as values.
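
As a rough illustration of how the two label files might be used, the sketch below assumes each one is a single JSON object mapping labels to integer ids (one reading of the descriptions above). The labels, the ids, and the helper `decode_triplet` are made up for the example; check the data folder for the real contents. The annotation file is left out because its value format is not spelled out here.

```python
import json

# Hypothetical label maps; the real labels and ids come from your dataset.
object_to_id = {"person": 0, "dog": 1, "ball": 2}
relationship_to_id = {"next_to": 0, "chases": 1}

# Round-trip through JSON text, the same way the files on disk would be read.
object_to_id = json.loads(json.dumps(object_to_id))
relationship_to_id = json.loads(json.dumps(relationship_to_id))

# Invert the maps so predicted ids can be turned back into labels.
id_to_object = {idx: label for label, idx in object_to_id.items()}
id_to_relationship = {idx: label for label, idx in relationship_to_id.items()}


def decode_triplet(object1_id, relationship_id, object2_id):
    """Map a triplet of ids to its (object1, relationship, object2) labels."""
    return (
        id_to_object[object1_id],
        id_to_relationship[relationship_id],
        id_to_object[object2_id],
    )


print(decode_triplet(1, 1, 2))  # ('dog', 'chases', 'ball')
```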

When running the script, provide the path to your data directory.

  python train.py --train_data

Testing

For testing the model or making predictions on your own dataset, run the script eval.py.

  python eval.py --test_data

The results are saved to the CSV file 'test_data_predictions.csv'.
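
For orientation, here is one way predicted triplets could be written out as CSV using only the standard library. The column names, the sample prediction, and the helper `write_predictions` are assumptions made for this sketch; the layout that eval.py actually produces may differ.

```python
import csv
import io


def write_predictions(predictions, out_file):
    """Write one row per video: video id followed by the predicted triplet.

    The column layout here is hypothetical, not taken from eval.py.
    """
    writer = csv.writer(out_file)
    writer.writerow(["video_id", "object1", "relationship", "object2"])
    for video_id, (object1, relationship, object2) in predictions.items():
        writer.writerow([video_id, object1, relationship, object2])


# Made-up prediction, written to an in-memory buffer for demonstration.
predictions = {"video_001": ("dog", "chases", "ball")}
buffer = io.StringIO()
write_predictions(predictions, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open("test_data_predictions.csv", "w", newline="")` in place of the buffer.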
