Video-Captioning - A machine learning project to generate captions for video frames indicating the relationship between the objects in the video

Last update: Jan 23, 2022

Related tags

Deep Learning Video-Captioning

Overview

Video-Captioning

A machine learning project that generates captions for video frames, describing the relationship between the objects in the video.

Approach

Our framework uses a sequence-to-sequence model to predict visual relationships in video. The input is a sequence of video frames, and the output is a relation triplet <object1 − relationship − object2> describing the video. This extends the sequence-to-sequence modelling approach to inputs that are sequences of video frames.

Figure: Bidirectional LSTM layer (coloured red) encodes visual feature inputs, and the LSTM layer (coloured green) decodes the features into a sequence of words.
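
The architecture in the figure can be sketched in Keras roughly as follows. This is a minimal sketch, not the project's actual model: the function name `build_relation_model`, the layer sizes, the 30-frame input, the 2048-dimensional per-frame features, and the shared vocabulary of 100 labels are all assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_relation_model(num_frames=30, feature_dim=2048,
                         hidden_units=256, vocab_size=100, triplet_len=3):
    """Hypothetical encoder-decoder; every size here is an assumption."""
    # One precomputed visual feature vector per video frame.
    frames = keras.Input(shape=(num_frames, feature_dim), name="frame_features")
    # Encoder: a bidirectional LSTM summarises the whole frame sequence.
    encoded = layers.Bidirectional(layers.LSTM(hidden_units))(frames)
    # Feed the same summary vector to the decoder once per output word.
    repeated = layers.RepeatVector(triplet_len)(encoded)
    # Decoder: an LSTM emits one hidden state per word of the triplet.
    decoded = layers.LSTM(hidden_units, return_sequences=True)(repeated)
    # Per-step softmax over a shared label vocabulary (assumed).
    probs = layers.TimeDistributed(
        layers.Dense(vocab_size, activation="softmax"))(decoded)
    return keras.Model(frames, probs, name="relation_seq2seq")


model = build_relation_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

With these assumed sizes the model maps a batch of shape (batch, 30, 2048) to probabilities of shape (batch, 3, 100), one distribution each for object1, relationship, and object2.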

Results

Python Dependencies

pandas
Keras
TensorFlow
NumPy
albumentations
Pillow

Procedure

Training

To train the model, run the script train.py:

  python train.py

To train on your own dataset, save your data in a directory (see the data folder for the expected format) and update the following JSON files:

object1_object2.json: It contains a dictionary for each object, with object labels as keys and ids as values.
relationship.json: It contains a dictionary for each relationship, with relationship labels as keys and ids as values.
training_annotations.json: It contains a dictionary for each video in the training data, with video ids as keys and a list of annotations as values (see the data folder for the exact format).
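
As a rough illustration of the three files described above, the sketch below writes them with the standard `json` module and checks that every annotation refers to known ids. The labels, the ids, the helper names `write_label_files` and `find_invalid_annotations`, and the annotation layout `[object1_id, relationship_id, object2_id]` are all assumptions; the data folder remains the authority on the real format.

```python
import json
from pathlib import Path

# Hypothetical labels and ids; real files will list your own classes.
objects = {"person": 0, "dog": 1, "ball": 2}
relationships = {"next_to": 0, "chase": 1}
# Assumed layout: one [object1_id, relationship_id, object2_id] list per video.
annotations = {"video_0001": [0, 0, 1], "video_0002": [1, 1, 2]}


def write_label_files(data_dir):
    """Write the three JSON files into data_dir and return their paths."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    payloads = {
        "object1_object2.json": objects,
        "relationship.json": relationships,
        "training_annotations.json": annotations,
    }
    paths = {}
    for name, payload in payloads.items():
        path = data_dir / name
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        paths[name] = path
    return paths


def find_invalid_annotations(data_dir):
    """Return the ids of videos whose annotation uses an unknown id."""
    data_dir = Path(data_dir)

    def load(name):
        return json.loads((data_dir / name).read_text(encoding="utf-8"))

    object_ids = set(load("object1_object2.json").values())
    relationship_ids = set(load("relationship.json").values())
    invalid = []
    for video_id, triplet in load("training_annotations.json").items():
        object1, relationship, object2 = triplet
        if (object1 not in object_ids or object2 not in object_ids
                or relationship not in relationship_ids):
            invalid.append(video_id)
    return invalid
```

Running `find_invalid_annotations` on your data directory before training is one cheap way to catch label files that have drifted out of sync with the annotations.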

When running the script, provide the path to your data directory:

  python train.py --train_data

Testing

To test the model or make predictions on your own dataset, run the script eval.py:

  python eval.py --test_data

Results are saved to the CSV file 'test_data_predictions.csv'.
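
To inspect the predictions afterwards, something like the following could be used. It is a small sketch that assumes the CSV starts with a header row; the column names are not documented here, so the helper (the hypothetical `read_predictions`) returns whatever columns the file contains.

```python
import csv


def read_predictions(path="test_data_predictions.csv"):
    """Return the rows of the predictions file as a list of dicts.

    Assumes the first row of the CSV is a header naming the columns.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `for row in read_predictions(): print(row)` prints one dictionary per predicted video, keyed by the header names found in the file.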
