Automatic Video Captioning Evaluation Metric --- EMScore

Last update: Nov 28, 2022

Overview

For an illustration, EMScore can be computed as:
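A minimal pure-Python sketch of the embedding-matching idea follows. It assumes the coarse-grained score is the cosine similarity between a global video embedding and a global caption embedding, that the fine-grained score is a greedy-matching F-score between token and frame embeddings, and that the two are averaged with equal weight. The names `cosine` and `emscore_sketch` are hypothetical and are not part of this repository's API; idf weighting and reference captions are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def emscore_sketch(frame_embs, token_embs, video_emb, caption_emb):
    """Toy coarse + fine embedding-matching score (assumed formulation)."""
    # Coarse-grained: one global video vector against one global caption vector.
    coarse = cosine(video_emb, caption_emb)

    # Fine-grained: similarity matrix with one row per token, one column per frame.
    sim = [[cosine(t, f) for f in frame_embs] for t in token_embs]

    # Precision: each token is matched to its most similar frame.
    precision = sum(max(row) for row in sim) / len(token_embs)
    # Recall: each frame is matched to its most similar token.
    recall = sum(max(col) for col in zip(*sim)) / len(frame_embs)
    # F-score of the two; assumes precision + recall is not zero.
    fine = 2 * precision * recall / (precision + recall)

    return {"coarse": coarse, "fine": fine, "full": (coarse + fine) / 2}
```

For example, `emscore_sketch([[1, 0], [0, 1]], [[1, 0]], [1, 1], [1, 0])` scores a one-token caption against a two-frame video: the token matches the first frame exactly, but the second frame has no matching token, so recall (and hence the fine-grained score) drops below 1.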

Installation

Modify the encode_text() function in CLIP/clip/model.py as follows, so that it can also return per-token embeddings when called with local=True:

def encode_text(self, text, local=False):
    # local=False: return one embedding per caption (the original CLIP behaviour).
    # local=True: return one projected embedding per token position.
    x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]

    x = x + self.positional_embedding.type(self.dtype)
    x = x.permute(1, 0, 2)  # NLD -> LND
    x = self.transformer(x)
    x = x.permute(1, 0, 2)  # LND -> NLD
    x = self.ln_final(x).type(self.dtype)

    if local:
        # keep every token position: [batch_size, n_ctx, embed_dim]
        x = x @ self.text_projection
    else:
        # x.shape = [batch_size, n_ctx, transformer.width]
        # take features from the eot embedding (eot_token is the highest number in each sequence)
        x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection

    return x
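The `text.argmax(dim=-1)` indexing in the non-local branch can be illustrated without PyTorch. The sketch below uses plain lists and a hypothetical helper, `select_eot_features`; it assumes, as the code comment states, that the end-of-text token has the largest id in each sequence (in CLIP's tokenizer the start and end tokens are 49406 and 49407, and padding is 0).

```python
def select_eot_features(features, token_ids):
    """Pick, for each sequence, the feature vector at the end-of-text position.

    features:  nested list shaped [batch][n_ctx][width]
    token_ids: nested list shaped [batch][n_ctx]
    """
    selected = []
    for seq_features, seq_ids in zip(features, token_ids):
        # Position of the largest token id, i.e. the end-of-text token.
        eot_pos = max(range(len(seq_ids)), key=seq_ids.__getitem__)
        selected.append(seq_features[eot_pos])
    return selected


# One caption of 4 positions: start token, one word, end token, padding.
ids = [[49406, 320, 49407, 0]]
feats = [[[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]]]
eot_feats = select_eot_features(feats, ids)  # feature at position 2
```

With `local=True` this selection step is skipped, so the caller receives the features of every position instead of a single vector per caption.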

Push your modified CLIP to your own GitHub account so that it can be installed with pip.

Install

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/$Yours_GitHub_name/CLIP

Replace cudatoolkit=11.0 above with the CUDA version appropriate for your machine, or with cpuonly when installing on a machine without a GPU. Replace $Yours_GitHub_name with the GitHub account that hosts your modified CLIP.
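To confirm that the installed CLIP is the modified one rather than the upstream package, the signature of encode_text() can be inspected. This is a small sketch, not part of the repository: `accepts_parameter` and `check_clip_fork` are hypothetical helper names, and the check assumes the installed package exposes `clip.model.CLIP` as upstream CLIP does.

```python
import importlib
import inspect


def accepts_parameter(func, name):
    """Return True if the callable declares a parameter with this name."""
    return name in inspect.signature(func).parameters


def check_clip_fork():
    """Return True/False for the installed CLIP, or None if it cannot be imported."""
    try:
        model_module = importlib.import_module("clip.model")
    except ImportError:
        return None
    return accepts_parameter(model_module.CLIP.encode_text, "local")
```

Calling `check_clip_fork()` after installation should return True if the modified encode_text() was picked up; False suggests that the unmodified package was installed instead.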

Usage:

A general demo

python demo.py

VATEX-EVAL

Download the files from the following link and save them in a storage directory:

https://drive.google.com/drive/folders/1jAfZZKEgkMEYFF2x1mhYo39nH-TNeGm6?usp=sharing

Run the code:

python VATEX-EVAL-demo.py --storage_path $storage_path --use_n_refs 1 --use_feat_cache --use_idf

ActivityNet-FOIL

Download the files from the following link and save them in a storage directory:

https://drive.google.com/drive/folders/1oY9EJiEi_db_1GH-R33JDqfE8txffKR3?usp=sharing

Run the code:

python ActivityNet-FOIL_demo.py --storage_path $storage_path --use_references --use_idf

Others

If you want to extract the embeddings yourself:

python extract_video_embeddings.py --videos_path $your_video_path --save_path $your_storage_path --backbone 'ViT-B/32'
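The script's internals are not shown here. As a rough sketch of how per-frame embeddings could be turned into a single video-level embedding, the example below assumes that each frame embedding is L2-normalized and that the video embedding is their average; both the pooling choice and the names `l2_normalize` and `pool_video_embedding` are assumptions for illustration, not the script's confirmed behaviour.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def pool_video_embedding(frame_embs):
    """Average L2-normalized frame embeddings into one video-level vector."""
    normalized = [l2_normalize(f) for f in frame_embs]
    n_frames = len(normalized)
    # zip(*...) iterates over embedding dimensions across all frames.
    return [sum(dim_values) / n_frames for dim_values in zip(*normalized)]


# Two toy 2-dimensional "frame embeddings".
video_emb = pool_video_embedding([[3.0, 0.0], [0.0, 4.0]])
```

Normalizing before averaging keeps frames with large embedding norms from dominating the pooled vector.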

Owner

Yaya Shi
