Automatic Video Captioning Evaluation Metric --- EMScore

Last update: Nov 28, 2022

Overview

EMScore is computed as illustrated in the following figure:

[Figure: illustration of the EMScore computation (image not included)]
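Below is a rough, non-authoritative sketch of that computation. It assumes the formulation described in the EMScore paper:

- a coarse-grained score, from the cosine similarity between a mean-pooled video embedding and the global caption embedding;
- a fine-grained score, from greedy frame-to-word matching summarized as an F1 of precision and recall;
- the two scores averaged.

The function names, the optional idf weights, and the equal weighting are assumptions of this sketch. It is not the repository's API.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def emscore_sketch(frame_emb, token_emb, caption_emb, idf=None):
    """Hypothetical EMScore-style score for one video/caption pair.

    frame_emb:   [n_frames, d] per-frame CLIP image embeddings
    token_emb:   [n_tokens, d] per-token text embeddings (encode_text(..., local=True))
    caption_emb: [d] global caption embedding (encode_text(..., local=False))
    idf:         optional [n_tokens] word weights (assumption)
    Returns (coarse, fine, combined).
    """
    frames = l2_normalize(np.asarray(frame_emb, dtype=float))
    tokens = l2_normalize(np.asarray(token_emb, dtype=float))
    caption = l2_normalize(np.asarray(caption_emb, dtype=float))

    # Coarse-grained: cosine between the mean-pooled video and the caption.
    video = l2_normalize(frames.mean(axis=0))
    coarse = float(video @ caption)

    # Fine-grained: greedy matching on the [n_tokens, n_frames] cosine matrix.
    sim = tokens @ frames.T
    weights = np.ones(len(tokens)) if idf is None else np.asarray(idf, dtype=float)
    precision = float((sim.max(axis=1) * weights).sum() / weights.sum())  # word -> best frame
    recall = float(sim.max(axis=0).mean())  # frame -> best word
    if precision + recall == 0:
        fine = 0.0
    else:
        fine = 2 * precision * recall / (precision + recall)

    # Equal-weight combination of the two granularities (assumption).
    return coarse, fine, (coarse + fine) / 2


rng = np.random.default_rng(0)
coarse, fine, score = emscore_sketch(
    rng.standard_normal((8, 512)),   # 8 frames
    rng.standard_normal((12, 512)),  # 12 caption tokens
    rng.standard_normal(512),
)
```

Fine-grained precision is taken over words and recall over frames. A caption that mentions things absent from the video therefore lowers precision, while a caption that ignores parts of the video lowers recall.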

Installation

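Fork OpenAI's CLIP repository. In your fork, modify the encode_text() function in CLIP/clip/model.py as follows, so that it can also return per-token (local) text embeddings: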

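def encode_text(self, text, local=False):
    # Method of the CLIP class; torch is already imported at the top of model.py.
    # local=False (default) keeps CLIP's original behaviour: one embedding per caption.
    # local=True returns one embedding per token position instead.
    x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]

    x = x + self.positional_embedding.type(self.dtype)
    x = x.permute(1, 0, 2)  # NLD -> LND
    x = self.transformer(x)
    x = x.permute(1, 0, 2)  # LND -> NLD
    x = self.ln_final(x).type(self.dtype)

    if local:
        # project every token: [batch_size, n_ctx, embed_dim] (padding positions included)
        x = x @ self.text_projection
    else:
        # x.shape = [batch_size, n_ctx, transformer.width]
        # take features from the eot embedding (eot_token is the highest number in each sequence)
        x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection

    return x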
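The small NumPy sketch below shows what the local flag changes by mimicking the two branches on random arrays:

- With local=False, one vector per caption is taken at the EOT position, which holds the largest token id in each row.
- With local=True, every token position is kept.

The array sizes and all token ids other than CLIP's start/end ids (49406/49407) are made up. NumPy stands in for PyTorch only so the example runs without CLIP installed.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_ctx, width, embed_dim = 2, 6, 8, 4

# Stand-ins for the transformer output and CLIP's text_projection matrix.
x = rng.standard_normal((batch_size, n_ctx, width))
text_projection = rng.standard_normal((width, embed_dim))

# Hypothetical token ids: 49406 = start, 49407 = EOT (largest id), 0 = padding.
text = np.array([
    [49406, 320, 1929, 49407, 0, 0],
    [49406, 320, 2368, 1125, 49407, 0],
])


def encode(x, text, local=False):
    if local:
        # One embedding per token position: [batch_size, n_ctx, embed_dim]
        return x @ text_projection
    # One embedding per caption, read at the EOT position: [batch_size, embed_dim]
    return x[np.arange(x.shape[0]), text.argmax(axis=-1)] @ text_projection


print(encode(x, text, local=True).shape)   # (2, 6, 4)
print(encode(x, text, local=False).shape)  # (2, 4)
```

The global embedding is exactly the local embedding at the EOT position. The local output still contains padding positions, so code that consumes it presumably has to mask or trim them.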

Push your modified CLIP to your own GitHub account so that pip can install it from there.

Install

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/$your_GitHub_name/CLIP

Replace cudatoolkit=11.0 above with the CUDA version available on your machine, or with cpuonly when installing on a machine without a GPU.
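Before running the demos, it may be worth confirming that the CLIP package Python imports is your modified copy. The minimal check below uses only the standard inspect module and looks for the local parameter on encode_text. The helper name supports_local_text_features is hypothetical.

```python
import inspect


def supports_local_text_features(model) -> bool:
    """Return True if model.encode_text accepts a `local` argument."""
    try:
        params = inspect.signature(model.encode_text).parameters
    except (TypeError, ValueError):
        # Some callables (e.g. compiled ones) may not expose a Python signature.
        return False
    return "local" in params


# Usage with the installed package (requires the modified CLIP):
#   import clip
#   # jit=False loads the Python model defined in clip/model.py
#   model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
#   assert supports_local_text_features(model)
```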

Usage

A general demo

python demo.py

VATEX-EVAL

Download the files from the following link and save them in a storage directory ($storage_path below):

https://drive.google.com/drive/folders/1jAfZZKEgkMEYFF2x1mhYo39nH-TNeGm6?usp=sharing

Then run:

python VATEX-EVAL-demo.py --storage_path $storage_path --use_n_refs 1 --use_feat_cache --use_idf
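As we understand it, VATEX-EVAL measures how well a metric's scores agree with human ratings of candidate captions. Agreement of this kind is commonly reported as a rank correlation such as Kendall's tau. The sketch below computes the tau-b variant from scratch for illustration. Treat it as an assumption about how such results are summarized, not a description of what VATEX-EVAL-demo.py prints.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between metric scores xs and human scores ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        untied_x += int(dx != 0)  # pairs not tied in xs
        untied_y += int(dy != 0)  # pairs not tied in ys
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    return float("nan") if denom == 0 else (concordant - discordant) / denom


print(kendall_tau_b([0.1, 0.4, 0.35, 0.8], [1, 2, 3, 4]))  # 0.6666666666666666
```

Here 5 of the 6 pairs are ranked the same way by the metric and the humans, and 1 is reversed, giving (5 - 1) / 6.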

ActivityNet-FOIL

Download the files from the following link and save them in a storage directory ($storage_path below):

https://drive.google.com/drive/folders/1oY9EJiEi_db_1GH-R33JDqfE8txffKR3?usp=sharing

Then run:

python ActivityNet-FOIL_demo.py --storage_path $storage_path --use_references --use_idf
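As we understand it, ActivityNet-FOIL pairs each correct caption with a "foil" caption that contains a mistake. A metric is judged by how often it scores the correct caption higher. A minimal sketch of that pairwise accuracy with made-up scores is shown below; the script's actual reporting may differ.

```python
def pairwise_accuracy(correct_scores, foil_scores):
    """Fraction of pairs in which the correct caption outscores its foil."""
    if len(correct_scores) != len(foil_scores) or len(correct_scores) == 0:
        raise ValueError("need two equal-length, non-empty score lists")
    # Ties count as failures: the metric did not prefer the correct caption.
    wins = sum(c > f for c, f in zip(correct_scores, foil_scores))
    return wins / len(correct_scores)


print(pairwise_accuracy([0.31, 0.28, 0.40], [0.25, 0.30, 0.22]))  # 0.6666666666666666
```

Counting ties as failures is a conservative choice. A metric that gives identical scores to a caption and its foil has not detected the error.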

Others

If you want to extract the video embeddings yourself:

python extract_video_embeddings.py --videos_path $your_video_path  --save_path $your_storage_path --backbone 'ViT-B/32'
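The frame-sampling strategy of extract_video_embeddings.py is not described here. If you write your own extractor, one common choice is to pick evenly spaced frame indices and then encode each selected frame with CLIP's image encoder (the same ViT-B/32 backbone as in the command above). The helper evenly_spaced_indices is hypothetical.

```python
from __future__ import annotations


def evenly_spaced_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick up to num_samples frame indices spread uniformly over a video.

    Each index is the centre of one of num_samples equal-length segments,
    so indices are unique and always fall inside [0, num_frames).
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


print(evenly_spaced_indices(100, 4))  # [12, 37, 62, 87]

# Hypothetical follow-up with the CLIP API (not run here), where `frames`
# is a list of PIL images for the selected indices:
#   model, preprocess = clip.load("ViT-B/32", device="cpu")
#   feats = model.encode_image(torch.stack([preprocess(img) for img in frames]))
```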


Owner

Yaya Shi
