💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

Last update: Nov 07, 2022

Overview

VALSE 💃

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena. https://arxiv.org/abs/2112.07566

Data Instructions

Please find the data in the data folder. The dataset is in json format and contains the following relevant fields:

- A reference to the image in the original dataset: dataset and image_file.
- The valid sentence, i.e. the caption for VALSE: caption.
- The altered caption: foil.
- The annotators' votes (3 annotators per sample): mturk.
  - The subentry caption counts the number of annotators who chose the caption, but not the foil, as the text describing the image.
  - The subentry foil counts how many of the three annotators chose the foil as (also) describing the image.
  - For more information, see subsec. 4.4 and App. E of the paper.

‼️ Please be aware that the json files contain both valid (i.e., validated by annotators) and non-validated samples. To work only with the valid set, filter them as follows:

We consider a foil valid if at least two out of three annotators identified the caption, but not the foil, as the text that accurately describes the image.

This means that the valid samples of the dataset are the ones where sample["mturk"]["caption"] >= 2.
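A minimal Python sketch of this filter, assuming each json file maps sample ids to sample dicts as in the example instance below. The function names filter_valid and load_valid are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path


def filter_valid(samples: dict) -> dict:
    """Keep only samples where at least 2 of 3 annotators chose the caption but not the foil."""
    return {
        sample_id: sample
        for sample_id, sample in samples.items()
        if sample["mturk"]["caption"] >= 2
    }


def load_valid(path: str) -> dict:
    """Load one VALSE json file and drop the non-validated samples."""
    with Path(path).open(encoding="utf-8") as f:
        samples = json.load(f)
    return filter_valid(samples)
```

To use it, call load_valid with the path of one of the json files in the data folder; the returned dict keeps the original sample ids as keys.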

Example instance (the # comments are explanatory and are not part of the actual json files):

{
    "actions_test_0": {
        "dataset": "SWiG",
        "original_split": "test",                 # the split of the original dataset to which the sample belonged
        "dataset_idx": "exercising_255.jpg",      # the sample id in the original dataset
        "linguistic_phenomena": "actions",        # the linguistic phenomenon targeted
        "image_file": "exercising_255.jpg",
        "caption": "A man exercises his torso.",
        "classes": "man",                         # the word of the caption that was replaced
        "classes_foil": "torso",                  # the foil word / phrase
        "mturk": {
            "foil": 0,
            "caption": 3,
            "other": 0
        },
        "foil": "A torso exercises for a man."
    }, ...
}
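To illustrate how the fields of an entry fit together, here is a short sketch that turns the id-to-entry mapping of a loaded json file into typed records and counts samples per linguistic phenomenon. The ValseSample class and the helper names are hypothetical, and only a subset of the fields shown above is kept.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ValseSample:
    """Hypothetical container for a subset of the fields of one VALSE entry."""
    sample_id: str
    dataset: str        # original dataset the image comes from, e.g. "SWiG"
    image_file: str     # image file name in that dataset
    phenomenon: str     # value of "linguistic_phenomena"
    caption: str        # the valid sentence
    foil: str           # the altered caption
    caption_votes: int  # value of mturk["caption"]


def parse_samples(data: dict) -> list[ValseSample]:
    """Convert the id -> entry mapping of one json file into a list of ValseSample."""
    return [
        ValseSample(
            sample_id=sample_id,
            dataset=entry["dataset"],
            image_file=entry["image_file"],
            phenomenon=entry["linguistic_phenomena"],
            caption=entry["caption"],
            foil=entry["foil"],
            caption_votes=entry["mturk"]["caption"],
        )
        for sample_id, entry in data.items()
    ]


def count_by_phenomenon(samples: list[ValseSample]) -> Counter:
    """Count how many samples target each linguistic phenomenon."""
    return Counter(s.phenomenon for s in samples)
```

Applied to the example instance above, parse_samples yields a single record with caption_votes equal to 3, so that sample passes the annotation filter.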

Images

For the images, please follow the download instructions of the respective original dataset. The provenance of each image is recorded in the dataset field of the json files.
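Because images must be obtained from each original dataset separately, it can help to first list which files are needed from which source. A small sketch, assuming a loaded json file shaped like the example instance above; the function name images_per_dataset is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def images_per_dataset(data: dict) -> dict[str, set[str]]:
    """Map each original dataset (the "dataset" field) to the image files it must supply."""
    needed: dict[str, set[str]] = defaultdict(set)
    for entry in data.values():
        # Several samples can share one image; the set keeps each file once.
        needed[entry["dataset"]].add(entry["image_file"])
    return dict(needed)
```

The resulting sets can be checked against the files obtained by following each original dataset's own download instructions.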

Reference

Please cite our 💃 VALSE paper if you are using this dataset.

@misc{parcalabescu2021valse,
      title={VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena}, 
      author={Letitia Parcalabescu and Michele Cafagna and Lilitta Muradjan and Anette Frank and Iacer Calixto and Albert Gatt},
      year={2021},
      eprint={2112.07566},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


Owner

Heidelberg-NLP
