The SVO-Probes Dataset for Verb Understanding

This repository contains the SVO-Probes benchmark, designed to probe for Subject, Verb, and Object understanding in image-language models. For each sentence, the benchmark provides two images: a positive image that matches the sentence and a negative image that does not. The negative image differs from the positive one with respect to either the subject, the verb, or the object. Given a sentence, we test whether a model can correctly classify both the positive and the negative image.
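The test described above can be sketched as a small scoring function. This is a minimal illustration, not the evaluation code from the paper: the function name `svo_accuracy`, the boolean prediction format, and the three reported numbers are assumptions made for this example.

```python
def svo_accuracy(predictions):
    """Score a list of (pos_pred, neg_pred) pairs.

    Each prediction is a boolean: True means the model judged that the
    image matches the sentence. The correct answer is True for the
    positive image and False for the negative image.
    (Hypothetical input format, chosen for illustration.)
    """
    n = len(predictions)
    if n == 0:
        raise ValueError("predictions must not be empty")
    pos_correct = sum(1 for pos, _ in predictions if pos)
    neg_correct = sum(1 for _, neg in predictions if not neg)
    # A row counts here only if both images are classified correctly.
    both_correct = sum(1 for pos, neg in predictions if pos and not neg)
    return {
        "pos_acc": pos_correct / n,
        "neg_acc": neg_correct / n,
        "pair_acc": both_correct / n,
    }


# Made-up predictions for four rows of the benchmark.
example_predictions = [(True, False), (True, True), (False, False), (True, False)]
scores = svo_accuracy(example_predictions)
print(scores)
```

Reporting positive and negative accuracy separately makes it visible when a model simply answers "match" for every image, which would score well on positives and poorly on negatives.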

For a detailed description of our benchmark, please see the paper Probing Image–Language Transformers for Verb Understanding. Please cite this paper if you use the SVO-Probes benchmark in your work.

Files

svo_probes.csv: our raw data. Each row in the dataset consists of two pairs, <sentence, positive-image> and <sentence, negative-image>. Each image is identified by a unique id and a url: pos_image_id and pos_url for the positive image, neg_image_id and neg_url for the negative image. Each image is also associated with the subject-verb-object triplets (pos_triplet or neg_triplet) that can be seen in it. The subj_neg, verb_neg, and obj_neg columns specify the type of the negative: for example, subj_neg is True if the negative image differs from the positive one in its subject.
image_urls.txt: a list of image urls used in our benchmark.
A Colab to analyze pre-trained models on SVO-Probes.
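As a rough sketch of how the columns in svo_probes.csv might be read, the snippet below parses two made-up rows with Python's standard csv module and selects the verb negatives. The `sentence` column name, the column order, the triplet formatting, and every value in the sample are assumptions for illustration; check them against the actual file before relying on them.

```python
import csv
import io

# Two invented rows that mimic the columns described above.
# Column order, the `sentence` header, and all values are assumptions.
SAMPLE_CSV = """sentence,pos_triplet,neg_triplet,pos_image_id,pos_url,neg_image_id,neg_url,subj_neg,verb_neg,obj_neg
A girl rides a horse,"girl,ride,horse","boy,ride,horse",1,http://example.com/p1.jpg,2,http://example.com/n1.jpg,True,False,False
A dog catches a ball,"dog,catch,ball","dog,chase,ball",3,http://example.com/p2.jpg,4,http://example.com/n2.jpg,False,True,False
"""


def load_rows(handle):
    """Read every row of an open CSV file handle into a list of dicts."""
    return list(csv.DictReader(handle))


def negative_type(row):
    """Return 'subj', 'verb', or 'obj' for the flag set to True, else None."""
    for kind in ("subj", "verb", "obj"):
        if row[f"{kind}_neg"].strip().lower() == "true":
            return kind
    return None


# For the real file, replace the StringIO with:
#   open("svo_probes.csv", newline="", encoding="utf-8")
rows = load_rows(io.StringIO(SAMPLE_CSV))
verb_negatives = [r for r in rows if negative_type(r) == "verb"]

for r in verb_negatives:
    print(r["sentence"], "|", r["pos_triplet"], "vs", r["neg_triplet"])
```

Grouping rows by negative type in this way allows accuracy to be reported separately for subject, verb, and object negatives.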

Disclaimer

This is not an official Google product. The SVO-Probes benchmark was created solely for research purposes and is not intended to be used in products. The images in our benchmark were retrieved with the Google Image Search API; we expect them to reflect distributional properties and biases similar to those of the results returned by that API. Furthermore, our dataset is designed to have a vocabulary similar to that of the Conceptual Captions dataset, so we expect our <Subject, Verb, Object> triplets to reflect biases present in Conceptual Captions.

License

The data is made available under the terms of the Creative Commons Attribution 4.0 International Public License (CC BY 4.0). You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode

If you have concerns or comments about the benchmark, please contact [email protected] and [email protected].

Owner

DeepMind
