The SVO-Probes Dataset for Verb Understanding

This repository contains the SVO-Probes benchmark, designed to probe for Subject, Verb, and Object understanding in image–language models. For a given sentence, the benchmark provides a positive image and a negative image. The negative image differs from the positive one with respect to the subject, the verb, or the object. Given a sentence, we test whether a model correctly classifies both the positive and the negative image.
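The test described above can be sketched roughly as follows. This is a minimal illustration, not the evaluation code from the paper: `evaluate_pairs`, `score_fn`, the 0.5 threshold, and the toy scores are all assumptions made for this example. It assumes the model exposes a function that returns a match score for a sentence and an image.

```python
from typing import Callable, Dict, Iterable, Tuple

# (sentence, positive image, negative image); images are plain identifiers here.
Example = Tuple[str, str, str]


def evaluate_pairs(
    examples: Iterable[Example],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> Dict[str, float]:
    """Return accuracy on positive pairs, negative pairs, and both together."""
    pos_correct = neg_correct = both_correct = total = 0
    for sentence, pos_image, neg_image in examples:
        # A positive pair is correct if the model says "match".
        pos_ok = score_fn(sentence, pos_image) >= threshold
        # A negative pair is correct if the model says "no match".
        neg_ok = score_fn(sentence, neg_image) < threshold
        pos_correct += int(pos_ok)
        neg_correct += int(neg_ok)
        both_correct += int(pos_ok and neg_ok)
        total += 1
    if total == 0:
        raise ValueError("no examples to evaluate")
    return {
        "pos_acc": pos_correct / total,
        "neg_acc": neg_correct / total,
        "both_acc": both_correct / total,
    }


# Hypothetical scores standing in for a real image-language model.
TOY_SCORES = {
    ("a dog runs", "img_dog_runs"): 0.9,
    ("a dog runs", "img_dog_sits"): 0.2,
    ("a girl rides a horse", "img_girl_horse"): 0.8,
    ("a girl rides a horse", "img_boy_horse"): 0.7,
}


def toy_score(sentence: str, image: str) -> float:
    return TOY_SCORES[(sentence, image)]


toy_examples = [
    ("a dog runs", "img_dog_runs", "img_dog_sits"),
    ("a girl rides a horse", "img_girl_horse", "img_boy_horse"),
]
results = evaluate_pairs(toy_examples, toy_score)
print(results)
```

In the toy data the second negative image scores above the threshold, so the model is credited for the positive pair but not for the negative one.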

For a detailed description of our benchmark, please see the paper "Probing Image–Language Transformers for Verb Understanding". Please cite this paper if you use the SVO-Probes benchmark in your work.

Files

svo_probes.csv: our raw data. Each row contains two pairs that share one sentence: <sentence, positive-image> and <sentence, negative-image>. Each image is identified by a URL and a unique ID: pos_image_id and pos_url for the positive image, neg_image_id and neg_url for the negative image. Each image is also associated with the subject-verb-object triplets (pos_triplet or neg_triplet) that can be seen in it. The subj_neg, verb_neg, and obj_neg columns specify the type of the negative: for example, subj_neg is True if the negative example is a subject negative.
image_urls.txt: a list of the image URLs used in our benchmark.
A Colab to analyze pre-trained models on SVO-Probes.
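As a rough sketch of how the subj_neg, verb_neg, and obj_neg columns in svo_probes.csv might be used, the snippet below counts rows by negative type with Python's standard csv module. It rests on assumptions not confirmed here: that the flags are stored as the strings True and False, and that the sentence column is named sentence. The sample rows and their triplet format are invented for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, Optional, TextIO

# Column names are taken from the file description above.
NEG_COLUMNS = {"subj_neg": "subject", "verb_neg": "verb", "obj_neg": "object"}


def negative_type(row: Dict[str, str]) -> Optional[str]:
    """Return 'subject', 'verb', or 'object', or None if the flags are ambiguous."""
    flagged = [
        name
        for column, name in NEG_COLUMNS.items()
        # Assumes booleans are serialized as the text "True" / "False".
        if (row.get(column) or "").strip().lower() == "true"
    ]
    return flagged[0] if len(flagged) == 1 else None


def count_negative_types(csv_file: TextIO) -> Counter:
    """Count rows per negative type in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(negative_type(row) for row in reader)


# Invented sample rows; the real file has more columns (ids, URLs, ...).
SAMPLE = """sentence,pos_triplet,neg_triplet,subj_neg,verb_neg,obj_neg
A dog runs on the beach,"dog,run,beach","dog,sit,beach",False,True,False
A girl rides a horse,"girl,ride,horse","boy,ride,horse",True,False,False
"""

counts = count_negative_types(io.StringIO(SAMPLE))
print(counts)
```

To run this on the real file, pass a handle opened with open("svo_probes.csv", newline="", encoding="utf-8") instead of the in-memory sample.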

Disclaimer

This is not an official Google product. The SVO-Probes benchmark was created solely for research purposes and is not intended to be used in products. The images in our benchmark were retrieved with Google Image Search; we expect them to reflect distributional properties and biases similar to those of images returned by the Google Image Search API. Furthermore, our dataset is designed to have a vocabulary similar to that of the Conceptual Captions dataset, so we expect our <Subject, Verb, Object> triplets to reflect the biases in Conceptual Captions.

License

The data is made available under the terms of the Creative Commons Attribution 4.0 International Public License (CC BY 4.0). You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode

If you have concerns or comments about the benchmark, please contact [email protected] and [email protected].

Owner

DeepMind
