Amazon Multilingual Counterfactual Dataset (AMCD)

Last update: Sep 20, 2022

Overview

This repository contains a dataset described in the paper:

"I Wish I Would Have Loved This One, But I Didn’t – A Multilingual Dataset for Counterfactual Detection in Product Reviews." James O’Neill, Polina Rozenshtein, Ryuichi Kiryo, Motoko Kubota, Danushka Bollegala. EMNLP 2021. An arXiv version is also available.

The dataset contains sentences from Amazon customer reviews (sampled from the Amazon product review dataset), annotated for counterfactual detection (CFD) as a binary classification task. Counterfactual statements describe events that did not or cannot take place. They can be identified as statements of the form "If p was true, then q would be true", that is, assertions whose antecedent (p) and consequent (q) are known or assumed to be false.

The key features of this dataset are:

The dataset is multilingual and contains sentences in English, German, and Japanese.
The labeling was done by professional linguists, and high annotation quality was ensured.
The dataset is supplemented with the annotation guidelines and definitions, which were worked out by professional linguists. We also provide lists of clue words that are typical of counterfactual sentences and were used for initial data filtering. These lists were also compiled by professional linguists.
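
The clue-word filtering mentioned above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the procedure used to build the dataset: the clue words below are hypothetical placeholders (the actual lists ship with the dataset), the function names are made up for this example, and the word-boundary matching only suits whitespace-delimited text such as English or German. Japanese would need a different matching strategy.

```python
import re

# Hypothetical clue words, for illustration only.
# The real clue word lists are provided with the dataset.
CLUE_WORDS = ["wish", "if only", "would have", "should have"]


def build_clue_pattern(clue_words):
    """Compile one case-insensitive regex that matches any clue word as a whole word."""
    # Longer phrases first, so multi-word clues are tried before shorter ones.
    alternatives = sorted((re.escape(w) for w in clue_words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def filter_candidates(sentences, clue_words=CLUE_WORDS):
    """Keep only the sentences that contain at least one clue word."""
    pattern = build_clue_pattern(clue_words)
    return [s for s in sentences if pattern.search(s)]


sentences = [
    "I wish this blender had a bigger jar.",
    "The blender works great.",
    "It would have been perfect with a longer cord.",
    "Dishwasher safe and easy to clean.",
]

candidates = filter_candidates(sentences)
for sentence in candidates:
    print(sentence)
```

A clue-word match only marks a sentence as a candidate. Whether it is actually counterfactual is decided by the annotation, which is why the filtered sentences were still labeled by linguists.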

Please see the paper for data statistics and a detailed description of data collection and annotation.

For the dataset format, please see README.txt.

Cite

If you use this dataset in your research, please cite the paper.

License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.
