💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Overview

Perspective-taking and Pragmatics for Generating
Empathetic Responses Focused on Emotion Causes

figure

Official PyTorch implementation and EmoCause evaluation set of our EMNLP 2021 paper 💛
Hyunwoo Kim, Byeongchang Kim, and Gunhee Kim. Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes. EMNLP, 2021 [Paper coming soon!]

  • TL;DR: In order to express deeper empathy in dialogues, we argue that responses should focus on the cause of emotions. Inspired by perspective-taking of humans, we propose a generative emotion estimator (GEE) which can recognize emotion cause words solely based on sentence-level emotion labels without word-level annotations (i.e., weak-supervision). To evaluate our approach, we annotate emotion cause words and release the EmoCause evaluation set. We also propose a pragmatics-based method for generating responses focused on targeted words from the context.

Reference

If you use the materials in this repository as part of any published research, we ask you to cite the following paper:

@inproceedings{Kim:2021:empathy,
  title={Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes},
  author={Kim, Hyunwoo and Kim, Byeongchang and Kim, Gunhee},
  booktitle={EMNLP},
  year=2021
}

Implementation

System Requirements

  • Python 3.7.9
  • Pytorch 1.6.0
  • CUDA 10.2 supported GPU with at least 24GB memory
  • See environment.yml for details

Environment setup

Our code is built on the ParlAI framework. We recommend you create a conda environment as follows

conda env create -f environment.yml

and activate it with

conda activate focused-empathy
python -m spacey download en

EmoCause evaluation set for weakly-supervised emotion cause recognition

EmoCause is a dataset of annotated emotion cause words in emotional situations from the EmpatheticDialogues valid and test set. The goal is to recognize emotion cause words in sentences by training only on sentence-level emotion labels without word-level labels (i.e., weakly-supervised emotion cause recognition). EmoCause is based on the fact that humans do not recognize the cause of emotions with supervised learning on word-level cause labels. Thus, we do not provide a training set.

figure

You can download the EmoCause eval set [here].
Note, the dataset will be downloaded automatically when you run the experiment command below.

Data statistics and structure

#Emotion Label type #Label/Utterance #Utterance
EmoCause 32 Word 2.3 4.6K
{
  "original_situation": the original situations in the EmpatheticDialogues,
  "tokenized_situation": tokenized situation utterances using spacy,
  "emotion": emotion labels,
  "conv_id": id for each corresponding conversation in EmpatheticDialogues,
  "annotation": list of tuples: (emotion cause word, index),
  "labels": list of strings containing the emotion cause words
}

Running Experiments

All corresponding models will be downloaded automatically when running the following commands.
We also provide manual download links: [GEE] [Finetuned Blender]

Weakly-supervised emotion cause word recognition with GEE on EmoCause

You can evaluate our proposed Generative Emotion Estimator (GEE) on the EmoCause eval set.

python eval_emocause.py --model agents.gee_agent:GeeCauseInferenceAgent --fp16 False

Focused empathetic response generation with finetuned Blender on EmpatheticDialogues

You can evaluate our approach for generating focused empathetic responses on top of a finetuned Blender (Not familiar with Blender? See here!).

python eval_empatheticdialogues.py --model agents.empathetic_gee_blender:EmpatheticBlenderAgent --model_file data/models/finetuned_blender90m/model --fp16 False --empathy-score False

Adding the --alpha 0 flag will run the Blender without pragmatics. You can also try the random distractor (Plain S1) by adding --distractor-type random.

💡 To measure the Interpretation and Exploration scores also, set the --empathy-score to True. It will automatically download the RoBERTa models finetuned on EmpatheticDialogues. For more details on empathy scores, visit the original repo.

Acknowledgements

We thank the anonymous reviewers for their helpful comments on this work.

This research was supported by Samsung Research Funding Center of Samsung Electronics under project number SRFCIT210101. The compute resource and human study are supported by Brain Research Program by National Research Foundation of Korea (NRF) (2017M3C7A1047860).

Have any question?

Please contact Hyunwoo Kim at hyunw.kim at vl dot snu dot ac dot kr.

License

This repository is MIT licensed. See the LICENSE file for details.

Owner
Hyunwoo Kim
PhD student at Seoul National University CSE
Hyunwoo Kim
Words_And_Phrases - Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours

Words_And_Phrases Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours Abbreviations Abbreviation

Subhadeep Mandal 1 Feb 01, 2022
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation

Salesforce 564 Jan 08, 2023
Wake: Context-Sensitive Automatic Keyword Extraction Using Word2vec

Wake Wake: Context-Sensitive Automatic Keyword Extraction Using Word2vec Abstract استخراج خودکار کلمات کلیدی متون کوتاه فارسی با استفاده از word2vec ب

Omid Hajipoor 1 Dec 17, 2021
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

English | 中文 Features 🌍 Chinese supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, and etc. ?

Vega 25.6k Dec 31, 2022
The training code for the 4th place model at MDX 2021 leaderboard A.

The training code for the 4th place model at MDX 2021 leaderboard A.

Chin-Yun Yu 32 Dec 18, 2022
iBOT: Image BERT Pre-Training with Online Tokenizer

Image BERT Pre-Training with iBOT Official PyTorch implementation and pretrained models for paper iBOT: Image BERT Pre-Training with Online Tokenizer.

Bytedance Inc. 435 Jan 06, 2023
CredData is a set of files including credentials in open source projects

CredData is a set of files including credentials in open source projects. CredData includes suspicious lines with manual review results and more information such as credential types for each suspicio

Samsung 19 Sep 07, 2022
🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

A hyper-fast, safe Python module to read and write JSON data. Works as a drop-in replacement for Python's built-in json module. This is alpha software

Matthias 479 Jan 01, 2023
This is the source code of RPG (Reward-Randomized Policy Gradient)

RPG (Reward-Randomized Policy Gradient) Zhenggang Tang*, Chao Yu*, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Shaolei Du, Yu Wang, Yi Wu (

40 Nov 25, 2022
Unsupervised Language Modeling at scale for robust sentiment classification

** DEPRECATED ** This repo has been deprecated. Please visit Megatron-LM for our up to date Large-scale unsupervised pretraining and finetuning code.

NVIDIA Corporation 1k Nov 17, 2022
Hostapd-mac-tod-acl - Setup a hostapd AP with MAC ToD ACL

A brief explanation This script provides a quick way to setup a Time-of-day (Tod

2 Feb 03, 2022
NLP command-line assistant powered by OpenAI

NLP command-line assistant powered by OpenAI

Axel 16 Dec 09, 2022
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp

Ryan Spring 114 Nov 04, 2022
Get list of common stop words in various languages in Python

Python Stop Words Table of contents Overview Available languages Installation Basic usage Python compatibility Overview Get list of common stop words

Alireza Savand 142 Dec 21, 2022
Guide to using pre-trained large language models of source code

Large Models of Source Code I occasionally train and publicly release large neural language models on programs, including PolyCoder. Here, I describe

Vincent Hellendoorn 947 Dec 28, 2022
justCTF [*] 2020 challenges sources

justCTF [*] 2020 This repo contains sources for justCTF [*] 2020 challenges hosted by justCatTheFish. TLDR: Run a challenge with ./run.sh (requires Do

justCatTheFish 25 Dec 27, 2022
Command Line Text-To-Speech using Google TTS

cli-tts Thanks to gTTS by @pndurette! This is an interactive command line text-to-speech tool using Google TTS. Just type text and the voice will be p

ReekyStive 3 Nov 11, 2022
Journey is a NLP-Powered Developer assistant

Journey Journey is a NLP-Powered Developer assistant Using on the powerful Natural Language Processing library Mindmeld, this projects aims to assist

Christian Eilers 21 Dec 11, 2022
Poetry PEP 517 Build Backend & Core Utilities

Poetry Core A PEP 517 build backend implementation developed for Poetry. This project is intended to be a light weight, fully compliant, self-containe

Poetry 293 Jan 02, 2023
KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT KoGPT (Korean Generative Pre-trained Transformer) https://github.com/kakaobrain/kogpt https://huggingface.co/kakaobrain/kogpt Model Descriptions

Kakao Brain 797 Dec 26, 2022