PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Related tags

Deep LearningCI-ToD
Overview

Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

License: MIT

This repository contains the PyTorch implementation and the data of the paper: Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System. Libo Qin, Tianbao Xie, Shijue Huang, Qiguang Chen, Xiao Xu, Wanxiang Che. EMNLP2021.[PDF] .

This code has been written using PyTorch >= 1.1. If you use any source codes or the datasets included in this toolkit in your work, please cite the following paper. The bibtex are listed below:

@article{qin2021CIToD,
  title={Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System},
  author={Qin, Libo and Xie, Tianbao and Huang, Shijue and Chen, Qiguang and Xu, Xiao and Che, Wanxiang},
  journal={arXiv preprint arXiv:2109.11292},
  year={2021}
}

Abstract

Consistency Identification has obtained remarkable success on open-domain dialogue, which can be used for preventing inconsistent response generation. However, in contrast to the rapid development in open-domain dialogue, few efforts have been made to the task-oriented dialogue direction. In this paper, we argue that consistency problem is more urgent in task-oriented domain. To facilitate the research, we introduce CI-ToD, a novel dataset for Consistency Identification in Task-oriented Dialog system. In addition, we not only annotate the single label to enable the model to judge whether the system response is contradictory, but also provide more finegrained labels (i.e., Dialogue History Inconsistency(HI), User Query Inconsistency(QI) and Knowledge Base Inconsistency(KBI), which are as shown in the figure below) to encourage model to know what inconsistent sources lead to it. Empirical results show that state-of-the-art methods only achieve performance of 51.3%, which is far behind the human performance of 93.2%, indicating that there is ample room for improving consistency identification ability. Finally, we conduct exhaustive experiments and qualitative analysis to comprehend key challenges and provide guidance for future directions.

Dataset

We construct the CI-ToD dataset based on the KVRET dataset. We release our dataset together with the code, you can find it under data.

The basic format of the dataset is as follows, including multiple rounds of dialogue, knowledge base and related inconsistency annotations (KBI, QI, HI):

[
    {
        "id": 74,
        "dialogue": [
            {
                "turn": "driver",
                "utterance": "i need to find out the date and time for my swimming_activity"
            },
            {
                "turn": "assistant",
                "utterance": "i have two which one i have one for the_14th at 6pm and one for the_12th at 7pm"
            }
        ],
        "scenario": {
            "kb": {
                "items": [
                    {
                        "date": "the_11th",
                        "time": "9am",
                        "event": "tennis_activity",
                        "agenda": "-",
                        "room": "-",
                        "party": "father"
                    },
                    {
                        "date": "the_18th",
                        "time": "2pm",
                        "event": "football_activity",
                        "agenda": "-",
                        "room": "-",
                        "party": "martha"
                    },
                    .......
                ]
            },
            "qi": "0",
            "hi": "0",
            "kbi": "0"
        },
        "HIPosition": []
    }

KBRetriever_DC

Dataset QI HI KBI SUM
calendar_train.json 174 56 177 595
calendar_dev.json 28 9 24 74
calendar_test.json 23 8 21 74
navigate_train.json 453 386 591 1110
navigate_dev.json 55 41 69 139
navigate_test.json 48 44 71 138
weather_new_train.json 631 132 551 848
weather_new_dev.json 81 14 66 106
weather_new_test.json 72 12 69 106

Model

Here is the model structure of non pre-trained model (a) and pre-trained model (b and c).

Preparation

we provide some pre-trained baselines on our proposed CI-TOD dataset, the packages we used are listed follow:

-- scikit-learn==0.23.2
-- numpy=1.19.1
-- pytorch=1.1.0
-- fitlog==0.9.13
-- tqdm=4.49.0
-- sklearn==0.0
-- transformers==3.2.0

We highly suggest you using Anaconda to manage your python environment. If so, you can run the following command directly on the terminal to create the environment:

conda env create -f py3.6pytorch1.1_.yaml

How to run it

The script train.py acts as a main function to the project, you can run the experiments by the following commands:

python -u train.py --cfg KBRetriver_DC/KBRetriver_DC_BERT.cfg

The parameters we use are configured in the configure. If you need to adjust them, you can modify them in the relevant files or append parameters to the command.

Finally, you can check the results in logs folder.Also, you can run fitlog command to visualize the results:

fitlog log logs/

Baseline Experiment Result

All experiments were performed in TITAN_XP except for BART, which was performed on Tesla V100 PCIE 32 GB. These may not be the best results. Therefore, the parameters can be adjusted to obtain better results.

KBRetriever_DC

Baseline category Baseline method QI F1 HI F1 KBI F1 Overall Acc
Non Pre-trained Model ESIM (Chen et al., 2017) 0.512 0.164 0.543 0.432
Infersent (Romanov and Shivade, 2018) 0.557 0.031 0.336 0.356
RE2 (Yang et al., 2019) 0.655 0.244 0.739 0.481
Pre-trained Model BERT (Devlin et al., 2019) 0.691 0.555 0.740 0.500
RoBERTa (Liu et al., 2019) 0.715 0.472 0.715 0.500
XLNet (Yang et al., 2020) 0.725 0.487 0.736 0.509
Longformer (Beltagy et al., 2020) 0.717 0.500 0.710 0.497
BART (Lewis et al., 2020) 0.744 0.510 0.761 0.513
Human Human Performance 0.962 0.805 0.920 0.932

Leaderboard

If you submit papers with these datasets, please consider sending a pull request to merge your results onto the leaderboard. By submitting, you acknowledge that your results are obtained purely by training on the training datasets and tuned on the dev datasets (e.g. you only evaluted on the test set once).

KBRetriever_DC

Baseline method QI F1 HI F1 KBI F1 Overall Acc
ESIM (Chen et al., 2017) 0.512 0.164 0.543 0.432
Infersent (Romanov and Shivade, 2018) 0.557 0.031 0.336 0.356
RE2 (Yang et al., 2019) 0.655 0.244 0.739 0.481
BERT (Devlin et al., 2019) 0.691 0.555 0.740 0.500
RoBERTa (Liu et al., 2019) 0.715 0.472 0.715 0.500
XLNet (Yang et al., 2020) 0.725 0.487 0.736 0.509
Longformer (Beltagy et al., 2020) 0.717 0.500 0.710 0.497
BART (Lewis et al., 2020) 0.744 0.510 0.761 0.513
Human Performance 0.962 0.805 0.920 0.932

Acknowledgement

Thanks for patient annotation from all taggers Lehan Wang, Ran Duan, Fuxuan Wei, Yudi Zhang, Weiyun Wang!

Thanks for supports and guidance from our adviser Wanxiang Che!

Contact us

  • Just feel free to open issues or send us email(me, Tianbao) if you have any problems or find some mistakes in this dataset.
Owner
Libo Qin
Ph.D. Candidate in Harbin Institute of Technology @HIT-SCIR. Homepage: http://ir.hit.edu.cn/~lbqin/
Libo Qin
Code for the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness"

DU-VAE This is the pytorch implementation of the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness" Acknowledgement

Dazhong Shen 4 Oct 19, 2022
Ros2-voiceroid2 - ROS2 wrapper package of VOICEROID2

ros2_voiceroid2 ROS2 wrapper package of VOICEROID2 Windows Only Installation Ins

Nkyoku 1 Jan 23, 2022
A collection of implementations of deep domain adaptation algorithms

Deep Transfer Learning on PyTorch This is a PyTorch library for deep transfer learning. We divide the code into two aspects: Single-source Unsupervise

Yongchun Zhu 647 Jan 03, 2023
Segcache: a memory-efficient and scalable in-memory key-value cache for small objects

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects This repo contains the code of Segcache described in the followi

TheSys Group @ CMU CS 78 Jan 07, 2023
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Salesforce 1.3k Dec 31, 2022
Match SafeGraph POIs with Data collected through a cultural resource survey in Washington DC.

Match SafeGraph POI data with Cultural Resource Places in Washington DC Match SafeGraph POIs with Data collected through a cultural resource survey in

Changjie Chen 1 Jan 05, 2022
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

Introduction QRec is a Python framework for recommender systems (Supported by Python 3.7.4 and Tensorflow 1.14+) in which a number of influential and

Yu 1.4k Dec 30, 2022
[NeurIPS '21] Adversarial Attacks on Graph Classification via Bayesian Optimisation (GRABNEL)

Adversarial Attacks on Graph Classification via Bayesian Optimisation @ NeurIPS 2021 This repository contains the official implementation of GRABNEL,

Xingchen Wan 12 Dec 23, 2022
:fire: 2D and 3D Face alignment library build using pytorch

Face Recognition Detect facial landmarks from Python using the world's most accurate face alignment network, capable of detecting points in both 2D an

Adrian Bulat 6k Dec 31, 2022
kullanışlı ve işinizi kolaylaştıracak bir araç

Hey merhaba! işte çok sorulan sorularının cevabı ve sorunlarının çözümü; Soru= İçinde var denilen birçok şeyi göremiyorum bunun sebebi nedir? Cevap= B

Sexettin 16 Dec 17, 2022
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

GNN_PPI Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction". Lear

Ursa Zrimsek 2 Dec 14, 2022
Few-shot NLP benchmark for unified, rigorous eval

FLEX FLEX is a benchmark and framework for unified, rigorous few-shot NLP evaluation. FLEX enables: First-class NLP support Support for meta-training

AI2 85 Dec 03, 2022
NeoPlay is the project dedicated to ESport events.

NeoPlay is the project dedicated to ESport events. On this platform users can participate in tournaments with prize pools as well as create their own tournaments.

3 Dec 18, 2021
This repository contains the code for the paper Neural RGB-D Surface Reconstruction

Neural RGB-D Surface Reconstruction Paper | Project Page | Video Neural RGB-D Surface Reconstruction Dejan Azinović, Ricardo Martin-Brualla, Dan B Gol

Dejan 406 Jan 04, 2023
Categorizing comments on YouTube into different categories.

Youtube Comments Categorization This repo is for categorizing comments on a youtube video into different categories. negative (grievances, complaints,

Rhitik 5 Nov 26, 2022
code for the ICLR'22 paper: On Robust Prefix-Tuning for Text Classification

On Robust Prefix-Tuning for Text Classification Prefix-tuning has drawed much attention as it is a parameter-efficient and modular alternative to adap

Zonghan Yang 12 Nov 30, 2022
Tensorflow 2.x implementation of Vision-Transformer model

Vision Transformer Unofficial Tensorflow 2.x implementation of the Transformer based Image Classification model proposed by the paper AN IMAGE IS WORT

Soumik Rakshit 16 Jul 20, 2022
PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos "

Foley Music: Learning to Generate Music from Videos This repo holds the code for the framework presented on ECCV 2020. Foley Music: Learning to Genera

Chuang Gan 30 Nov 03, 2022
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation [arxiv] This is the official repository for CDTrans: Cross-domain Transformer for

238 Dec 22, 2022
Garbage classification using structure data.

垃圾分类模型使用说明 1.包含以下数据文件 文件 描述 data/MaterialMapping.csv 物体以及其归类的信息 data/TestRecords 光谱原始测试数据 CSV 文件 data/TestRecordDesc.zip CSV 文件描述文件 data/Boundaries.cs

wenqi 1 Dec 10, 2021