PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Related tags

Deep LearningCI-ToD
Overview

Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

License: MIT

This repository contains the PyTorch implementation and the data of the paper: Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System. Libo Qin, Tianbao Xie, Shijue Huang, Qiguang Chen, Xiao Xu, Wanxiang Che. EMNLP2021.[PDF] .

This code has been written using PyTorch >= 1.1. If you use any source codes or the datasets included in this toolkit in your work, please cite the following paper. The bibtex are listed below:

@article{qin2021CIToD,
  title={Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System},
  author={Qin, Libo and Xie, Tianbao and Huang, Shijue and Chen, Qiguang and Xu, Xiao and Che, Wanxiang},
  journal={arXiv preprint arXiv:2109.11292},
  year={2021}
}

Abstract

Consistency Identification has obtained remarkable success on open-domain dialogue, which can be used for preventing inconsistent response generation. However, in contrast to the rapid development in open-domain dialogue, few efforts have been made to the task-oriented dialogue direction. In this paper, we argue that consistency problem is more urgent in task-oriented domain. To facilitate the research, we introduce CI-ToD, a novel dataset for Consistency Identification in Task-oriented Dialog system. In addition, we not only annotate the single label to enable the model to judge whether the system response is contradictory, but also provide more finegrained labels (i.e., Dialogue History Inconsistency(HI), User Query Inconsistency(QI) and Knowledge Base Inconsistency(KBI), which are as shown in the figure below) to encourage model to know what inconsistent sources lead to it. Empirical results show that state-of-the-art methods only achieve performance of 51.3%, which is far behind the human performance of 93.2%, indicating that there is ample room for improving consistency identification ability. Finally, we conduct exhaustive experiments and qualitative analysis to comprehend key challenges and provide guidance for future directions.

Dataset

We construct the CI-ToD dataset based on the KVRET dataset. We release our dataset together with the code, you can find it under data.

The basic format of the dataset is as follows, including multiple rounds of dialogue, knowledge base and related inconsistency annotations (KBI, QI, HI):

[
    {
        "id": 74,
        "dialogue": [
            {
                "turn": "driver",
                "utterance": "i need to find out the date and time for my swimming_activity"
            },
            {
                "turn": "assistant",
                "utterance": "i have two which one i have one for the_14th at 6pm and one for the_12th at 7pm"
            }
        ],
        "scenario": {
            "kb": {
                "items": [
                    {
                        "date": "the_11th",
                        "time": "9am",
                        "event": "tennis_activity",
                        "agenda": "-",
                        "room": "-",
                        "party": "father"
                    },
                    {
                        "date": "the_18th",
                        "time": "2pm",
                        "event": "football_activity",
                        "agenda": "-",
                        "room": "-",
                        "party": "martha"
                    },
                    .......
                ]
            },
            "qi": "0",
            "hi": "0",
            "kbi": "0"
        },
        "HIPosition": []
    }

KBRetriever_DC

Dataset QI HI KBI SUM
calendar_train.json 174 56 177 595
calendar_dev.json 28 9 24 74
calendar_test.json 23 8 21 74
navigate_train.json 453 386 591 1110
navigate_dev.json 55 41 69 139
navigate_test.json 48 44 71 138
weather_new_train.json 631 132 551 848
weather_new_dev.json 81 14 66 106
weather_new_test.json 72 12 69 106

Model

Here is the model structure of non pre-trained model (a) and pre-trained model (b and c).

Preparation

we provide some pre-trained baselines on our proposed CI-TOD dataset, the packages we used are listed follow:

-- scikit-learn==0.23.2
-- numpy=1.19.1
-- pytorch=1.1.0
-- fitlog==0.9.13
-- tqdm=4.49.0
-- sklearn==0.0
-- transformers==3.2.0

We highly suggest you using Anaconda to manage your python environment. If so, you can run the following command directly on the terminal to create the environment:

conda env create -f py3.6pytorch1.1_.yaml

How to run it

The script train.py acts as a main function to the project, you can run the experiments by the following commands:

python -u train.py --cfg KBRetriver_DC/KBRetriver_DC_BERT.cfg

The parameters we use are configured in the configure. If you need to adjust them, you can modify them in the relevant files or append parameters to the command.

Finally, you can check the results in logs folder.Also, you can run fitlog command to visualize the results:

fitlog log logs/

Baseline Experiment Result

All experiments were performed in TITAN_XP except for BART, which was performed on Tesla V100 PCIE 32 GB. These may not be the best results. Therefore, the parameters can be adjusted to obtain better results.

KBRetriever_DC

Baseline category Baseline method QI F1 HI F1 KBI F1 Overall Acc
Non Pre-trained Model ESIM (Chen et al., 2017) 0.512 0.164 0.543 0.432
Infersent (Romanov and Shivade, 2018) 0.557 0.031 0.336 0.356
RE2 (Yang et al., 2019) 0.655 0.244 0.739 0.481
Pre-trained Model BERT (Devlin et al., 2019) 0.691 0.555 0.740 0.500
RoBERTa (Liu et al., 2019) 0.715 0.472 0.715 0.500
XLNet (Yang et al., 2020) 0.725 0.487 0.736 0.509
Longformer (Beltagy et al., 2020) 0.717 0.500 0.710 0.497
BART (Lewis et al., 2020) 0.744 0.510 0.761 0.513
Human Human Performance 0.962 0.805 0.920 0.932

Leaderboard

If you submit papers with these datasets, please consider sending a pull request to merge your results onto the leaderboard. By submitting, you acknowledge that your results are obtained purely by training on the training datasets and tuned on the dev datasets (e.g. you only evaluted on the test set once).

KBRetriever_DC

Baseline method QI F1 HI F1 KBI F1 Overall Acc
ESIM (Chen et al., 2017) 0.512 0.164 0.543 0.432
Infersent (Romanov and Shivade, 2018) 0.557 0.031 0.336 0.356
RE2 (Yang et al., 2019) 0.655 0.244 0.739 0.481
BERT (Devlin et al., 2019) 0.691 0.555 0.740 0.500
RoBERTa (Liu et al., 2019) 0.715 0.472 0.715 0.500
XLNet (Yang et al., 2020) 0.725 0.487 0.736 0.509
Longformer (Beltagy et al., 2020) 0.717 0.500 0.710 0.497
BART (Lewis et al., 2020) 0.744 0.510 0.761 0.513
Human Performance 0.962 0.805 0.920 0.932

Acknowledgement

Thanks for patient annotation from all taggers Lehan Wang, Ran Duan, Fuxuan Wei, Yudi Zhang, Weiyun Wang!

Thanks for supports and guidance from our adviser Wanxiang Che!

Contact us

  • Just feel free to open issues or send us email(me, Tianbao) if you have any problems or find some mistakes in this dataset.
Owner
Libo Qin
Ph.D. Candidate in Harbin Institute of Technology @HIT-SCIR. Homepage: http://ir.hit.edu.cn/~lbqin/
Libo Qin
GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models

GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Model This repository is the official PyTorch implementation of GraphRNN, a graph gene

Jiaxuan 568 Dec 29, 2022
Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 906 Jan 03, 2023
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021) Citation Please cite as: @inproceedings{liu2020understan

Sunbow Liu 22 Nov 25, 2022
patchmatch和patchmatchstereo算法的python实现

patchmatch patchmatch以及patchmatchstereo算法的python版实现 patchmatch参考 github patchmatchstereo参考李迎松博士的c++版代码 由于patchmatchstereo没有做任何优化,并且是python的代码,主要是方便解析算

Sanders Bao 11 Dec 02, 2022
Code for GNMR in ICDE 2021

GNMR Code for GNMR in ICDE 2021 Please unzip data files in Datasets/MultiInt-ML10M first. Run labcode_preSamp.py (with graph sampling) for ECommerce-c

7 Oct 27, 2022
Pretrained models for Jax/Haiku; MobileNet, ResNet, VGG, Xception.

Pre-trained image classification models for Jax/Haiku Jax/Haiku Applications are deep learning models that are made available alongside pre-trained we

Alper Baris CELIK 14 Dec 20, 2022
Python KNN model: Predicting a probability of getting a work visa. Tableau: Non-immigrant visas over the years.

The value of international students to the United States. Probability of getting a non-immigrant visa. Project timeline: Jan 2021 - April 2021 Project

Zinaida Dvoskina 2 Nov 21, 2021
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis. You write a high level configuration file specifying your in

Blue Collar Bioinformatics 917 Jan 03, 2023
[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

SurRoL IROS 2021 SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning Features dVRK compati

<a href=[email protected]"> 55 Jan 03, 2023
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

Epistasis Lab at UPenn 8.9k Dec 30, 2022
Predicting the duration of arrival delays for commercial flights.

Flight Delay Prediction Our objective is to predict arrival delays of commercial flights. According to the US Department of Transportation, about 21%

Jordan Silke 1 Jan 11, 2022
Fast and robust clustering of point clouds generated with a Velodyne sensor.

Depth Clustering This is a fast and robust algorithm to segment point clouds taken with Velodyne sensor into objects. It works with all available Velo

Photogrammetry & Robotics Bonn 957 Dec 21, 2022
The Python3 import playground

The Python3 import playground I have been confused about python modules and packages, this text tries to clear the topic up a bit. Sources: https://ch

Michael Moser 5 Feb 22, 2022
Unofficial & improved implementation of NeRF--: Neural Radiance Fields Without Known Camera Parameters

[Unofficial code-base] NeRF--: Neural Radiance Fields Without Known Camera Parameters [ Project | Paper | Official code base ] ⬅️ Thanks the original

Jianfei Guo 239 Dec 22, 2022
Deep Learning pipeline for motor-imagery classification.

BCI-ToolBox 1. Introduction BCI-ToolBox is deep learning pipeline for motor-imagery classification. This repo contains five models: ShallowConvNet, De

DongHee 18 Oct 31, 2022
Rax is a Learning-to-Rank library written in JAX

🦖 Rax: Composable Learning to Rank using JAX Rax is a Learning-to-Rank library written in JAX. Rax provides off-the-shelf implementations of ranking

Google 247 Dec 27, 2022
The official implementation of the Hybrid Self-Attention NEAT algorithm

PUREPLES - Pure Python Library for ES-HyperNEAT About This is a library of evolutionary algorithms with a focus on neuroevolution, implemented in pure

Adrian Westh 91 Dec 12, 2022
PyTorch implementation of the paper The Lottery Ticket Hypothesis for Object Recognition

LTH-ObjectRecognition The Lottery Ticket Hypothesis for Object Recognition Sharath Girish*, Shishira R Maiya*, Kamal Gupta, Hao Chen, Larry Davis, Abh

16 Feb 06, 2022
The Video-based Accident Detection System built in Python

Accident-detection-system About the Project This Repository contains the Video-based Accident Detection System built in Python. Contributors Yukta Gop

SURYAVANSHI SNEHAL BALKRISHNA 50 Dec 07, 2022
Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.

Deep neural network for object detection and semantic segmentation on indoor panoramic images. The implementation is based on the papers:

Alejandro de Nova Guerrero 9 Nov 24, 2022