Findings of ACL 2021

Last update: Feb 24, 2022

Overview

Assessing Dialogue Systems with Distribution Distances

We propose to measure the performance of a dialogue system by computing the distribution-wise distance between its generated conversations and real-world conversations.

To appear in Findings of ACL 2021.

Note that this is not an officially supported Tencent product.

1. Configuration

This repository requires the following packages:

pytorch
huggingface/transformers

2. Usage

To evaluate how well a metric correlates with human judgments at the system level:

python eval_metric.py \
  --data_path ./datasets/convai2_annotation.json \
  --metric fbd \
  --sample_num 10 \
  --model_type roberta-base \
  --batch_size 32
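
As a rough illustration of what a system-level correlation means, the sketch below correlates one metric score per system with one averaged human rating per system. It is not taken from `eval_metric.py`: the function names and the example numbers are hypothetical, and the script may compute its correlations differently.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up numbers: one entry per dialogue system.
human_scores = [2.1, 3.4, 3.9, 4.5]    # averaged human ratings (higher is better)
metric_scores = [0.9, 0.6, 0.5, 0.2]   # a distance-style metric (lower is better)

print("pearson :", pearson(human_scores, metric_scores))
print("spearman:", spearman(human_scores, metric_scores))
```

For a distance-style metric, a strongly negative correlation is the desirable outcome, since a smaller distance should go with a higher human rating.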

Currently, the repo supports the metrics commonly used in text generation, including bleu, meteor, rouge, greedy, average, extrema, bert_score, fbd and prd.
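
To give a feel for a distribution-wise distance such as `fbd`, the sketch below computes the Fréchet distance between two Gaussians fitted to two sets of feature vectors, following the formulation used by FID (reference [1]). That `fbd` follows this formulation is an assumption here, not something read from the repository code. Feature extraction with a pretrained model (for example the `roberta-base` model passed via `--model_type`) is left out, and `frechet_distance` is a hypothetical name.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (num_samples, feature_dim).
    """
    a = np.asarray(feats_a, dtype=np.float64)
    b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Synthetic stand-ins for sentence features of real and generated dialogues.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
generated = real + np.array([1.0, 0.0, 0.0, 0.0])  # same spread, shifted mean

print(frechet_distance(real, real))
print(frechet_distance(real, generated))
```

The distance is close to zero when both feature sets share the same mean and covariance, and it grows as the two distributions drift apart.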

Here are some details of the six corpora compared in the main paper:

File Name	Dataset Name	Num. of Samples	Reference
`personam_annotation.json`	Persona(M)	60	Shikib/usr
`dailyh_annotation.json`	Daily(H)	150	li3cmz/GRADE
`convai2_annotation.json`	Convai2	150	li3cmz/GRADE
`empathetic_annotation.json`	Empathetic	150	li3cmz/GRADE
`dailyz_annotation.json`	Daily(Z)	100	ZHAOTING/dialog-processing
`personaz_annotation.json`	Persona(Z)	150	ZHAOTING/dialog-processing

Citation

If you use this research/codebase/dataset, please cite our paper:

@article{xiang2021assessing,
  title={Assessing Dialogue Systems with Distribution Distances},
  author={Xiang, Jiannan and Liu, Yahui and Cai, Deng and Li, Huayang and Lian, Defu and Liu, Lemao},
  journal={arXiv preprint arXiv:2105.02573},
  year={2021}
}

Other related papers:

[1] FID, GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, NIPS 2017
[2] PRD, Assessing Generative Models via Precision and Recall, NIPS 2018
[3] BERTScore, BERTScore: Evaluating Text Generation with BERT, ICLR 2020

Owner

Yahui Liu
