Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Overview

Evaluating the Factual Consistency of Abstractive Text Summarization

Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher

Introduction

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sentences of source documents. The factual consistency model is then trained jointly for three tasks:

  1. identify whether sentences remain factually consistent after transformation,
  2. extract a span in the source documents to support the consistency prediction,
  3. extract a span in the summary sentence that is inconsistent if one exists. Transferring this model to summaries generated by several state-of-the art models reveals that this highly scalable approach substantially outperforms previous models, including those trained with strong supervision using standard datasets for natural language inference and fact checking. Additionally, human evaluation shows that the auxiliary span extraction tasks provide useful assistance in the process of verifying factual consistency.

Paper link: https://arxiv.org/abs/1910.12840

Table of Contents

  1. Updates
  2. Citation
  3. License
  4. Usage
  5. Get Involved

Updates

1/27/2020

Updated manually annotated data files - fixed filepaths in misaligned examples.

Updated model checkpoint files - recomputed evaluation metrics for fixed examples.

Citation

@article{kryscinskiFactCC2019,
  author    = {Wojciech Kry{\'s}ci{\'n}ski and Bryan McCann and Caiming Xiong and Richard Socher},
  title     = {Evaluating the Factual Consistency of Abstractive Text Summarization},
  journal   = {arXiv preprint arXiv:1910.12840},
  year      = {2019},
}

License

The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

This software should not be used to promote or profit from violence, hate, and division, environmental destruction, abuse of human rights, or the destruction of people's physical and mental health.

Usage

Code repository uses Python 3. Prior to running any scripts please make sure to install required Python packages listed in the requirements.txt file.

Example call: pip3 install -r requirements.txt

Training and Evaluation Datasets

Generated training data can be found here.

Manually annotated validation and test data can be found here.

Both generated and manually annotated datasets require pairing with the original CNN/DailyMail articles.

To recreate the datasets follow the instructions:

  1. Download CNN Stories and Daily Mail Stories from https://cs.nyu.edu/~kcho/DMQA/
  2. Create a cnndm directory and unpack downloaded files into the directory
  3. Download and unpack FactCC data (do not rename directory)
  4. Run the pair_data.py script to pair the data with original articles

Example call:

python3 data_pairing/pair_data.py <dir-with-factcc-data> <dir-with-stories>

Generating Data

Synthetic training data can be generated using code available in the data_generation directory.

The data generation script expects the source documents input as one jsonl file, where each source document is embedded in a separate json object. The json object is required to contain an id key which stores an example id (uniqness is not required), and a text field that stores the text of the source document.

Certain transformations rely on NER tagging, thus for best results use source documents with original (proper) casing.

The following claim augmentations (transformations) are available:

  • backtranslation - Paraphrasing claim via backtranslation (requires Google Translate API key; costs apply)
  • pronoun_swap - Swapping a random pronoun in the claim
  • date_swap - Swapping random date/time found in the claim with one present in the source article
  • number_swap - Swapping random number found in the claim with one present in the source article
  • entity_swap - Swapping random entity name found in the claim with one present in the source article
  • negation - Negating meaning of the claim
  • noise - Injecting noise into the claim sentence

For a detailed description of available transformations please refer to Section 3.1 in the paper.

To authenticate with the Google Cloud API follow these instructions.

Example call:

python3 data_generation/create_data.py <source-data-file> [--augmentations list-of-augmentations]

Model Code

FactCC and FactCCX models can be trained or initialized from a checkpoint using code available in the modeling directory.

Quickstart training, fine-tuning, and evaluation scripts are shared in the scripts directory. Before use make sure to update *_PATH variables with appropriate, absolute paths.

To customize training or evaluation settings please refer to the flags in the run.py file.

To utilize Weights&Biases dashboards login to the service using the following command: wandb login <API KEY>.

Trained FactCC model checkpoint can be found here.

Trained FactCCX model checkpoint can be found here.

IMPORTANT: Due to data pre-processing, the first run of training or evaluation code on a large dataset can take up to a few hours before the actual procedure starts.

Running on other data

To run pretrained FactCC or FactCCX models on your data follow the instruction:

  1. Download pre-trained model checkpoint, linked above
  2. Prepare your data in jsonl format. Each example should be a separate json object with id, text, claim keys representing example id, source document, and claim sentence accordingly. Name file as data-dev.jsonl
  3. Update corresponding *-eval.sh script

Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests or bug-reports. We welcome PRs!

Owner
Salesforce
A variety of vendor agnostic projects which power Salesforce
Salesforce
Knowledge Management for Humans using Machine Learning & Tags

HyperTag HyperTag helps humans intuitively express how they think about their files using tags and machine learning.

Ravn Tech, Inc. 165 Nov 04, 2022
Embeddinghub is a database built for machine learning embeddings.

Embeddinghub is a database built for machine learning embeddings.

Featureform 1.2k Jan 01, 2023
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information This repository contains code, model, dataset for ChineseBERT at ACL2021. Ch

413 Dec 01, 2022
Continuous Security Group Rule Change Detection & Response at scale

Introduction Get notified of Security Group Changes across all AWS Accounts & Regions in an AWS Organization, with the ability to respond/revert those

Raajhesh Kannaa Chidambaram 3 Aug 13, 2022
A no-BS, dead-simple training visualizer for tf-keras

A no-BS, dead-simple training visualizer for tf-keras TrainingDashboard Plot inter-epoch and intra-epoch loss and metrics within a jupyter notebook wi

Vibhu Agrawal 3 May 28, 2021
Algorithmic trading with deep learning experiments

Deep-Trading Algorithmic trading with deep learning experiments. Now released part one - simple time series forecasting. I plan to implement more soph

Alex Honchar 1.4k Jan 02, 2023
Computing Shapley values using VAEAC

Shapley values and the VAEAC method In this GitHub repository, we present the implementation of the VAEAC approach from our paper "Using Shapley Value

3 Nov 23, 2022
Tutel MoE: An Optimized Mixture-of-Experts Implementation

Project Tutel Tutel MoE: An Optimized Mixture-of-Experts Implementation. Supported Framework: Pytorch Supported GPUs: CUDA(fp32 + fp16), ROCm(fp32) Ho

Microsoft 344 Dec 29, 2022
Cooperative Driving Dataset: a dataset for multi-agent driving scenarios

Cooperative Driving Dataset (CODD) The Cooperative Driving dataset is a synthetic dataset generated using CARLA that contains lidar data from multiple

Eduardo Henrique Arnold 124 Dec 28, 2022
Meta Learning for Semi-Supervised Few-Shot Classification

few-shot-ssl-public Code for paper Meta-Learning for Semi-Supervised Few-Shot Classification. [arxiv] Dependencies cv2 numpy pandas python 2.7 / 3.5+

Mengye Ren 501 Jan 08, 2023
A unified 3D Transformer Pipeline for visual synthesis

Overview This is the official repo for the paper: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion. NÜWA is a unified multimodal p

Microsoft 2.6k Jan 06, 2023
GANTheftAuto is a fork of the Nvidia's GameGAN

Description GANTheftAuto is a fork of the Nvidia's GameGAN, which is research focused on emulating dynamic game environments. The early research done

Harrison 801 Dec 27, 2022
Deep Learning for Morphological Profiling

Deep Learning for Morphological Profiling An end-to-end implementation of a ML System for morphological profiling using self-supervised learning to di

Danielh Carranza 0 Jan 20, 2022
This repository is all about spending some time the with the original problem posed by Minsky and Papert

This repository is all about spending some time the with the original problem posed by Minsky and Papert. Working through this problem is a great way to begin learning computer vision.

Jaissruti Nanthakumar 1 Jan 23, 2022
Code for the paper "Multi-task problems are not multi-objective"

Multi-Task problems are not multi-objective This is the code for the paper "Multi-Task problems are not multi-objective" in which we show that the com

Michael Ruchte 5 Aug 19, 2022
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP] Unofficial Pytorch implementation of AdaSpeech 2. Requirements : All code written i

Rishikesh (ऋषिकेश) 63 Dec 28, 2022
这个开源项目主要是对经典的时间序列预测算法论文进行复现,模型主要参考自GluonTS,框架主要参考自Informer

Time Series Research with Torch 这个开源项目主要是对经典的时间序列预测算法论文进行复现,模型主要参考自GluonTS,框架主要参考自Informer。 建立原因 相较于mxnet和TF,Torch框架中的神经网络层需要提前指定输入维度: # 建立线性层 TensorF

Chi Zhang 85 Dec 29, 2022
Modified prey-predator system - Modified prey–predator model describes the rate of change for each species by adding coupling terms.

Modified prey-predator system We aim to study the behaviors of the modified prey–predator model and establish the effects of several parameters that p

Seoyoung Oh 1 Jan 02, 2022
Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite.

TFlite Ultra Fast Lane Detection Inference Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite. So

Ibai Gorordo 12 Aug 27, 2022