Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Overview

Evaluating the Factual Consistency of Abstractive Text Summarization

Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher

Introduction

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sentences of source documents. The factual consistency model is then trained jointly for three tasks:

  1. identify whether sentences remain factually consistent after transformation,
  2. extract a span in the source documents to support the consistency prediction,
  3. extract a span in the summary sentence that is inconsistent if one exists. Transferring this model to summaries generated by several state-of-the art models reveals that this highly scalable approach substantially outperforms previous models, including those trained with strong supervision using standard datasets for natural language inference and fact checking. Additionally, human evaluation shows that the auxiliary span extraction tasks provide useful assistance in the process of verifying factual consistency.

Paper link: https://arxiv.org/abs/1910.12840

Table of Contents

  1. Updates
  2. Citation
  3. License
  4. Usage
  5. Get Involved

Updates

1/27/2020

Updated manually annotated data files - fixed filepaths in misaligned examples.

Updated model checkpoint files - recomputed evaluation metrics for fixed examples.

Citation

@article{kryscinskiFactCC2019,
  author    = {Wojciech Kry{\'s}ci{\'n}ski and Bryan McCann and Caiming Xiong and Richard Socher},
  title     = {Evaluating the Factual Consistency of Abstractive Text Summarization},
  journal   = {arXiv preprint arXiv:1910.12840},
  year      = {2019},
}

License

The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

This software should not be used to promote or profit from violence, hate, and division, environmental destruction, abuse of human rights, or the destruction of people's physical and mental health.

Usage

Code repository uses Python 3. Prior to running any scripts please make sure to install required Python packages listed in the requirements.txt file.

Example call: pip3 install -r requirements.txt

Training and Evaluation Datasets

Generated training data can be found here.

Manually annotated validation and test data can be found here.

Both generated and manually annotated datasets require pairing with the original CNN/DailyMail articles.

To recreate the datasets follow the instructions:

  1. Download CNN Stories and Daily Mail Stories from https://cs.nyu.edu/~kcho/DMQA/
  2. Create a cnndm directory and unpack downloaded files into the directory
  3. Download and unpack FactCC data (do not rename directory)
  4. Run the pair_data.py script to pair the data with original articles

Example call:

python3 data_pairing/pair_data.py <dir-with-factcc-data> <dir-with-stories>

Generating Data

Synthetic training data can be generated using code available in the data_generation directory.

The data generation script expects the source documents input as one jsonl file, where each source document is embedded in a separate json object. The json object is required to contain an id key which stores an example id (uniqness is not required), and a text field that stores the text of the source document.

Certain transformations rely on NER tagging, thus for best results use source documents with original (proper) casing.

The following claim augmentations (transformations) are available:

  • backtranslation - Paraphrasing claim via backtranslation (requires Google Translate API key; costs apply)
  • pronoun_swap - Swapping a random pronoun in the claim
  • date_swap - Swapping random date/time found in the claim with one present in the source article
  • number_swap - Swapping random number found in the claim with one present in the source article
  • entity_swap - Swapping random entity name found in the claim with one present in the source article
  • negation - Negating meaning of the claim
  • noise - Injecting noise into the claim sentence

For a detailed description of available transformations please refer to Section 3.1 in the paper.

To authenticate with the Google Cloud API follow these instructions.

Example call:

python3 data_generation/create_data.py <source-data-file> [--augmentations list-of-augmentations]

Model Code

FactCC and FactCCX models can be trained or initialized from a checkpoint using code available in the modeling directory.

Quickstart training, fine-tuning, and evaluation scripts are shared in the scripts directory. Before use make sure to update *_PATH variables with appropriate, absolute paths.

To customize training or evaluation settings please refer to the flags in the run.py file.

To utilize Weights&Biases dashboards login to the service using the following command: wandb login <API KEY>.

Trained FactCC model checkpoint can be found here.

Trained FactCCX model checkpoint can be found here.

IMPORTANT: Due to data pre-processing, the first run of training or evaluation code on a large dataset can take up to a few hours before the actual procedure starts.

Running on other data

To run pretrained FactCC or FactCCX models on your data follow the instruction:

  1. Download pre-trained model checkpoint, linked above
  2. Prepare your data in jsonl format. Each example should be a separate json object with id, text, claim keys representing example id, source document, and claim sentence accordingly. Name file as data-dev.jsonl
  3. Update corresponding *-eval.sh script

Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests or bug-reports. We welcome PRs!

Owner
Salesforce
A variety of vendor agnostic projects which power Salesforce
Salesforce
System Design course at HSE (2021)

System Design course at HSE (2021) Wiki-страница курса Структура репозитория: slides - директория с презентациями с занятий tasks - материалы для выпо

22 Dec 25, 2022
Monocular Depth Estimation - Weighted-average prediction from multiple pre-trained depth estimation models

merged_depth runs (1) AdaBins, (2) DiverseDepth, (3) MiDaS, (4) SGDepth, and (5) Monodepth2, and calculates a weighted-average per-pixel absolute dept

Pranav 39 Nov 21, 2022
Semi-supervised learning for object detection

Source code for STAC: A Simple Semi-Supervised Learning Framework for Object Detection STAC is a simple yet effective SSL framework for visual object

Google Research 348 Dec 25, 2022
Clustering is a popular approach to detect patterns in unlabeled data

Visual Clustering Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a data

Tarek Naous 24 Nov 11, 2022
A configurable, tunable, and reproducible library for CTR prediction

FuxiCTR This repo is the community dev version of the official release at huawei-noah/benchmark/FuxiCTR. Click-through rate (CTR) prediction is an cri

XUEPAI 397 Dec 30, 2022
A check for whether the dependency jobs are all green.

alls-green A check for whether the dependency jobs are all green. Why? Do you have more than one job in your GitHub Actions CI/CD workflows setup? Do

Re:actors 33 Jan 03, 2023
Open CV - Convert a picture to look like a cartoon sketch in python

Use the video https://www.youtube.com/watch?v=k7cVPGpnels for initial learning.

Sammith S Bharadwaj 3 Jan 29, 2022
ICCV2021 Oral SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

Sign-Agnostic Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page This repository contains the implementation

64 Jan 05, 2023
Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting

Pytorch Pedestrian Attribute Recognition: A strong PyTorch baseline of pedestrian attribute recognition and multi-label classification.

Jian 79 Dec 18, 2022
Efficient neural networks for analog audio effect modeling

micro-TCN Efficient neural networks for audio effect modeling

Christian Steinmetz 94 Dec 29, 2022
Fast (simple) spectral synthesis and emission-line fitting of DESI spectra.

FastSpecFit Introduction This repository contains code and documentation to perform fast, simple spectral synthesis and emission-line fitting of DESI

5 Aug 02, 2022
GAN-STEM-Conv2MultiSlice - Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

GAN-STEM-Conv2MultiSlice GAN method to help covert lower resolution STEM images generated by convolution methods to higher resolution STEM images gene

UW-Madison Computational Materials Group 2 Feb 10, 2021
Deep Markov Factor Analysis (NeurIPS2021)

Deep Markov Factor Analysis (DMFA) Codes and experiments for deep Markov factor analysis (DMFA) model accepted for publication at NeurIPS2021: A. Farn

Sarah Ostadabbas 2 Dec 16, 2022
Official DGL implementation of "Rethinking High-order Graph Convolutional Networks"

SE Aggregation This is the implementation for Rethinking High-order Graph Convolutional Networks. Here we show the codes for citation networks as an e

Tianqi Zhang (张天启) 32 Jul 19, 2022
Code for our paper: Online Variational Filtering and Parameter Learning

Variational Filtering To run phi learning on linear gaussian (Fig1a) python linear_gaussian_phi_learning.py To run phi and theta learning on linear g

16 Aug 14, 2022
Code for binary and multiclass model change active learning, with spectral truncation implementation.

Model Change Active Learning Paper (To Appear) Python code for doing active learning in graph-based semi-supervised learning (GBSSL) paradigm. Impleme

Kevin Miller 1 Jul 24, 2022
A hyperparameter optimization framework

Optuna: A hyperparameter optimization framework Website | Docs | Install Guide | Tutorial Optuna is an automatic hyperparameter optimization software

7.4k Jan 04, 2023
Calibrated Hyperspectral Image Reconstruction via Graph-based Self-Tuning Network.

mask-uncertainty-in-HSI This repository contains the testing code and pre-trained models for the paper Calibrated Hyperspectral Image Reconstruction v

JIAMIAN WANG 9 Dec 29, 2022
TorchXRayVision: A library of chest X-ray datasets and models.

torchxrayvision A library for chest X-ray datasets and models. Including pre-trained models. ( 🎬 promo video about the project) Motivation: While the

Machine Learning and Medicine Lab 575 Jan 08, 2023
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

HTSeq DEVS: https://github.com/htseq/htseq DOCS: https://htseq.readthedocs.io A Python library to facilitate programmatic analysis of data from high-t

HTSeq 57 Dec 20, 2022