A New Approach to Overgenerating and Scoring Abstractive Summaries

Overview

A New Approach to Overgenerating and Scoring Abstractive Summaries

We provide the source code for the paper "A New Approach to Overgenerating and Scoring Abstractive Summaries" accepted at NAACL'21. If you find the code useful, please cite the following paper.

@inproceedings{song2021new, 
    title={A New Approach to Overgenerating and Scoring Abstractive Summaries},
    author={Song, Kaiqiang and Wang, Bingqing and Feng, Zhe and Liu, Fei},
    booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
    pages={1392--1404},
    year={2021}
}

Presentation Video

Check Our Presentation Video

Demo

Source Input:

The Bank of Japan appealed to financial markets to remain calm Friday following the US decision to order Daiwa Bank Ltd. to close its US operations.

Summaries with varying lengths:

Dependencies

The code is written in Python (v3.7) and Pytorch (v1.7+). We suggest the following enviorment:

HINT: Since huggingface transformers is alternating very fast, you may need to modify a lot of stuff if you want to use a new version. Contact me if you get any trouble on it.

To install pyrouge and transformers, run the command below:

pip install pyrouge transformers==2.3.0

For generating summaries with varying length

Step 1: clone this repo. Download trained Our Model, move it to the working folder and uncompress it.

git clone https://github.com/ucfnlp/varying-length-summ.git
mv model.zip varying-length-summ
cd varying-length-summ
unzip models.zip

Step 2: Generating summaries with varying length from a raw input file.

python run.py --do_test --parallel --input data/input.txt

It will generate summaries of varying lengths coupled with its order information.

For Selecting summaries with best quality binary classifer

Step 1: Follow the previous section about generating summaries with multiple length.

Step 2: Collect test set similar to data/gigaword_cls/test500* files:

  1. a source input file test500_input.txt

  2. a target output file test500_output.txt

  3. a label file test500_label.txt for whether the target summary is admissible for the source input. (all 0 if you don't have thoese labels)

HINT: one instance per line

Step 3: modify the test500 settings in settings/dataset/gigaword_cls.

Step 4: Run the code below.

python run_classifier.py --do_test --parallel

It will generate a prediction of admissible probability in predict.txt.

For Selecting summaries with length reward reranking method

Step 1: Follow the previous section about generating summaries with multiple length.

Step 2: Run the code below.

python run_rerank.py

It will re-rank the summary with length rewards. The predicted length is in length.txt

For Data Downloading (500 inputs x 7 lengths)

Please refer to this link

Owner
Kaiqiang Song
A PhD student of [email protected]. Interest in Machine Learning, Deep Neural Networks, Natural Lang
Kaiqiang Song
This repository contains the implementation of the HealthGen model, a generative model to synthesize realistic EHR time series data with missingness

HealthGen: Conditional EHR Time Series Generation This repository contains the implementation of the HealthGen model, a generative model to synthesize

0 Jan 20, 2022
An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches

Transformer-in-Transformer An Implementation of the Transformer in Transformer paper by Han et al. for image classification, attention inside local pa

Rishit Dagli 40 Jul 25, 2022
TabNet for fastai

TabNet for fastai This is an adaptation of TabNet (Attention-based network for tabular data) for fastai (=2.0) library. The original paper https://ar

Mikhail Grankin 116 Oct 21, 2022
Pytorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"

About this repository This repo contains an Pytorch implementation for the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Netwo

wxDai 7 Oct 14, 2022
Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

PatatiPatata 28 Oct 18, 2022
Train Dense Passage Retriever (DPR) with a single GPU

Gradient Cached Dense Passage Retrieval Gradient Cached Dense Passage Retrieval (GC-DPR) - is an extension of the original DPR library. We introduce G

Luyu Gao 92 Jan 02, 2023
Funnels: Exact maximum likelihood with dimensionality reduction.

Funnels This repository contains the code needed to reproduce the experiments from the paper: Funnels: Exact maximum likelihood with dimensionality re

2 Apr 21, 2022
Official repository for the NeurIPS 2021 paper Get Fooled for the Right Reason: Improving Adversarial Robustness through a Teacher-guided curriculum Learning Approach

Get Fooled for the Right Reason Official repository for the NeurIPS 2021 paper Get Fooled for the Right Reason: Improving Adversarial Robustness throu

Sowrya Gali 1 Apr 25, 2022
A neuroanatomy-based augmented reality experience powered by computer vision. Features 3D visuals of the Atlas Brain Map slices.

Brain Augmented Reality (AR) A neuroanatomy-based augmented reality experience powered by computer vision that features 3D visuals of the Atlas Brain

Yasmeen Brain 10 Oct 06, 2022
CLADE - Efficient Semantic Image Synthesis via Class-Adaptive Normalization (TPAMI 2021)

Efficient Semantic Image Synthesis via Class-Adaptive Normalization (Accepted by TPAMI)

tzt 49 Nov 17, 2022
Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data recorded in NumPy array

shindo.py Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data stored in NumPy array Introduction Japa

RR_Inyo 3 Sep 23, 2022
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

MADE (Multi-Adapter Dataset Experts) This repository contains the implementation of MADE (Multi-adapter dataset experts), which is described in the pa

Princeton Natural Language Processing 68 Jul 18, 2022
HAR-stacked-residual-bidir-LSTMs - Deep stacked residual bidirectional LSTMs for HAR

HAR-stacked-residual-bidir-LSTM The project is based on this repository which is presented as a tutorial. It consists of Human Activity Recognition (H

Guillaume Chevalier 287 Dec 27, 2022
Official implementation for ICDAR 2021 paper "Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer"

Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer Description Convert offline handwritten mathematical expressi

Wenqi Zhao 87 Dec 27, 2022
Object DGCNN and DETR3D, Our implementations are built on top of MMdetection3D.

Object DGCNN & DETR3D This repo contains the implementations of Object DGCNN (https://arxiv.org/abs/2110.06923) and DETR3D (https://arxiv.org/abs/2110

Wang, Yue 539 Jan 07, 2023
πŸ₯‡ LG-AI-Challenge 2022 1μœ„ μ†”λ£¨μ…˜ μž…λ‹ˆλ‹€.

LG-AI-Challenge-for-Plant-Classification Daconμ—μ„œ μ§„ν–‰λœ 농업 ν™˜κ²½ 변화에 λ”°λ₯Έ μž‘λ¬Ό 병해 진단 AI κ²½μ§„λŒ€νšŒ 에 λŒ€ν•œ μ½”λ“œμž…λ‹ˆλ‹€. (colab directory에 μ½”λ“œκ°€ 잘 정리 λ˜μ–΄μžˆμŠ΅λ‹ˆλ‹€.) Requirements python

siwooyong 10 Jun 30, 2022
Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

CenterPose Overview This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation fro

NVIDIA Research Projects 188 Dec 27, 2022
Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

MSAD Multi-Scale Aligned Distillation for Low-Resolution Detection Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya J

DV Lab 115 Dec 23, 2022
This is the official repository of Music Playlist Title Generation: A Machine-Translation Approach.

PlyTitle_Generation This is the official repository of Music Playlist Title Generation: A Machine-Translation Approach. The paper has been accepted by

SeungHeonDoh 6 Jan 03, 2022