Code for our paper "SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization", ACL 2021

Last update: Dec 12, 2022

Overview

SimCLS

1. How to Install

Requirements

python3
conda create --name env --file spec-file.txt
pip3 install -r requirements.txt

Description of Codes

main.py -> training and evaluation procedure
model.py -> models
data_utils.py -> dataloader
utils.py -> utility functions
preprocess.py -> data preprocessing

Workspace

Create the following directories before running our experiments.

./cache -> storing model checkpoints
./result -> storing evaluation results
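
The two workspace directories can be created by hand or with a few lines of Python. The following is a minimal sketch; the helper name `make_workspace` is our own and is not part of the repository.

```python
import os

def make_workspace(root="."):
    """Create the cache and result directories under `root` if they are missing."""
    paths = [os.path.join(root, name) for name in ("cache", "result")]
    for path in paths:
        # exist_ok=True makes the call safe to repeat on an existing directory.
        os.makedirs(path, exist_ok=True)
    return paths
```

Calling `make_workspace()` from the repository root should leave `./cache` and `./result` in place.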

2. Preprocessing

We use the following datasets for our experiments.

CNN/DailyMail -> https://github.com/abisee/cnn-dailymail
XSum -> https://github.com/EdinburghNLP/XSum

For data preprocessing, please run

python preprocess.py --src_dir [path of the raw data] --tgt_dir [output path] --split [train/val/test] --cand_num [number of candidate summaries]

src_dir should contain the following files (using test split as an example):

test.source
test.source.tokenized
test.target
test.target.tokenized
test.out
test.out.tokenized
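
Before running preprocess.py, it may help to confirm that all six files for a split are present. The sketch below does that check; `missing_files` is a hypothetical helper written for illustration, and the suffixes are taken from the list above.

```python
import os

# Suffixes copied from the file list above.
SUFFIXES = (
    "source", "source.tokenized",
    "target", "target.tokenized",
    "out", "out.tokenized",
)

def missing_files(src_dir, split):
    """Return the expected file names for `split` that are absent from `src_dir`."""
    expected = [f"{split}.{suffix}" for suffix in SUFFIXES]
    return [name for name in expected
            if not os.path.isfile(os.path.join(src_dir, name))]
```

An empty list from `missing_files(src_dir, "test")` means the test split has every file listed above.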

Each line of these files should contain one sample. In particular, the candidate summaries for a given data sample must be placed on consecutive lines in test.out and test.out.tokenized.

The preprocessing procedure stores the processed data as separate JSON files in tgt_dir.

We have provided an example file in ./example.
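
To illustrate the candidate layout described above, the sketch below splits the lines of a candidate file into one group per sample. It assumes that every sample has exactly `cand_num` candidates, which the text implies but does not state outright. The function `group_candidates` is our own illustration and is not the code in preprocess.py.

```python
def group_candidates(out_lines, cand_num):
    """Split a flat list of candidate lines into one list per sample.

    Assumes each sample has exactly `cand_num` candidates on consecutive lines.
    """
    if cand_num <= 0:
        raise ValueError("cand_num must be positive")
    if len(out_lines) % cand_num != 0:
        raise ValueError("number of candidate lines is not a multiple of cand_num")
    return [out_lines[i:i + cand_num]
            for i in range(0, len(out_lines), cand_num)]

# Toy input: two samples with two candidates each.
lines = ["a1", "a2", "b1", "b2"]
groups = group_candidates(lines, 2)
```

Under this assumption, the number of groups should equal the number of lines in the matching source file, which gives a quick sanity check before preprocessing.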

3. How to Run

Hyper-parameter Setting

You may specify the hyper-parameters in main.py.

Train

python main.py --cuda --gpuid [list of gpuid] -l

Fine-tune

python main.py --cuda --gpuid [list of gpuid] -l --model_pt [model path]

Evaluate

python main.py --cuda --gpuid [single gpu] -e --model_pt [model path]

4. Results

CNNDM

Model	ROUGE-1	ROUGE-2	ROUGE-L
BART	44.39	21.21	41.28
Ours	46.67	22.15	43.54

XSum

Model	ROUGE-1	ROUGE-2	ROUGE-L
Pegasus	47.10	24.53	39.23
Ours	47.61	24.57	39.44

Our model outputs on these datasets can be found in ./output.
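
The size of the gains can be read directly off the tables above. The short sketch below computes them; the scores are copied from the tables, while the dictionary layout and the label "baseline" for the BART and Pegasus rows are our own choices.

```python
# Scores copied from the tables above, in the order (ROUGE-1, ROUGE-2, ROUGE-L).
RESULTS = {
    "CNNDM": {"baseline": ("BART", (44.39, 21.21, 41.28)),
              "ours": (46.67, 22.15, 43.54)},
    "XSum": {"baseline": ("Pegasus", (47.10, 24.53, 39.23)),
             "ours": (47.61, 24.57, 39.44)},
}

def rouge_gains(results):
    """Return per-dataset gains (ours minus baseline), rounded to two decimals."""
    gains = {}
    for dataset, row in results.items():
        _, base = row["baseline"]
        gains[dataset] = tuple(round(o - b, 2) for o, b in zip(row["ours"], base))
    return gains

for dataset, gain in rouge_gains(RESULTS).items():
    print(dataset, gain)
```

On CNNDM the differences are 2.28, 0.94 and 2.26 points for ROUGE-1, ROUGE-2 and ROUGE-L; on XSum they are 0.51, 0.04 and 0.21.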

Owner

Yixin Liu
