Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Last update: Jan 03, 2022


Text Summarization

WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

In this project, I fine-tune a T5 model on the Extreme Summarization (XSum) dataset, achieving a ROUGE-2 F-score of 9.5% on the test data. I then discuss the drawbacks of n-gram-based metrics as well as contextual word metrics.

Finally, I propose the Weighted Contextual N-gram (WCN) method, an alternative metric that may be more effective for evaluating text generation tasks.
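To make the n-gram-based metric mentioned above concrete, here is a minimal sketch of a simplified ROUGE-2 F-score. It assumes whitespace tokenization and lowercasing, with no stemming, so it is not the scoring code used in this project, and the function names are illustrative only. The second example shows one commonly cited limitation of n-gram overlap: a paraphrase that shares few exact bigrams with the reference scores low regardless of its meaning.

```python
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and count adjacent word pairs."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f(reference, candidate):
    """Simplified ROUGE-2 F1: clipped bigram overlap between two strings."""
    ref, cand = bigrams(reference), bigrams(candidate)
    overlap = sum((ref & cand).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
paraphrase = "a feline rested on the rug"

# An identical summary shares every bigram with the reference.
print(rouge2_f(reference, reference))

# The paraphrase shares only the bigram ("on", "the") with the reference.
print(rouge2_f(reference, paraphrase))
```

In this toy case the paraphrase matches one of five bigrams on each side, so precision and recall are both 1/5. A production evaluation would normally rely on an established ROUGE package rather than a hand-written function like this one.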

The complete documentation of the project can be found here

Dataset

I use the Extreme Summarization (XSum) Dataset. The dataset can be downloaded from here

The dataset consists of BBC articles and accompanying single-sentence summaries. Specifically, each article is prefaced with a professionally written introductory sentence (the summary), typically by the author of the article.

There are two features in this dataset:
(1) document: Input news article.
(2) summary: One-sentence summary of the article.

The idea is to generate a short, one-sentence news summary answering the question "What is the article about?". In total there are about 226k samples: 204,045 for training, 11,332 for validation, and 11,334 for testing. The average document contains 431.07 words (19.77 sentences), and the average summary contains 23.26 words.
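As a quick sanity check on the figures above, the following sketch recomputes the total sample count, the share of each split, and the approximate ratio of document length to summary length. The numbers are copied from the paragraph above; the variable names are illustrative and do not come from the project notebook.

```python
# Split sizes as reported in the text above.
splits = {"train": 204_045, "validation": 11_332, "test": 11_334}

total = sum(splits.values())
print(f"total samples: {total:,}")

# Share of the full dataset held by each split.
for name, count in splits.items():
    print(f"{name:>10}: {count:>7,} ({count / total:.1%})")

# Average lengths in words, also taken from the text above.
avg_doc_words = 431.07
avg_summary_words = 23.26
compression = avg_doc_words / avg_summary_words
print(f"a document is about {compression:.1f}x longer than its summary")
```

The splits sum to 226,711, which matches the "about 226k" figure, and roughly 90% of the samples fall in the training split.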

Code

The source code for this project can be found at text_summarization.ipynb.


Owner

Aditya Shah
