Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Last update: Jan 03, 2022

Overview

Text Summarization

WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

In this project, I fine-tune a T5 model on the Extreme Summarization (XSum) dataset, achieving a ROUGE-2 F-score of 9.5% on the test data. I then discuss the drawbacks of n-gram-based metrics as well as contextual word metrics.

Finally, I propose the Weighted Contextual N-gram (WCN) method, an alternative metric that may be more effective for evaluating text generation tasks.
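For readers unfamiliar with the metric, the ROUGE-2 F-score reported above measures bigram overlap between a generated summary and its reference. The following is a minimal sketch, assuming lowercased, whitespace-split tokens and no stemming; the function names `bigrams` and `rouge2_f` are illustrative, and this is not the scoring script used in the project.

```python
from collections import Counter


def bigrams(text):
    """Return a multiset of adjacent word pairs (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f(reference, candidate):
    """Simplified ROUGE-2 F1: harmonic mean of bigram precision and recall."""
    ref, cand = bigrams(reference), bigrams(candidate)
    # Counter intersection keeps the minimum count per bigram (clipped matches).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences, not taken from XSum.
score = rouge2_f("the cat sat", "the cat ran")
print(score)
```

Because the score depends only on exact bigram matches, a paraphrase with the same meaning but different wording scores zero, which is one of the drawbacks of n-gram-based metrics mentioned above.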

The complete documentation of the project can be found here

Dataset

I use the Extreme Summarization (XSum) Dataset. The dataset can be downloaded from here

The dataset consists of BBC articles and accompanying single-sentence summaries. Specifically, each article is prefaced with an introductory sentence (the summary), which is professionally written, typically by the author of the article.

There are two features in this dataset:
(1) document: Input news article.
(2) summary: One-sentence summary of the article.

The idea is to generate a short, one-sentence news summary answering the question "What is the article about?". There are about 226k samples in total: 204,045 for training, 11,332 for validation and 11,334 for testing. The average number of words in a document is 431.07 (19.77 sentences) and the average number of words in a summary is 23.26.
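The split sizes and word-count averages above can be checked with a few lines of Python. This is a sketch only, assuming each sample is a dict with the `document` and `summary` fields described above and that words are separated by whitespace; the helper `average_word_counts` and the toy samples are hypothetical, not part of the project.

```python
# Split sizes as stated in the text above.
SPLIT_SIZES = {"train": 204_045, "validation": 11_332, "test": 11_334}


def average_word_counts(samples):
    """Return (mean words per document, mean words per summary)."""
    n = len(samples)
    doc_words = sum(len(s["document"].split()) for s in samples)
    summary_words = sum(len(s["summary"].split()) for s in samples)
    return doc_words / n, summary_words / n


# Toy samples for illustration only; real XSum articles are far longer.
toy_samples = [
    {"document": "a b c d", "summary": "a b"},
    {"document": "a b", "summary": "a"},
]

total = sum(SPLIT_SIZES.values())
doc_avg, summary_avg = average_word_counts(toy_samples)
print(total, doc_avg, summary_avg)
```

The three splits sum to 226,711 samples, which matches the rounded figure of 226k.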

Code

The source code for this project can be found at text_summarization.ipynb.

Owner

Aditya Shah
