(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Last update: Jul 01, 2022

Related tags

Overview

Towards Abstractive Grounded Summarization of Podcast Transcripts

We provide the source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts" accepted at ACL'22. If you find the code useful, please cite the following paper.

@inproceedings{song-etal-2022-grounded,
    title="Towards Abstractive Grounded Summarization of Podcast Transcripts",
    author = "Song, Kaiqiang and
              Li, Chen and
              Wang, Xiaoyang and
              Yu, Dong and
              Liu, Fei",
    booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
    year={2022}
}

Goal

We proposed a grounded summarization system, which provide each summary sentence a linked chunk of the original transcripts and their audio/video recordings. It allows a human evaluator to quickly verify the summary content against source clips.

News

03/04/2022 Trained model and processed testing data released.
03/03/2022 Code Released. Paper link, trained model and processed testing data will be released soon.
02/23/2022 Paper accepted at ACL 2022.

Experiments

You can follow the below 4 steps to generate grounded podcast summaries or directly download the generated summary from this link

Step 1: Download Code, Model & Data

Download the code

git clone https://github.com/tencent-ailab/GrndPodcastSum.git
cd GrndPodcastSum

Download the Trained Models to GrndPodcastSum Directory and unzip

unzip model.zip

Download the Processed Test Set (1027) to GrndPodcastSum Directory and unzip

unzip data.zip

Step 2: Setup Environment

Create the environment using .yml file.

conda env create -f env.yml
conda activate GrndPodcastSum

Step 3. Offline Computing for Chunk Embeddings

Calculating the chunk embedding offline.

sh offline.sh

Step 4. Generating Grounded Summary

Use Grnd-token-nonoveralp model to generate summary.

sh test.sh

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Disclaimer

This repo is only for research purpose. It is not an officially supported Tencent product.

(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Related tags

Overview

Towards Abstractive Grounded Summarization of Podcast Transcripts

Goal

News

Experiments

Step 1: Download Code, Model & Data

Step 2: Setup Environment

Step 3. Offline Computing for Chunk Embeddings

Step 4. Generating Grounded Summary

License

Disclaimer

Owner

Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Labelling platform for text using distant supervision

Awesome Treasure of Transformers Models Collection

An A-SOUL Text Generator Based on CPM-Distill.

Natural Language Processing at EDHEC, 2022

This repository is home to the Optimus data transformation plugins for various data processing needs.

An Explainable Leaderboard for NLP

keras implement of transformers for humans

Py65 65816 - Add support for the 65C816 to py65

The ability of computer software to identify words and phrases in spoken language and convert them to human-readable text

A natural language modeling framework based on PyTorch

PG-19 Language Modelling Benchmark

Pytorch-Named-Entity-Recognition-with-BERT

An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model

Natural Language Processing with transformers

This repository contains the code, models and datasets discussed in our paper "Few-Shot Question Answering by Pretraining Span Selection"

Blue Brain text mining toolbox for semantic search and structured information extraction

In this Notebook I've build some machine-learning and deep-learning to classify corona virus tweets, in both multi class classification and binary classification.

Problem: Given a nepali news find the category of the news