(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Last update: Jul 01, 2022

Overview

Towards Abstractive Grounded Summarization of Podcast Transcripts

We provide the source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts" accepted at ACL'22. If you find the code useful, please cite the following paper.

@inproceedings{song-etal-2022-grounded,
    title = "Towards Abstractive Grounded Summarization of Podcast Transcripts",
    author = "Song, Kaiqiang and
              Li, Chen and
              Wang, Xiaoyang and
              Yu, Dong and
              Liu, Fei",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022"
}

Goal

We propose a grounded summarization system that links each summary sentence to a chunk of the original transcript and to the corresponding audio/video recording. This allows a human evaluator to quickly verify the summary content against the source clips.
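To make the idea of a grounded summary concrete, the sketch below shows one way such output could be represented and displayed for review. It is only an illustration: the class name `GroundedSentence`, its fields, and the helper functions are hypothetical and are not taken from this repository or the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroundedSentence:
    """One summary sentence plus the transcript chunk it is linked to (hypothetical layout)."""
    text: str          # the generated summary sentence
    chunk_index: int   # position of the linked chunk in the transcript
    chunk_text: str    # transcript text of the linked chunk
    start_sec: float   # where the chunk starts in the recording
    end_sec: float     # where the chunk ends in the recording


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_for_review(summary: List[GroundedSentence]) -> str:
    """Produce one line per summary sentence with a pointer to its source clip."""
    lines = []
    for sent in summary:
        span = f"{format_timestamp(sent.start_sec)}-{format_timestamp(sent.end_sec)}"
        lines.append(f"{sent.text} [chunk {sent.chunk_index}, {span}]")
    return "\n".join(lines)
```

With a structure like this, an evaluator can jump from a summary sentence straight to the time span of the clip it is grounded in.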

News

03/04/2022 Trained model and processed testing data released.
03/03/2022 Code released. The paper link, trained model, and processed test data will be released soon.
02/23/2022 Paper accepted at ACL 2022.

Experiments

Follow the four steps below to generate grounded podcast summaries, or directly download the generated summaries from this link.

Step 1: Download Code, Model & Data

Download the code

git clone https://github.com/tencent-ailab/GrndPodcastSum.git
cd GrndPodcastSum

Download the trained models into the GrndPodcastSum directory and unzip them.

unzip model.zip

Download the processed test set (1027) into the GrndPodcastSum directory and unzip it.

unzip data.zip
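Before moving on, it can help to confirm that everything the later steps refer to is in place. The following is a minimal sketch of such a check. The file names come from the steps in this README, but it is an assumption that they all sit at the top level of the GrndPodcastSum directory and that the archives are kept after unzipping. The function name `missing_files` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# File names mentioned in this README; their location in the repository
# root is an assumption, not something the README states explicitly.
EXPECTED_FILES = ("model.zip", "data.zip", "env.yml", "offline.sh", "test.sh")


def missing_files(root: str, names: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the names from `names` that are not regular files under `root`."""
    base = Path(root)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Run from inside the GrndPodcastSum directory.
    missing = missing_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are present.")
```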

Step 2: Setup Environment

Create the conda environment from the env.yml file and activate it.

conda env create -f env.yml
conda activate GrndPodcastSum

Step 3: Offline Computing for Chunk Embeddings

Compute the chunk embeddings offline.

sh offline.sh
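The offline.sh script is the authoritative implementation of this step. Purely as an illustration of what precomputing chunk-level representations involves, the sketch below splits a tokenized transcript into fixed-size, non-overlapping chunks and computes one vector per chunk. The chunk size, the function names, and especially `toy_embed` are assumptions made for this example. The actual system uses its own encoder and settings.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_chunks(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split tokens into consecutive, non-overlapping chunks.

    The last chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def precompute_chunk_embeddings(
    tokens: Sequence[str],
    chunk_size: int,
    embed: Callable[[Sequence[str]], Vector],
) -> List[Vector]:
    """Embed every chunk once so later steps can reuse the vectors."""
    return [embed(chunk) for chunk in split_into_chunks(tokens, chunk_size)]


def toy_embed(chunk: Sequence[str]) -> Vector:
    """Placeholder encoder: [token count, mean token length]. Not the paper's encoder."""
    count = len(chunk)
    mean_len = sum(len(tok) for tok in chunk) / count if count else 0.0
    return [float(count), mean_len]
```

Computing the vectors once and storing them means summary generation does not have to re-encode the full transcript for every run.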

Step 4: Generating Grounded Summary

Use the Grnd-token-nonoverlap model to generate the summaries.

sh test.sh

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Disclaimer

This repository is for research purposes only. It is not an officially supported Tencent product.
