Natural language processing summarizer using three state-of-the-art Transformer models: BERT, GPT-2, and T5

Last update: Feb 07, 2022

NLP-Summarizer

This project aimed to provide insight into the current limitations of Natural Language Processing (NLP) models by exploring the Transformer, the latest state-of-the-art NLP architecture, and to discuss possible use cases for such tools in domestic and workplace environments. It gives an in-depth explanation of the Transformer architecture, the limitations it aims to solve, and how it can be applied to a variety of tasks. It also explores numerous NLP use cases and how tools like this can be extremely useful and have a major impact on today's society, both at home and in the workplace. Three Transformer models (BERT, GPT-2, and T5) were implemented behind a GUI to evaluate their effectiveness. The final artefact lets a user interact with the models to summarise documents with variable output lengths.
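The variable output lengths mentioned above could be exposed to the user as a ratio of the input length. The sketch below is not taken from the project: `summary_length_bounds` and its `slack` parameter are hypothetical, and it counts whitespace-separated words, whereas the models count tokens, so the bounds are only approximate. The two returned values are the kind of numbers that Hugging Face summarisation pipelines accept as their `min_length` and `max_length` generation arguments.

```python
from typing import Tuple


def summary_length_bounds(text: str, ratio: float, slack: float = 0.2) -> Tuple[int, int]:
    """Return (min_length, max_length) for a summary of roughly `ratio` x the input.

    Hypothetical helper: lengths are measured in whitespace-separated words,
    while Transformer models measure tokens, so the result is approximate.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    n_words = len(text.split())
    target = max(1, round(n_words * ratio))
    # Give the model some room on either side of the target length.
    min_len = max(1, int(target * (1 - slack)))
    max_len = max(min_len + 1, int(target * (1 + slack)))
    return min_len, max_len


if __name__ == "__main__":
    doc = "word " * 1000  # stand-in for a ~1000-word introduction
    print(summary_length_bounds(doc, ratio=0.1))
```

Keeping a band around the target, rather than a single fixed length, leaves the model free to end the summary at a natural sentence boundary.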

Working Example

The following example was created using another student's project introduction; the original word count was ~1000.

Initial GUI

After Summarization

Getting Started

All code was run using Python 3.8.8.
Running the artefact in its entirety requires ~20 GB of free disk space to download the pre-trained models.
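Because the pre-trained models need about 20 GB, it can help to check free space before the first run. This is a minimal sketch using the standard library's `shutil.disk_usage`. The helper name `has_space_for_models` is hypothetical, and the path you pass should be the drive where the models will be cached (Hugging Face places its cache under the user's home directory by default).

```python
import shutil

REQUIRED_GB = 20  # approximate figure quoted in this README


def has_space_for_models(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1024 ** 3
    status = "OK" if has_space_for_models() else "not enough space"
    print(f"{free_gb:.1f} GB free: {status}")
```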

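Install the dependencies (the leading "!" runs each command from a Jupyter/IPython cell; drop it when using a regular terminal):

!pip install transformers
!pip install spacy==2.0.12
!pip install torch
!pip install tk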
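After installing, a quick way to confirm the packages are importable is `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. This is a sketch, not part of the project: `missing_modules` is a hypothetical helper, and checking `tkinter` assumes the GUI is built with Python's standard Tk bindings (suggested by the `tk` install), which usually ship with Python itself.

```python
import importlib.util
from typing import List, Sequence

# Import names to check. Assumption: the GUI uses the standard-library tkinter module.
REQUIRED_MODULES = ("transformers", "spacy", "torch", "tkinter")


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```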

The runtime of each summarisation is printed to the console.
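The README does not show how the runtime is measured. One common approach, sketched here with a hypothetical `timed` helper, wraps the call with `time.perf_counter` and prints the elapsed wall-clock time. In the project, the wrapped function would presumably be a model's summarisation call; a cheap stand-in is used below.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs), print the elapsed time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{getattr(fn, '__name__', 'call')} took {elapsed:.2f} s")
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; a summarisation call would go here instead.
    total, seconds = timed(sum, range(1_000_000))
```

`time.perf_counter` is used rather than `time.time` because it is a monotonic, high-resolution clock intended for measuring intervals.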

Owner

Samuel Sharkey
