Modified GPT using average pooling to reduce the softmax attention memory constraints.

Last update: Dec 03, 2021

Overview

NLP-GPT-Upsampling

This repository contains an implementation of OpenAI's GPT model. Taking inspiration from the Nystromformer, which approximates the full softmax attention matrix, it applies a simple strided average pooling to the input sequence to shorten it before attention is computed, so that longer sequences can be handled in language modeling tasks. The reduced-length attention output is then upsampled back to the original sequence length using bilinear interpolation.
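The pool, attend, upsample idea described above can be illustrated with a minimal NumPy sketch. It is not the repository's TensorFlow code: the function name `pooled_attention`, the single head without learned projections, the requirement that the sequence length be divisible by the stride, and the use of 1-D linear interpolation along the sequence axis (standing in for the bilinear upsampling) are all assumptions made for illustration. The sketch also does not address how information from later tokens inside a pooled block should be handled.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pooled_attention(x, stride):
    """Hypothetical helper: attention on an average-pooled sequence.

    x: array of shape (seq_len, d_model); seq_len must be divisible by stride.
    Returns an array of shape (seq_len, d_model).
    """
    seq_len, d_model = x.shape
    assert seq_len % stride == 0, "sketch assumes seq_len divisible by stride"
    pooled_len = seq_len // stride

    # Strided average pooling: each block of `stride` tokens becomes one vector.
    pooled = x.reshape(pooled_len, stride, d_model).mean(axis=1)

    # Assumption: no learned projections, so queries, keys and values coincide.
    q = k = v = pooled

    # Causal scaled dot-product attention over the shortened sequence.
    # The score matrix is (pooled_len, pooled_len) rather than (seq_len, seq_len).
    scores = q @ k.T / np.sqrt(d_model)
    future = np.triu(np.ones((pooled_len, pooled_len), dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)
    out = softmax(scores) @ v

    # Upsample back to seq_len by linear interpolation between block centres.
    centres = (np.arange(pooled_len) + 0.5) * stride - 0.5
    positions = np.arange(seq_len)
    columns = [np.interp(positions, centres, out[:, j]) for j in range(d_model)]
    return np.stack(columns, axis=1)


x = np.random.default_rng(0).normal(size=(16, 8))
y = pooled_attention(x, stride=4)
print(y.shape)
```

With a stride of 4, the softmax is taken over a 4 x 4 score matrix here instead of a 16 x 16 one, which is where the memory saving comes from.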

Because of the simplicity of this approach, the model's performance is not expected to match that of the original GPT model, which uses the full attention matrix. The tradeoff is that this naive strided averaging can model longer sequences than the original GPT implementation under the same memory budget.
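To give a rough sense of the tradeoff, the following sketch counts the entries of the attention score matrix with and without pooling. The helper name `attention_score_entries` and the example sequence length and stride are assumptions for illustration; the count covers only one score matrix and ignores every other activation in the model.

```python
def attention_score_entries(seq_len, stride=1):
    # Hypothetical helper: number of entries in one (pooled) attention score matrix.
    pooled_len = seq_len // stride
    return pooled_len * pooled_len


full = attention_score_entries(1024)              # no pooling
pooled = attention_score_entries(1024, stride=4)  # pooled by a factor of 4
print(full, pooled, full // pooled)  # 1048576 65536 16
```

In this count, a stride of s shrinks the score matrix by a factor of s squared.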

Fig. 1: GPT Model Architecture (obtained from GPT paper)

Data

This repository includes code to process the Movie Dialogue dataset, whose preparation closely follows this script, as well as the Reddit Jokes dataset.

To prepare the data prior to training the model(s), run

python process_movie_dialogue_subword.py

for the Movie Dialogue dataset, or

python process_reddit_jokes_subword_v1.py

for the Reddit Jokes dataset.

Training and Model Inference

Having processed the data into sub-word tokens, run

python train_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py
python infer_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py

python train_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py
python infer_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py

to train the model on the corresponding dataset and then run inference with the trained model.

Owner

WD
