A combination of autoregressors and autoencoders using XLNet for sentiment analysis

Last update: Nov 20, 2021

Overview

Abstract

In this paper we perform sentiment analysis in order to evaluate the performance of XLNet on this task. XLNet is a language understanding network that draws on the strengths of both autoregressive and autoencoding models. BERT is pretrained with an autoencoding objective: it reconstructs masked tokens from their surrounding context. Autoregressive language models such as Transformer-XL instead predict each token from the tokens that precede it. XLNet combines attributes of both approaches in order to achieve higher performance in many NLP tasks, such as sentiment analysis, question answering, reading comprehension and natural language understanding. In this work we evaluate the XLNet model on several sentiment classification tasks in terms of accuracy and efficiency. At the time of writing, XLNet reaches state-of-the-art results and outperforms BERT, the previous state-of-the-art model in natural language processing.
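The way XLNet combines the two families can be illustrated with a small sketch. This is not code from this repository: it is a toy illustration of the permutation language modelling idea described in the XLNet paper, in which the model still predicts one token at a time (autoregressive), but the prediction order is sampled at random, so over many orders each token gets conditioned on context from both sides (as in an autoencoder). The function names `permutation_context` and `left_to_right_context` and the example sentence are hypothetical, chosen only for this illustration.

```python
import random


def left_to_right_context(tokens):
    """Classic autoregressive order: position i may only see positions 0..i-1."""
    return {pos: list(range(pos)) for pos in range(len(tokens))}


def permutation_context(tokens, seed=0):
    """Sample one factorization order and return, for each position,
    the positions it may condition on (those earlier in the sampled order).

    Illustrative only: a real XLNet expresses this with attention masks
    inside the network instead of explicit Python lists.
    """
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    visible = {}
    for step, pos in enumerate(order):
        # Everything already predicted in this order is visible,
        # regardless of whether it lies to the left or to the right.
        visible[pos] = sorted(order[:step])
    return order, visible


tokens = ["the", "movie", "was", "great"]  # hypothetical example sentence
order, visible = permutation_context(tokens, seed=1)
for pos in order:
    context = [tokens[i] for i in visible[pos]]
    print(f"predict {tokens[pos]!r} given {context}")
```

With the left-to-right order, the last word is the only one that sees the whole sentence. With a sampled order, any position can end up last and so see every other token, which is how a bidirectional context is obtained without masking tokens in the input.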

This was an assignment for the Deep Learning course in the PhD program of the National Technical University of Athens.

Team of 3 people
Runs were made on HPC-ARIS through batch scripts
Course grade: 10/10 (excellent)
Full report, formatted as a paper, is available here
Code for 2 of the 3 sentiment analysis tasks (implemented by the author of this repo) is available here
Data available here

Owner

James Zaridis
