Sentiment-Analysis and EDA on the IMDB Movie Review Dataset

Last update: Jan 12, 2022


The main part of the work explores and studies different approaches used for Sentiment Analysis (e.g. Bag of Words, TF-IDF, Word Embeddings). In addition, the work applies and compares different classification algorithms for Sentiment Analysis tasks in Natural Language Processing (e.g. tree-based algorithms, linear models and Support Vector Machines).

Author: Nikolas Petrou, MSc in Data Science

Technical-Report and Code Availability

The complete text and analysis of the work are available in the EDA-and-Sentiment-Analysis-on IMDB-Dataset.pdf file.
The implementation and code of the project are located in the Implementation-Python Files folder.

Overview

The goal of this work is to explore and study different approaches used for Sentiment Analysis (e.g. Bag of Words, TF-IDF, Word Embeddings). In addition, the work applies and compares different classification algorithms for Sentiment Analysis tasks in Natural Language Processing (e.g. tree-based algorithms, linear models and Support Vector Machines).
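To make the difference between Bag of Words and TF-IDF concrete, here is a minimal pure-Python sketch. It is not the project's implementation: the helper names (`tokenize`, `bag_of_words`, `tf_idf`) and the two toy reviews are assumptions made for illustration, and the IDF used is the plain unsmoothed `log(N / df)` variant, whereas libraries often apply smoothing.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of documents."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # One raw count per vocabulary term, in vocabulary order.
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf = count / doc length
    and idf = log(N / df), with no smoothing (an illustrative choice)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    weights = []
    for vec in counts:
        total = sum(vec)
        weights.append(
            [(c / total) * idf[j] if total else 0.0 for j, c in enumerate(vec)]
        )
    return vocab, weights


# Toy reviews, invented for this example.
reviews = ["a great great movie", "a dull movie"]
vocab, counts = bag_of_words(reviews)
_, weights = tf_idf(reviews)
```

Terms that occur in every review ("a", "movie") receive a TF-IDF weight of zero here, while the raw Bag of Words counts treat them like any other term. This is the main practical difference between the two representations.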

Dataset

For this work, a large dataset of movie reviews was used: the publicly available Internet Movie Database (IMDB) review dataset.

The data can be obtained from Kaggle or directly from Stanford.
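As a rough sketch of how the reviews could be loaded, the snippet below assumes the directory layout of the Stanford archive, where each review is a separate text file under `<split>/pos` or `<split>/neg`. The function name `load_split`, the 0/1 label encoding and the throwaway demo directory are assumptions for illustration and are not taken from the project code.

```python
import tempfile
from pathlib import Path


def load_split(root, split):
    """Read reviews from <root>/<split>/{neg,pos}/*.txt.

    Returns (texts, labels) with label 0 for negative and 1 for positive.
    """
    texts, labels = [], []
    for label_name, label in (("neg", 0), ("pos", 1)):
        folder = Path(root) / split / label_name
        for path in sorted(folder.glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels


# Demo on a temporary directory that mimics the assumed layout,
# so the sketch runs without downloading the dataset.
with tempfile.TemporaryDirectory() as tmp:
    for label_name, text in (("neg", "a dull movie"), ("pos", "a great movie")):
        folder = Path(tmp) / "train" / label_name
        folder.mkdir(parents=True)
        (folder / "0_1.txt").write_text(text, encoding="utf-8")
    demo_texts, demo_labels = load_split(tmp, "train")
```

With the real archive extracted, the call would look like `load_split("aclImdb", "train")`, assuming the archive was unpacked into a folder of that name.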

Methodology

An abstract methodology scheme of the work is illustrated in the following Figure.

In summary, the initial questions were first set with respect to the dataset used. Subsequently, data scraping and data collection were performed. After the data preprocessing steps, different data analytics and analyses were employed in order to better understand the data. Finally, during the final analysis, different methodologies and models were utilized to classify the textual data based on sentiment. It is important to note that the whole process followed a cyclical scheme.
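The exact preprocessing steps are not listed here, so the following is only a sketch of what a typical text-cleaning step for review data might look like. The specific steps (unescaping HTML entities, stripping tags, lowercasing, dropping non-letters, removing stopwords), the tiny `STOPWORDS` set and the `preprocess` name are all assumptions for illustration.

```python
import html
import re

# Deliberately tiny, hypothetical stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "was", "it", "this", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")        # HTML tags such as <br />
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # anything except lowercase letters and whitespace


def preprocess(review):
    """Turn a raw review string into a list of cleaned tokens."""
    text = html.unescape(review)      # e.g. "&amp;" becomes "&"
    text = TAG_RE.sub(" ", text)      # replace tags with a space
    text = text.lower()
    text = NON_ALPHA_RE.sub(" ", text)  # drop punctuation, digits, symbols
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = preprocess("This movie was GREAT!<br /><br />Loved it &amp; the cast.")
# tokens == ["movie", "great", "loved", "cast"]
```

Note that this simple cleaning splits contractions (for example, "don't" becomes "don" and "t"), which may or may not be acceptable depending on the representation used afterwards.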


