Sentiment Analysis Project

This project contains two sentiment analysis programs for hotel reviews, both built on the Hotel Reviews dataset from Datafiniti. The programs differ in how they turn review text into features: countvectorizer.py uses Count Vectorizer and tdidf.py uses TF-IDF Vectorizer. Run the programs separately to compare the two implementations and their accuracy results. TF-IDF Vectorizer is often considered more accurate because it down-weights words that appear in most reviews, but this is not guaranteed for every model or dataset.
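To make the difference between the two vectorizers concrete, here is a minimal sketch on three invented reviews (they are not taken from the Datafiniti dataset, and the variable names are illustrative only). It compares how strongly the common word "the" is weighted relative to the rarer word "clean" under each vectorizer.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy reviews invented for illustration (assumption: not from the real dataset).
reviews = [
    "the room was clean and the staff was friendly",
    "the room was dirty",
    "friendly staff and great breakfast",
]

count_vec = CountVectorizer()
tfidf_vec = TfidfVectorizer()

counts = count_vec.fit_transform(reviews)  # raw word counts per review
tfidf = tfidf_vec.fit_transform(reviews)   # counts re-weighted by inverse document frequency

# Column indices of the two words in each vocabulary.
c_the, c_clean = count_vec.vocabulary_["the"], count_vec.vocabulary_["clean"]
t_the, t_clean = tfidf_vec.vocabulary_["the"], tfidf_vec.vocabulary_["clean"]

# Weight of "the" relative to "clean" in the first review.
count_ratio = counts[0, c_the] / counts[0, c_clean]
tfidf_ratio = tfidf[0, t_the] / tfidf[0, t_clean]

print("count ratio the/clean: ", count_ratio)
print("tf-idf ratio the/clean:", tfidf_ratio)
```

"the" occurs in two of the three reviews and "clean" in only one, so TF-IDF gives "the" less weight relative to "clean" than the raw counts do.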

System Requirements

Use the pip install command to install the packages behind the following imports (pandas, numpy, and scikit-learn, which provides the sklearn modules):

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier

Usage (description of actions performed)

1. dataset imported
2. rows with null values deleted
3. 30% representative sample taken to avoid slowing down the system
4. sentiments column added
5. input training features and labels defined
6. dataset split into training and testing sets
7. text data vectorized (using CountVectorizer or TF-IDF Vectorizer)
8. models trained:
 -  Logistic Regression (linear classification)
 -  Support Vector Machine (linear/non-linear data separated into classes by a line/hyperplane)
 -  K Nearest Neighbor (local approximation)
9. Accuracy Scores, Confusion Matrix, and True Positive and True Negative Rates printed for all three models
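The steps above can be sketched roughly as follows. This is not the code from countvectorizer.py or tdidf.py. It runs on a small invented DataFrame instead of the real dataset, and several details are assumptions: the column names reviews.text and reviews.rating, the rule that a rating of 4 or higher counts as positive, the 25% test split, and the default model hyperparameters.

```python
import pandas as pd
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: hypothetical stand-in for the imported dataset (column names are assumptions).
positive = ["great clean room", "friendly helpful staff",
            "lovely breakfast and view", "comfortable bed great stay"]
negative = ["dirty room", "rude staff",
            "terrible noisy night", "awful smell broken shower"]
df = pd.DataFrame({
    "reviews.text": (positive + negative) * 10 + [None],
    "reviews.rating": ([5] * 4 + [1] * 4) * 10 + [3],
})

df = df.dropna(subset=["reviews.text"])        # step 2: delete rows with null text
df = df.sample(frac=0.3, random_state=42)      # step 3: 30% sample
# Step 4: sentiments column (assumed rule: rating >= 4 is positive).
df["sentiment"] = (df["reviews.rating"] >= 4).astype(int)

X, y = df["reviews.text"], df["sentiment"]     # step 5: features and labels
X_train, X_test, y_train, y_test = train_test_split(   # step 6
    X, y, test_size=0.25, random_state=42, stratify=y)

# Step 7: fit the vectorizer on training text only, then transform both sets.
# Swap in CountVectorizer here to mirror the other program.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Step 8: the three models, with default hyperparameters (an assumption).
models = {
    "Logistic Regression": LogisticRegression(),
    "Support Vector Machine": svm.SVC(),
    "K Nearest Neighbor": KNeighborsClassifier(),
}

# Step 9: accuracy, confusion matrix, true positive and true negative rates.
results = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    cm = confusion_matrix(y_test, model.predict(X_test_vec), labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    results[name] = {
        "accuracy": model.score(X_test_vec, y_test),
        "confusion_matrix": cm,
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
        "tnr": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
    print(name, results[name])
```

Fitting the vectorizer on the training split only keeps vocabulary and IDF statistics from the test reviews out of the model, which makes the reported accuracy a fairer estimate.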

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Owner

Simran Farrukh
