Machine Learning Course Project: IMDB movie review sentiment analysis with LSTM, CNN, and Transformer

Overview

IMDB Sentiment Analysis

This is the final project for the Machine Learning course at the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology.

Training

To train a model (CNN, LSTM, or Transformer), run:

python train.py --cfg <./model/xxx> --save <./save/>

You can change the configuration in config.
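
The two flags in the command above could be handled with the standard library's argparse. This is only a minimal sketch: `build_parser`, the help strings, and the `./save/` default are assumptions for illustration, not the project's actual train.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the --cfg and --save flags shown above."""
    parser = argparse.ArgumentParser(
        description="Train a sentiment model on IMDB reviews."
    )
    # Path to the model configuration (CNN, LSTM, or Transformer).
    parser.add_argument("--cfg", required=True, help="path to a model configuration")
    # Output directory; the default value here is an assumption.
    parser.add_argument("--save", default="./save/", help="directory for saved models")
    return parser


# Parse the placeholder arguments from the command above instead of sys.argv.
args = build_parser().parse_args(["--cfg", "./model/xxx", "--save", "./save/"])
print(f"config: {args.cfg}, save dir: {args.save}")
```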

Model

LSTM

We follow the original LSTM as closely as possible.
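
For readers unfamiliar with the recurrence, here is a rough sketch of one LSTM time step in plain Python. It assumes the commonly used formulation with input, forget, and output gates. The function name `lstm_step` and the `params` layout are hypothetical and may differ from what the project implements.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a single example.

    params[name] = (W, U, b) for name in "i", "f", "o", "g", where
    W is hidden x input, U is hidden x hidden, and b has length hidden.
    """
    def affine(name):
        W, U, b = params[name]
        return [
            sum(w * xi for w, xi in zip(W[r], x))
            + sum(u * hi for u, hi in zip(U[r], h_prev))
            + b[r]
            for r in range(len(b))
        ]

    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    o = [sigmoid(v) for v in affine("o")]    # output gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell values
    # New cell state: keep part of the old state, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

For sentiment classification, the hidden state after the last token would typically be passed to a linear layer that produces the positive/negative score.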


CNN

We adopt the method described in "Effective Use of Word Order for Text Categorization with Convolutional Neural Networks".
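
The experiments in this README vary the convolution kernel sizes (for example [5, 11, 17]). As a generic illustration of that idea, the sketch below slides filters of several widths over a sequence of token vectors and keeps the maximum activation of each filter. The function names and toy values are hypothetical, and it is not claimed to match the paper's or the project's exact layer.

```python
def conv1d_max_pool(seq, kernel):
    """Slide `kernel` (k x d) over `seq` (n x d), apply ReLU, then max over time."""
    k = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        activation = sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, activation)
    return best


def multi_scale_features(seq, kernels_by_size):
    """Concatenate one pooled feature per filter, across all kernel sizes."""
    return [
        conv1d_max_pool(seq, kernel)
        for size in sorted(kernels_by_size)
        for kernel in kernels_by_size[size]
    ]


# Toy usage: 1-dimensional token vectors and two kernel sizes (made-up values).
seq = [[1.0], [2.0], [3.0], [4.0]]
kernels = {2: [[[1.0], [1.0]]], 3: [[[1.0], [0.0], [-1.0]]]}
features = multi_scale_features(seq, kernels)
print(features)  # [7.0, 0.0]
```

The pooled features from all kernel sizes form a fixed-length vector regardless of review length, which is what lets a classifier layer sit on top.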


Transformer

We use the original Transformer encoder from "Attention Is All You Need" and adopt the CLS token from "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
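
To illustrate the CLS-token idea, the sketch below prepends a CLS vector to the token vectors, applies one self-attention pass, and reads the output at position 0 as the review representation. It is heavily simplified and rests on assumptions: a single head, identity projections (Q = K = V), and no feed-forward or normalization layers. All names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs):
    """Single-head scaled dot-product attention with Q = K = V = xs."""
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return out


def cls_representation(token_vectors, cls_vector):
    """Prepend the CLS vector, attend over the whole sequence, return position 0."""
    encoded = self_attention([cls_vector] + token_vectors)
    return encoded[0]


# Toy usage with 2-dimensional vectors (made-up values).
pooled = cls_representation([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In a trained model the CLS vector is a learned embedding, and the vector at position 0 is passed to a classification layer.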


Experiment results

Model Comparison

| Model | Accuracy (%) |
| --- | --- |
| LSTM | 89.02 |
| Transformer | 87.47 |
| CNN | 88.66 |
| Fine-tuned BERT | 93.43 |

LSTM

Batch size

| Batch size | Loss | Accuracy |
| --- | --- | --- |
| 64 | 0.4293 | 0.8802 |
| 128 | 0.4298 | 0.8818 |
| 256 | 0.4304 | 0.8836 |
| 512 | 0.4380 | 0.8807 |

Embedding size

| Embedding size | Train loss | Train accuracy | Val loss | Val accuracy |
| --- | --- | --- | --- | --- |
| 32 | 0.4021 | 0.9127 | 0.4419 | 0.8707 |
| 64 | 0.3848 | 0.9306 | 0.4297 | 0.8832 |
| 128 | 0.3772 | 0.9385 | 0.4265 | 0.8871 |
| 256 | 0.3584 | 0.9582 | 0.4303 | 0.8825 |
| 512 | 0.3504 | 0.9668 | 0.4295 | 0.8838 |

Dropout

| Dropout rate | Train loss | Train accuracy | Test loss | Test accuracy |
| --- | --- | --- | --- | --- |
| 0.0 | 0.3554 | 0.9623 | 0.4428 | 0.8704 |
| 0.1 | 0.3475 | 0.9696 | 0.4353 | 0.8780 |
| 0.2 | 0.3516 | 0.9652 | 0.4312 | 0.8825 |
| 0.3 | 0.3577 | 0.9589 | 0.4292 | 0.8844 |
| 0.4 | 0.3587 | 0.9576 | 0.4272 | 0.8868 |
| 0.5 | 0.3621 | 0.9544 | 0.4269 | 0.8865 |
| 0.6 | 0.3906 | 0.9242 | 0.4272 | 0.8863 |
| 0.7 | 0.3789 | 0.9356 | 0.4303 | 0.8826 |
| 0.8 | 0.3939 | 0.9204 | 0.4311 | 0.8826 |
| 0.9 | 0.4211 | 0.8918 | 0.4526 | 0.8584 |

Weight decay

| Weight decay | Train loss | Train accuracy | Test loss | Test accuracy |
| --- | --- | --- | --- | --- |
| 1.0e-8 | 0.3716 | 0.9436 | 0.4261 | 0.8876 |
| 1.0e-7 | 0.3803 | 0.9349 | 0.4281 | 0.8862 |
| 1.0e-6 | 0.3701 | 0.9456 | 0.4264 | 0.8878 |
| 1.0e-5 | 0.3698 | 0.9461 | 0.4283 | 0.8850 |
| 1.0e-4 | 0.3785 | 0.9377 | 0.4318 | 0.8806 |

Number of layers

Number of LSTM blocks.

| Number of layers | Train loss | Train accuracy | Test loss | Test accuracy |
| --- | --- | --- | --- | --- |
| 1 | 0.3786 | 0.9364 | 0.4291 | 0.8844 |
| 2 | 0.3701 | 0.9456 | 0.4264 | 0.8878 |
| 3 | 0.3707 | 0.9451 | 0.4243 | 0.8902 |
| 4 | 0.3713 | 0.9446 | 0.4279 | 0.8857 |

CNN

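Output channel size

| Output channels | Train accuracy | Test accuracy |
| --- | --- | --- |
| 8 | 0.9679 | 0.8743 |
| 16 | 0.9791 | 0.8767 |
| 32 | 0.9824 | 0.8811 |
| 64 | 0.9891 | 0.8848 |
| 128 | 0.9915 | 0.8824 |
| 256 | 0.9909 | 0.8827 |
| 512 | 0.9920 | 0.8841 |
| 1024 | 0.9959 | 0.8833 |

Multi-scale filter

| Number of kernel sizes | Kernel sizes | Train accuracy | Test accuracy |
| --- | --- | --- | --- |
| 1 | [5] | 0.9698 | 0.8748 |
| 2 | [5, 11] | 0.9852 | 0.8827 |
| 3 | [5, 11, 17] | 0.9890 | 0.8850 |
| 4 | [5, 11, 17, 23] | 0.9915 | 0.8848 |
| 5 | [5, 11, 17, 23, 29] | 0.9924 | 0.8842 |
| 6 | [5, 11, 17, 23, 29, 35] | 0.9930 | 0.8836 |

| Step | Kernel sizes | Train accuracy | Test accuracy |
| --- | --- | --- | --- |
| 2 | [5, 7, 9] | 0.9878 | 0.8816 |
| 4 | [5, 9, 11] | 0.9890 | 0.8816 |
| 6 | [5, 11, 17] | 0.9919 | 0.8834 |
| 8 | [5, 13, 21] | 0.9884 | 0.8836 |
| 10 | [5, 15, 25] | 0.9919 | 0.8848 |
| 12 | [5, 17, 29] | 0.9898 | 0.8812 |
| 14 | [5, 29, 43] | 0.9935 | 0.8809 |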