Custom IMDB dataset extracted for 2020-2021 and a custom DistilBERT model trained to predict movie success probability

Last update: Jan 18, 2022


Overview

IMDB Success Predictor

The project involves web scraping custom IMDB data for 10,000 movies and shows from 2020 to 2021, sorted by number of votes; fine-tuning a pre-trained DistilBERT Transformer using transfer learning; and then saving the fine-tuned model so it can be reused.

Stack

DistilBERT Transformer
TensorFlow
NumPy and Pandas
Selenium, BeautifulSoup4 and requests

Metrics

Accuracy achieved: 81.3492%
ROC AUC score achieved: 0.7217
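For reference, the two reported metrics can be defined as follows. This is a minimal pure-Python sketch, assuming binary labels (1 = success) and the model's predicted probability of success as the score; the README does not say how its numbers were computed, and in practice `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` would normally be used instead.

```python
def accuracy(labels, preds):
    """Fraction of hard 0/1 predictions that match the true labels."""
    if not labels or len(labels) != len(preds):
        raise ValueError("labels and preds must be non-empty and of equal length")
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive is scored above a
    random negative (ties count as half), i.e. the Mann-Whitney U statistic
    divided by n_pos * n_neg. O(n_pos * n_neg), fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.7]            # predicted P(success)
    preds = [int(s >= 0.5) for s in scores]  # hard labels at a 0.5 threshold
    print(accuracy(labels, preds), roc_auc(labels, scores))  # prints 0.5 0.75
```

The example shows why the two numbers can differ: accuracy depends on the chosen threshold, while ROC AUC only depends on how the scores rank positives against negatives.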

Installation

1) Ensure Python and Jupyter Notebook are installed. Optionally, a Conda environment can also be used.

Install the required modules using

pip install -r requirements.txt 

or conda install --file requirements.txt

or !pip install -r requirements.txt in Google Colab.

Selenium requires browser-specific drivers; setup guides for Chrome and Firefox are linked below. This step is optional if the notebook is run on Google Colab.
Chrome: https://chromedriver.chromium.org/getting-started
Firefox: https://www.lambdatest.com/blog/selenium-firefox-driver-tutorial/
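To confirm that the installation worked, a small standard-library check like the following can be run. The module list is an assumption: it uses the usual import names for the packages in the Stack list (with Hugging Face `transformers` assumed as the source of DistilBERT), since the contents of requirements.txt are not reproduced here.

```python
import importlib.util

# Assumed import names for the Stack above; adjust to match requirements.txt.
REQUIRED_MODULES = ["tensorflow", "transformers", "numpy", "pandas",
                    "selenium", "bs4", "requests"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as TensorFlow.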

Training

1) (Optional) Run the IMDB web scraper. This regenerates the CSV file and the imdb_movies pickle file, which are already provided.
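The scraper has to walk through many search-result pages to collect 10,000 titles sorted by number of votes. The sketch below only shows how such paginated search URLs could be generated. It is hypothetical: the base URL, the parameter names (`release_date`, `sort`, `start`), the 50-results-per-page size and the full-calendar-year date range are assumptions about IMDB's advanced title search, not taken from this project's scraper.

```python
from urllib.parse import urlencode

# Hypothetical values describing IMDB's advanced title search; the real
# scraper may use different parameters or page sizes.
BASE_URL = "https://www.imdb.com/search/title/"
PAGE_SIZE = 50  # assumed number of results per search page


def search_page_urls(total_titles=10000, start_date="2020-01-01", end_date="2021-12-31"):
    """Build one search URL per results page, sorted by number of votes (descending)."""
    urls = []
    # 'start' is the 1-based index of the first result on each page.
    for start in range(1, total_titles + 1, PAGE_SIZE):
        params = {
            "release_date": f"{start_date},{end_date}",
            "sort": "num_votes,desc",
            "start": start,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls
```

With the defaults this yields 200 page URLs (10,000 titles at an assumed 50 per page), each of which would then be fetched and parsed with requests/BeautifulSoup or Selenium.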

Run the training notebook on an environment with GPU acceleration. Here it was run on Google Colab, where an NVIDIA Tesla T4 or NVIDIA Tesla K80 is allocated.
```
Training Time: Roughly 20-25 mins
Epochs: 10
Training Batch Size: 8
Max length of each sentence: 512 tokens
```
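The 512-token limit matches DistilBERT's maximum input length, so every description ends up exactly 512 tokens long. In practice a Hugging Face tokenizer call such as `tokenizer(texts, truncation=True, padding="max_length", max_length=512)` handles this; the pure-Python sketch below only illustrates what that setting does. It assumes the uncased BERT/DistilBERT vocabulary ids for `[CLS]` (101), `[SEP]` (102) and `[PAD]` (0), and the function name is hypothetical.

```python
# Assumed special-token ids from the uncased BERT/DistilBERT vocabulary.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def encode_fixed_length(token_ids, max_length=512):
    """Wrap content token ids in [CLS] ... [SEP], truncating or padding to max_length.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens and 0 for padding.
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    body = list(token_ids)[: max_length - 2]  # drop content past the limit
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * padding           # pad short inputs up to max_length
    attention_mask += [0] * padding           # padded positions are ignored by attention
    return input_ids, attention_mask


if __name__ == "__main__":
    ids, mask = encode_fixed_length([5, 6, 7], max_length=8)
    print(ids)   # [101, 5, 6, 7, 102, 0, 0, 0]
    print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Padding every input to the full 512 tokens keeps batch shapes fixed, at the cost of extra computation on short descriptions, which is part of why GPU acceleration matters here.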
Training creates a Movie_prediction_model directory containing config.json (provided) and tf_model.h5 (not provided due to space constraints).
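Because tf_model.h5 is not shipped with the repository, checking that both files exist before running the classifier avoids a confusing load error. A minimal standard-library sketch follows; the function name is hypothetical and the file names come from this section. The saved directory would then typically be loaded with something like `TFDistilBertForSequenceClassification.from_pretrained("Movie_prediction_model")` from `transformers`, although the README does not show the loading code.

```python
from pathlib import Path

# File names taken from the Training section.
EXPECTED_FILES = ("config.json", "tf_model.h5")


def missing_model_files(model_dir="Movie_prediction_model"):
    """Return the expected model files that are not present in model_dir."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Model not ready, missing:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```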

Usage

1) Ensure the model has been created inside the Movie_prediction_model directory.

Run the Python script using: python DistilBERT_Movie_Classifier.py
Enter the description of the movie or TV show you want a prediction for. The script outputs a binary prediction of success based on IMDB ratings.
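A sequence-classification head usually returns one logit per class, so one plausible way to turn the model output into both a success probability and a binary prediction is a softmax followed by a threshold. This is a sketch under stated assumptions: two output classes, index 1 meaning "success", and a 0.5 threshold. The README does not say how DistilBERT_Movie_Classifier.py makes the decision, nor which IMDB rating cutoff defines success.

```python
import math


def success_probability(logits):
    """Softmax over two logits [not_success, success]; return P(success).

    Assumes class index 1 means "success" (hypothetical label order).
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    top = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - top) for x in logits]
    return exps[1] / sum(exps)


def predict_success(logits, threshold=0.5):
    """Binary prediction: 1 (success) if P(success) >= threshold, else 0."""
    return int(success_probability(logits) >= threshold)


if __name__ == "__main__":
    example_logits = [0.0, math.log(3)]  # made-up model output for illustration
    print(success_probability(example_logits), predict_success(example_logits))
```

Keeping the probability alongside the binary label makes it possible to report a confidence and to move the threshold later without retraining.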


Owner

Gautam Diwan
