charlesdungy/new-to-streaming-scraper

New to Streaming Scraper

This is a web scraping project built with Python, R, and SQL.

The scraped data describe movies and TV shows. The goal of the project is to list the new-to-streaming titles that arrive on Netflix each month, along with additional details such as critic and audience ratings.

Current stage: Preparing the data presentation with R Markdown.

Testing at: https://no-longer-hosted.com

Future stage: Complete the documentation and code comments.

Note: GitHub may still host this project's demo, but the project will receive no further updates.

Description

Data are retrieved from two different data sources: What's on Netflix (WON) and Rotten Tomatoes (RT). RT data are cleaned and transformed with Python, while WON data are cleaned and transformed with R.

All data are piped into a MySQL database, then retrieved for presentation in R.

Here is a high-level look at the pipeline:

[Image: pipeline diagram]

Data Source 1 is WON data. Data Source 2 is RT data.
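The flow described above (clean each source, load both into a database, then join them for presentation) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample records, table names, column names, and helper functions are all hypothetical, Python's built-in `sqlite3` stands in for MySQL so the example runs without a server, and both sources are cleaned in Python here even though the project cleans WON data in R.

```python
import sqlite3

# Hypothetical sample records; the real project scrapes these from the web.
won_rows = [  # stand-in for Data Source 1 (What's on Netflix)
    {"title": "  Example Movie ", "arrival_date": "2020-05-01"},
    {"title": "Example Show", "arrival_date": "2020-05-15"},
]
rt_rows = [  # stand-in for Data Source 2 (Rotten Tomatoes)
    {"title": "example movie", "critic_score": "91%", "audience_score": "85%"},
]


def normalize_title(title):
    """Lowercase and collapse whitespace so titles from both sources match."""
    return " ".join(title.lower().split())


def parse_percent(text):
    """Convert a string such as '91%' to the integer 91."""
    return int(text.strip().rstrip("%"))


def load(conn, won, rt):
    """Create one table per source (assumed schema) and insert cleaned rows."""
    conn.execute("CREATE TABLE won (title TEXT PRIMARY KEY, arrival_date TEXT)")
    conn.execute(
        "CREATE TABLE rt (title TEXT PRIMARY KEY, "
        "critic_score INTEGER, audience_score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO won VALUES (?, ?)",
        [(normalize_title(r["title"]), r["arrival_date"]) for r in won],
    )
    conn.executemany(
        "INSERT INTO rt VALUES (?, ?, ?)",
        [
            (
                normalize_title(r["title"]),
                parse_percent(r["critic_score"]),
                parse_percent(r["audience_score"]),
            )
            for r in rt
        ],
    )
    conn.commit()


def new_titles(conn):
    """Return every arriving title, with ratings where a match exists."""
    return conn.execute(
        "SELECT w.title, w.arrival_date, r.critic_score, r.audience_score "
        "FROM won AS w LEFT JOIN rt AS r ON r.title = w.title "
        "ORDER BY w.arrival_date"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
load(conn, won_rows, rt_rows)
for row in new_titles(conn):
    print(row)
```

The `LEFT JOIN` keeps titles that have no ratings match, so a new arrival still appears in the output with empty rating fields. Matching on a normalized title is only one possible join key and is an assumption of this sketch.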

Main Packages/Tools

Python

R

SQL

Current Directory Tree

[Image: directory tree]

License

MIT

About

A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.
