A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

Last update: Mar 28, 2022

Overview

New to Streaming Scraper

An in-progress web scraping project built with Python, R, and SQL.

The scraped data are details about movies and TV shows. The goal of the project is to show the new-to-streaming titles that arrive on Netflix each month, along with additional details such as critic and audience ratings.

Current stage: Working out how to present the data with R Markdown.

Testing at: https://charlesdungy.github.io/new-to-streaming-scraper/

Future stage: Complete the documentation and code comments.

Description

Data are retrieved from two different data sources: What's on Netflix (WON) and Rotten Tomatoes (RT). RT data are cleaned and transformed with Python, while WON data are cleaned and transformed with R.
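As a rough illustration of the kind of cleaning the Python side might do on RT data, here is a minimal sketch. The field names (`title`, `critic_score`, `audience_score`), the raw string formats, and the helper functions are all assumptions made for this example; they are not taken from the project's code.

```python
import re
from typing import Optional


def clean_title(raw: str) -> str:
    """Collapse repeated whitespace and drop a trailing year such as '(2022)'."""
    title = re.sub(r"\s+", " ", raw).strip()
    return re.sub(r"\s*\(\d{4}\)$", "", title)


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Convert a percentage string like '85%' to an int; return None if unusable."""
    if raw is None:
        return None
    match = re.fullmatch(r"(\d{1,3})\s*%", raw.strip())
    if match is None:
        return None
    score = int(match.group(1))
    return score if 0 <= score <= 100 else None


def clean_record(raw: dict) -> dict:
    """Clean one scraped record (hypothetical field names)."""
    return {
        "title": clean_title(raw["title"]),
        "critic_score": parse_score(raw.get("critic_score")),
        "audience_score": parse_score(raw.get("audience_score")),
    }


if __name__ == "__main__":
    # Made-up record, only to show the shape of the input and output.
    print(clean_record({"title": "  Example   Movie (2022) ",
                        "critic_score": "80%",
                        "audience_score": "--"}))
```

Returning `None` for a missing or malformed score, rather than 0, keeps "no rating yet" distinct from a genuinely low rating once the data reach the database.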

All data are piped into a MySQL database, then retrieved for presentation in R.
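The load-then-retrieve step can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a database server, and the table names (`won_titles`, `rt_ratings`) and columns are hypothetical. In the project itself, retrieval happens in R.

```python
import sqlite3


def build_database(conn, won_rows, rt_rows):
    """Create two hypothetical tables and load cleaned rows into them."""
    conn.execute(
        "CREATE TABLE won_titles ("
        "title TEXT PRIMARY KEY, arrival_date TEXT, category TEXT)"
    )
    conn.execute(
        "CREATE TABLE rt_ratings ("
        "title TEXT PRIMARY KEY, critic_score INTEGER, audience_score INTEGER)"
    )
    conn.executemany("INSERT INTO won_titles VALUES (?, ?, ?)", won_rows)
    conn.executemany("INSERT INTO rt_ratings VALUES (?, ?, ?)", rt_rows)
    conn.commit()


def titles_with_ratings(conn):
    """Return every WON title, with RT scores where a match exists."""
    query = """
        SELECT w.title, w.arrival_date, r.critic_score, r.audience_score
        FROM won_titles AS w
        LEFT JOIN rt_ratings AS r ON r.title = w.title
        ORDER BY w.arrival_date, w.title
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Made-up rows, only to show the flow from load to retrieval.
    conn = sqlite3.connect(":memory:")
    build_database(
        conn,
        won_rows=[("Example Show", "2022-03-15", "TV"),
                  ("Example Movie", "2022-03-01", "Movie")],
        rt_rows=[("Example Movie", 80, 75)],
    )
    for row in titles_with_ratings(conn):
        print(row)
```

A `LEFT JOIN` is used so that a title listed by WON still appears when no RT rating was found for it. Matching on the title string alone is a simplification; a real pipeline would likely need a more robust key.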

Here is a high-level look at the pipeline:

[Figure: pipeline diagram]

In the diagram, Data Source 1 is WON data and Data Source 2 is RT data.

Main Packages/Tools

Python

R

SQL

MySQL

Current Directory Tree

License

MIT

Owner

Charles Dungy
