This program scrapes information and images for movies and TV shows.

Last update: Dec 05, 2021

Media-WebScraper

Summary

For more information on the program, read the WebScrape_help text file (this can also be accessed while running the program).

For a given list of media, the program scrapes and saves general information, images and, if requested, episode information for each title.

General Information (default):

Saved as a .txt file

This will scrape general information:

Title
Release date
Runtime
Genre
Director
Cast
Plot description

Additional information saved:

Source database used for scrape
ID for media in source database
Poster image link
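As a rough illustration of the .txt output, the sketch below writes one `Field: value` line per scraped field. The field labels (including "Source database", "Source ID" and "Poster link"), their order and the function names are assumptions for illustration, not the program's actual file format.

```python
# Hypothetical field order and labels; the real program's layout may differ.
FIELDS = [
    "Title", "Release date", "Runtime", "Genre", "Director", "Cast",
    "Plot description", "Source database", "Source ID", "Poster link",
]


def format_general_info(info):
    """Render scraped fields as 'Field: value' lines (missing fields are left blank)."""
    lines = []
    for field in FIELDS:
        value = info.get(field, "")
        # Multi-valued fields such as Genre or Cast become one comma-separated line.
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        lines.append("{}: {}".format(field, value))
    return "\n".join(lines) + "\n"


def save_general_info(info, path):
    """Write the formatted text to a UTF-8 .txt file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_general_info(info))
```

For example, `save_general_info({"Title": "Example", "Cast": ["A", "B"]}, "Example.txt")` writes a file whose first line is `Title: Example`.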

Images (default):

Saved as a .jpg file

This will scrape the poster.
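A minimal sketch of saving the poster, assuming the "Poster image link" recorded in the .txt file is a direct image URL and that the image is named after the media title. It uses the standard library's `urllib.request` instead of `requests` so it has no dependencies; the function names and the naming scheme are hypothetical.

```python
import os
import re
import urllib.request

# Characters Windows does not allow in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title):
    """Turn a media title into a usable file name stem."""
    cleaned = _INVALID_CHARS.sub("", title)
    # Windows also rejects names ending in a space or a dot.
    return cleaned.rstrip(" .") or "untitled"


def save_poster(poster_url, folder, title, timeout=30):
    """Download the poster image and save it as <title>.jpg in folder."""
    request = urllib.request.Request(poster_url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    path = os.path.join(folder, safe_filename(title) + ".jpg")
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Sanitizing the title matters because titles such as "Mission: Impossible" contain characters that are not valid in Windows file names.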

Episode Information (if specified):

Saved as a .csv file

This will scrape the following information for each episode of a TV show:

Season number
Episode number
Episode title
Episode air date
Episode description
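The episode list maps naturally onto one CSV row per episode. The sketch below uses the standard `csv` module; the column names and the choice to sort rows by (season, episode) are assumptions, not the program's documented format.

```python
import csv
import io

# Hypothetical column names matching the episode fields listed above.
COLUMNS = ["Season", "Episode", "Title", "Air date", "Description"]


def episodes_to_csv(episodes):
    """Return CSV text with a header row and one row per episode dict."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    # Sort by (season, episode) so rows are in order regardless of scrape order.
    for ep in sorted(episodes, key=lambda e: (e["Season"], e["Episode"])):
        writer.writerow(ep)
    return buffer.getvalue()


def save_episodes(episodes, path):
    # newline="" prevents extra blank lines on Windows, as the csv docs advise.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(episodes_to_csv(episodes))
```

Fields containing commas, such as an episode title like "Pilot, Part 1", are quoted automatically by the `csv` module.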

Features:

Multithreaded scraping of the media list, which greatly reduces the time taken to scrape large lists.
Can generate a media list from folders and files in a specified directory or from user input.
Can specify the save location for scraped data.
Can specify search tags for items in the media list for more accurate results.
Can choose to scrape all episode information for a TV show.
Can detect data that has already been scraped, which makes scraping newly added media in a previously scraped list very efficient.
Can recover missing scraped files if one or more are missing, without rescraping all data.
Can retry incomplete scrapes before exiting the program (successfully scraped files are not altered or rescraped).
Currently only supports scraping data from IMDb.
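Several of the features above (building a media list from a directory, skipping already-scraped media, recovering missing files, multithreading and retrying) can be sketched together with the standard library's `concurrent.futures`. Everything here is an assumption: the function names are hypothetical, each title is assumed to produce `<title>.txt` and `<title>.jpg` directly in the save folder, and `scrape_one` is a placeholder for the actual IMDb scraping routine, which is not shown.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical set of files a complete scrape leaves in the save folder.
EXPECTED_EXTENSIONS = (".txt", ".jpg")


def media_list_from_directory(directory):
    """Use each folder name, or file name without extension, as a media title."""
    titles = set()
    for entry in os.listdir(directory):
        full = os.path.join(directory, entry)
        titles.add(entry if os.path.isdir(full) else os.path.splitext(entry)[0])
    return sorted(titles)


def missing_files(save_dir, title):
    """Return the expected output files for title that do not exist yet."""
    paths = [os.path.join(save_dir, title + ext) for ext in EXPECTED_EXTENSIONS]
    return [p for p in paths if not os.path.exists(p)]


def scrape_all(titles, save_dir, scrape_one, max_workers=8):
    """Scrape only incomplete titles in parallel; return the titles that failed.

    scrape_one(title, missing_paths) should create the missing files or raise.
    """
    pending = {}
    for title in titles:
        paths = missing_files(save_dir, title)
        if paths:  # fully scraped titles are skipped entirely
            pending[title] = paths
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_one, t, p): t for t, p in pending.items()}
        for future in as_completed(futures):
            if future.exception() is not None:
                failed.append(futures[future])
    return sorted(failed)
```

Passing the returned list back into `scrape_all` approximates the retry-before-exit behaviour: any files that were written are no longer reported by `missing_files`, so completed work is not redone. Threads suit this workload because scraping mostly waits on network I/O.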

Usage:

For more information on the program, read the WebScrape_help text file (this can also be accessed while running the program).

Currently a terminal-based program.

Running the program using python:

Requirements: Python 3.2+ (additional libraries: requests, beautifulsoup4)
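Before running from source, a quick preflight check can confirm the two extra libraries are installed. This is an illustrative helper, not part of the program; note that the `beautifulsoup4` package is imported under the module name `bs4`.

```python
import importlib

# pip package name -> module name used in import statements.
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose modules cannot be imported."""
    missing = []
    for package, module in sorted(required.items()):
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    absent = missing_packages()
    if absent:
        print("Missing packages; install with: pip install " + " ".join(absent))
    else:
        print("All requirements found.")
```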

Running the program from bundled executable file (created using pyinstaller):

Requirements: Windows 10
Creates a 'temp' folder containing extracted libraries and support files in the same location as the program while running.
- The temporary files are deleted automatically, but if the program is closed abruptly, they remain.
- The 'temp' folder can be manually deleted after closing the program.
- (As of pyinstaller v4.7, a one-file bundled executable leaves behind its temporary '_MEIxxxxxx' folders if the program is force-closed.)
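Leftover '_MEIxxxxxx' folders can also be found and removed with a small script. This sketch assumes you pass it the folder where the bundle extracts its files (the 'temp' folder beside the executable, as described above). It relies on PyInstaller setting `sys._MEIPASS` inside a running one-file bundle so that folder is skipped; the function names are hypothetical. Only run the cleanup while no other copy of the program is running, since their folders match too.

```python
import glob
import os
import shutil
import sys


def leftover_mei_dirs(base_dir):
    """List '_MEI*' folders in base_dir, excluding the one this process uses."""
    # sys._MEIPASS exists only when running inside a PyInstaller one-file bundle.
    current = getattr(sys, "_MEIPASS", None)
    found = []
    for path in glob.glob(os.path.join(base_dir, "_MEI*")):
        if not os.path.isdir(path):
            continue
        if current is not None and os.path.abspath(path) == os.path.abspath(current):
            continue
        found.append(path)
    return sorted(found)


def remove_leftovers(base_dir):
    """Delete leftover extraction folders and return the paths removed."""
    removed = []
    for path in leftover_mei_dirs(base_dir):
        shutil.rmtree(path, ignore_errors=True)
        removed.append(path)
    return removed
```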

Updates:

For information on version history, read the HISTORY markdown file.

You might also like...

Scrapes proxies and saves them to a text file

Proxy Scraper: Scrapes proxies from https://proxyscrape.com and saves them to a file. Also has a customizable theme system. Made by nell and Lamp.

2 Dec 22, 2021

Meme-videos - Scrapes memes and turns them into video compilations

Meme Videos Scrapes memes from reddit using praw and request and then converts t

12 Oct 28, 2022

This scraper scrapes the email IDs of faculty members from a given link/page and stores them in a CSV file

1 Feb 10, 2022

WebScraping - Scrapes a job website for Python developer jobs and exports the data to a CSV file

WebScraping: Web scraping Python program that scrapes Job website for python devel

2 Jul 22, 2022

⏬ Dumb downloader that scrapes the web

You-Get NOTICE: Read this if you are looking for the conventional "Issues" tab. You-Get is a tiny command-line utility to download media contents (vid

46.4k Jan 3, 2023

Anonymously scrapes onlinesim.ru for new usable phone numbers.

phone-scraper Anonymously scrapes onlinesim.ru for new usable phone numbers. Usage Clone the repository $ git clone https://github.com/thomasgruebl/ph

16 Oct 8, 2022

A Python package that scrapes Google News article data while remaining undetected by Google.

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

6 Aug 10, 2022

Scrapes Every Email Address of Every Society in Every University

society-email-scrape Site Live at https://kcsoc.github.io/society-email-scrape/ How to automatically generate new data Go to unis.yml Add your uni Cre

18 Dec 14, 2022

Automatically scrapes all menu items from the Taco Bell website

Automatically scrapes all menu items from the Taco Bell website. Returns them as a pandas DataFrame.

2 Jan 15, 2022

Releases (v1.3.0)

v1.3.0 (Dec 5, 2021)
WebScrape v1.3.0

See version history document for all changes.

Running the program using python:

Download the source code.

Requirements:

Python 3.2+ (additional libraries: requests, beautifulsoup4)

Running the program from bundled executable:

Download the WebScrape-1.3.0 zip file containing the bundled executable (created using pyinstaller).

Requirements:

Windows 10

Note:

The executable file creates a 'temp' folder containing extracted libraries and support files in the same location as the program while running.

The temporary files are deleted automatically, but if the program is closed abruptly, they remain.

The 'temp' folder can be manually deleted after closing the program.

(As of pyinstaller v4.7, a one-file bundled executable leaves behind its temporary '_MEIxxxxxx' folders if the program is force-closed.)

Source code (tar.gz)
Source code (zip)
WebScrape-1.3.0.zip (8.71 MB)

A Python-oriented tool to scrape WhatsApp group links using Google dorks. It scrapes WhatsApp group links from Google results and returns the working links.

Scraping Thailand COVID-19 data from the DDC's tableau dashboard

API to parse tibia.com content into python objects.

An application that, given a URL, crawls a web page, extracts all its words, and sorts and counts them.

TarkovScrappy - A nifty little bot that lets you know if a queried item might be required for a quest at some point in the land of Tarkov!

Web Scraping Instagram photos with Selenium by only using a hashtag.

A Python script to scrape company overviews and reviews from Glassdoor.

Screen scraping and web crawling framework

A crawler that finds new products with the maximum discount on the Banimode website

A Simple Web Scraper made to Extract Download Links from Todaytvseries2.com

A simple django-rest-framework api using web scraping

A dead simple crawler to get books information from Douban.

A Python API to scrape search results from a URL.

Collection of code files to scrape different kinds of websites.

Demonstration on how to use async python to control multiple playwright browsers for web-scraping

Bilibili scraper: centered on individual users

A Python library for automating interaction with websites.

A high-level distributed crawling framework.

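feapder is a simple, fast and lightweight crawler framework, designed for rapid development, fast crawling, ease of use and powerful features. It supports distributed crawlers, batch crawlers and multi-template crawlers, along with a complete crawler alerting mechanism.

An IPVanish proxy scraper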