This program scrapes information and images for movies and TV shows.

Overview

Media-WebScraper

This program scrapes information and images for movies and TV shows.

Summary

For more information on the program, read the WebScrape_help text file (this can also be accessed while running the program).

For a given list of media, the program will scrape and save general information, images and any episode information for each media.

General Information (default):

Saved as a .txt file

This will scrape general information:

  • Title
  • Release date
  • Runtime
  • Genre
  • Director
  • Cast
  • Plot description

Additional information saved:

  • Source database used for scrape
  • ID for media in source database
  • Poster image link

Images (default):

Saved as a .jpg file

This will scrape the poster.

Episode Information (if specified):

Saved as a .csv file

This will scrape information for each episode for a TV show:

  • Season number
  • Episode number
  • Episode title
  • Episode air date
  • Episode description

Features:

  • Multithreaded scraping for media in list to greatly improve the time taken when scraping for large media lists.
  • Can generate a media list from folders and files in a specified directory or from user input.
  • Can specify save location for scraped data.
  • Can specify search tags for media list for a more accurate scrape.
  • Can choose to scrape all episode information for a TV show.
  • Can detect if data is already scraped which allows for scraping new media from an already scraped list of media very efficient.
  • Can recover missing scraped files if one or more are missing without rescraping all data.
  • Can retry the scrape before exiting the program if there were any incomplete scrapes (successfully scraped files will not be altered or rescraped).
  • Currently only supports scraping data from IMDb.

Usage:

For more information on the program, read the WebScrape_help text file (this can also be accessed while running the program).

Currently a terminal-based program.

Running the program using python:

  • Requirements: Python 3.2+ (additional libraries: requests, beautifulsoup4)

Running the program from bundled executable file (created using pyinstaller):

  • Requirements: Windows 10
  • Creates a 'temp' folder containing extracted libraries and support files in the same location as the program while running.
    • The temporary files will delete automatically but if the program is closed abruptly, the files will remain.
    • The 'temp' folder can be manually deleted after closing the program.
    • (As of pyinstaller v4.7, a one-file bundled executable will leave any temp '_MEIxxxxxx' folders if the program is force closed)

Updates:

For information on version history, read the HISTORY markdown file.

You might also like...
Scrapes proxies and saves them to a text file

Proxy Scraper Scrapes proxies from https://proxyscrape.com and saves them to a file. Also has a customizable theme system Made by nell and Lamp

Meme-videos - Scrapes memes and turn them into a video compilations

Meme Videos Scrapes memes from reddit using praw and request and then converts t

This scrapper scrapes the mail ids of faculty members from a given linl/page and stores it in a csv file

This scrapper scrapes the mail ids of faculty members from a given linl/page and stores it in a csv file

WebScraping - Scrapes Job website for python developer jobs and exports the data to a csv file

WebScraping Web scraping Pyton program that scrapes Job website for python devel

:arrow_double_down: Dumb downloader that scrapes the web
:arrow_double_down: Dumb downloader that scrapes the web

You-Get NOTICE: Read this if you are looking for the conventional "Issues" tab. You-Get is a tiny command-line utility to download media contents (vid

Anonymously scrapes onlinesim.ru for new usable phone numbers.

phone-scraper Anonymously scrapes onlinesim.ru for new usable phone numbers. Usage Clone the repository $ git clone https://github.com/thomasgruebl/ph

A Python package that scrapes Google News article data while remaining undetected by Google.

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

Scrapes Every Email Address of Every Society in Every University
Scrapes Every Email Address of Every Society in Every University

society-email-scrape Site Live at https://kcsoc.github.io/society-email-scrape/ How to automatically generate new data Go to unis.yml Add your uni Cre

Automatically scrapes all menu items from the Taco Bell website

Automatically scrapes all menu items from the Taco Bell website. Returns as PANDAS dataframe.

Releases(v1.3.0)
  • v1.3.0(Dec 5, 2021)

    WebScrape v1.3.0

    See version history document for all changes.

    Running the program using python:

    Download the source code.

    Requirements:

    Python 3.2+ (additional libraries: requests, beautifulsoup4)

    Running the program from bundled executable:

    Download the WebScrape-1.3.0 zip file containing the bundled executable (created using pyinstaller).

    Requirements:

    Windows 10

    Note:

    The executable file creates a 'temp' folder containing extracted libraries and support files in the same location as the program while running.

    • The temporary files will delete automatically but if the program is closed abruptly, the files will remain.
    • The 'temp' folder can be manually deleted after closing the program.
    • (As of pyinstaller v4.7, a one-file bundled executable will leave any temp '_MEIxxxxxx' folders if the program is force closed)
    Source code(tar.gz)
    Source code(zip)
    WebScrape-1.3.0.zip(8.71 MB)
A simple Discord scraper for discord bots

A simple Discord scraper for discord bots. That includes sending an guild members ids to an file, Mass inviter for joining servers your bot is in and Fetching all the servers of the bot (w/MemberCoun

3zg 1 Jan 06, 2022
A scalable frontier for web crawlers

Frontera Overview Frontera is a web crawling framework consisting of crawl frontier, and distribution/scaling primitives, allowing to build a large sc

Scrapinghub 1.2k Jan 02, 2023
a way to scrape a database of all of the isef projects

ISEF Database This is a simple web scraper which gets all of the projects and abstract information from here. My goal for this is for someone to get i

William Kaiser 1 Mar 18, 2022
Scrap-mtg-top-8 - A top 8 mtg scraper using python

Scrap-mtg-top-8 - A top 8 mtg scraper using python

1 Jan 24, 2022
🐞 Douban Movie / Douban Book Scarpy

Python3-based Douban Movie/Douban Book Scarpy crawler for cover downloading + data crawling + review entry.

Xingbo Jia 1 Dec 03, 2022
原神爬虫 抓取原神界面圣遗物信息

原神圣遗物半自动爬虫 说明 直接抓取原神界面中的圣遗物数据 目前只适配了背包页面的抓取 准确率:97.5%(普通通用接口,对 40 件随机圣遗物识别,统计完全正确的数量为 39) 准确率:100%(4k 屏幕,普通通用接口,对 110 件圣遗物识别,统计完全正确的数量为 110) 不排除还有小错误的

hwa 28 Oct 10, 2022
用python爬取江苏几大高校的就业网站,并提供3种方式通知给用户,分别是通过微信发送、命令行直接输出、windows气泡通知。

crawler_for_university 用python爬取江苏几大高校的就业网站,并提供3种方式通知给用户,分别是通过微信发送、命令行直接输出、windows气泡通知。 环境依赖 wxpy,requests,bs4等库 功能描述 该项目基于python,通过爬虫爬各高校的就业信息网,爬取招聘信

8 Aug 16, 2021
Dailyiptvlist.com Scraper With Python

Dailyiptvlist.com scraper Info Made in python Linux only script Script requires to have wget installed Running script Clone repository with: git clone

1 Oct 16, 2021
Luis M. Capdevielle 1 Jan 14, 2022
Scraping followers of an instagram account

ScrapInsta A script to scraping data from Instagram Install First of all you can run: pip install scrapinsta After that you need to install these requ

Matheus Kolln 1 Sep 05, 2021
Web3 Pancakeswap Sniper bot written in python3

Pancakeswap_BSC_Sniper_Bot Web3 Pancakeswap Sniper bot written in python3, Please note the license conditions! The first Binance Smart Chain sniper bo

Treading-Tigers 295 Dec 31, 2022
基于Github Action的定时HITsz疫情上报脚本,开箱即用

HITsz Daily Report 基于 GitHub Actions 的「HITsz 疫情系统」访问入口 定时自动上报脚本,开箱即用。 感谢 @JellyBeanXiewh 提供原始脚本和 idea。 感谢 @bugstop 对脚本进行重构并新增 Easy Connect 校内代理访问。

Ter 56 Nov 27, 2022
Get-web-images - A python code that get images from any site

image retrieval This is a python code to retrieve an image from the internet, a

CODE 1 Dec 30, 2021
Rottentomatoes, Goodreads and IMDB sites crawler. Semantic Web final project.

Crawler Rottentomatoes, Goodreads and IMDB sites crawler. Crawler written by beautifulsoup, selenium and lxml to gather books and films information an

Faeze Ghorbanpour 1 Dec 30, 2021
Scraping web pages to get data

Scraping Data Get public data and save in database This is project use Python How to run a project 1 - Clone the repository 2 - Install beautifulsoup4

Soccer Project 2 Nov 01, 2021
抖音批量下载用户所有无水印视频

Douyincrawler 抖音批量下载用户所有无水印视频 Run 安装python3, 安装依赖

28 Dec 08, 2022
download NCERT books using scrapy

download_ncert_books download NCERT books using scrapy Downloading Books: You can either use the spider by cloning this repo and following the instruc

1 Dec 02, 2022
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Parsel Parsel is a BSD-licensed Python library to extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with re

Scrapy project 859 Dec 29, 2022
热搜榜-python爬虫+正则re+beautifulsoup+xpath

仓库简介 微博热搜榜, 参数wb 百度热搜榜, 参数bd 360热点榜, 参数360 csdn热榜接口, 下方查看 其他热搜待加入 如何使用? 注册vercel fork到你的仓库, 右上角 点击这里完成部署(一键部署) 请求参数 vercel配置好的地址+api?tit=+参数(仓库简介有参数信息

Harry 3 Jul 08, 2022
爬虫案例合集。包括但不限于《淘宝、京东、天猫、豆瓣、抖音、快手、微博、微信、阿里、头条、pdd、优酷、爱奇艺、携程、12306、58、搜狐、百度指数、维普万方、Zlibraty、Oalib、小说、招标网、采购网、小红书》

lxSpider 爬虫案例合集。包括但不限于《淘宝、京东、天猫、豆瓣、抖音、快手、微博、微信、阿里、头条、pdd、优酷、爱奇艺、携程、12306、58、搜狐、百度指数、维普万方、Zlibraty、Oalib、小说网站、招标采购网》 简介: 时光荏苒,记不清写了多少案例了。

lx 793 Jan 05, 2023