An automated, headless YouTube Watcher and Scraper

Overview

MIT License · Code style: black · Platform: YouTube · Uses Docker · Uses Tor 🧅 · Firefox supported · Chrome supported

Authors: Christian C., Moritz M., Luca S.
Related Projects: YouTube Uploader, Twitch Compilation Creator, Neural Networks


About

Searches YouTube, queries recommended videos, and watches them, all fully automated and anonymised through the Tor network. The project consists of two independently usable components: the YouTube automation written in Python and the dockerized Tor Browser.

This project is for educational purposes only. Using Tor to watch YouTube videos is strongly discouraged, especially for botting purposes. Please inform yourself about the Tor network before using it extensively.

Setup

YouTube Automation

This project requires Poetry to install its dependencies. Check out this link to install Poetry on your operating system.

Make sure you have Python 3.8 installed! Otherwise, step 3 will let you know that no compatible Python version is installed.

  1. Clone/Download this repository
  2. Navigate to the root of the repository
  3. Run poetry install to create a virtual environment with Poetry
  4. Either run the dockerized browser with docker-compose up, or install geckodriver for a local Firefox or ChromeDriver for Chromium. Ensure that geckodriver/ChromeDriver is in a location on your $PATH.
  5. Run poetry run python main.py to run the program. Alternatively, you can run poetry shell followed by python main.py. By default this connects to the dockerized browser. To automate a different browser, use the --browser [chrome/firefox] command line option (see the example below).
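
A typical invocation might then look like the following (the browser choice and search term are only placeholders):

    poetry run python main.py --browser firefox --search-terms "gaming"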

Dockerized Tor Browser

Running the container requires Docker and docker-compose.

  1. Clone/Download this repository

  2. Navigate to the root of the repository

  3. Run docker-compose up. The image will be built automatically before startup.

  4. Selenium can now connect to the browser via port 4444. In Python the connection can be established with the following command.

    # 'options' refers to the capabilities object built in main.py
    from selenium import webdriver

    driver = webdriver.Remote(
        command_executor="http://127.0.0.1:4444/wd/hub",
        desired_capabilities=options,
    )

    See main.py for more information.
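
    A minimal, self-contained variant of the same connection might look like the sketch below (assuming Selenium 3, where desired_capabilities accepts a plain capabilities dict; main.py builds its own capabilities, so treat this purely as an illustration):

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Connect to the dockerized browser exposed on port 4444.
    capabilities = DesiredCapabilities.FIREFOX.copy()
    driver = webdriver.Remote(
        command_executor="http://127.0.0.1:4444/wd/hub",
        desired_capabilities=capabilities,
    )

    # Load a page to verify the session works, then shut it down.
    driver.get("https://www.youtube.com")
    print(driver.title)
    driver.quit()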

Run Parameters

All of these parameters are optional and a default value will be used if they are not defined. You can also get these descriptions by running main.py --help.

usage: main.py [-h] [-B {docker,chrome,firefox}] [-t] [--disable-tor] -s SEARCH_TERMS [-c CHANNEL_URL]

optional arguments:
  -h, --help            show this help message and exit
  -B {docker,chrome,firefox}, --browser {docker,chrome,firefox}
                        Select the driver/browser to use for executing the script.
  -t, --enable-tor      Enables Tor usage by connecting to a proxy on localhost:9050. Only usable with the docker
                        executor.
  --disable-tor         Disables the Tor proxy.
  -s SEARCH_TERMS, --search-terms SEARCH_TERMS
                        This argument declares a list of search terms which get viewed.
  -c CHANNEL_URL, --channel-url CHANNEL_URL
                        Channel URL. If not declared, the Golden Gorillas channel URL is used as default.
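
As a rough illustration of what -t/--enable-tor describes (routing browser traffic through a Tor SOCKS proxy on localhost:9050), Firefox can be pointed at such a proxy through its preferences. The snippet below is a sketch of that general technique under those assumptions, not necessarily how main.py wires it up:

    from selenium import webdriver

    # Route traffic through the Tor SOCKS proxy assumed to listen on 127.0.0.1:9050.
    options = webdriver.FirefoxOptions()
    options.set_preference("network.proxy.type", 1)                 # 1 = manual proxy configuration
    options.set_preference("network.proxy.socks", "127.0.0.1")
    options.set_preference("network.proxy.socks_port", 9050)
    options.set_preference("network.proxy.socks_remote_dns", True)  # resolve DNS through the proxy as well

    driver = webdriver.Remote(
        command_executor="http://127.0.0.1:4444/wd/hub",
        desired_capabilities=options.to_capabilities(),
    )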