SkyScrapers: A collection of variety of Scraping Apps

Last update: Feb 17, 2022

Related tags

Web Crawling SkyScrapers

Overview

SkyScrapers

Collection of variety of Web Scraping Apps

The web-scrapers involved in this project are:

StockSymbolScraper
UnsplashImagesScraper

Tech

ScyScrapers uses a number of open source projects to work properly:

BeautifulSoup - Beautiful Soup is a Python library for pulling data out of HTML and XML files.
Selenium - Selenium is for automating web applications for testing purposes.

❗ ❗ ❗ Please check the markdown files named as THEORY inside respecetive project directories to get insight about the individual project.

And of course SkyScrapers itself is open source with a public repository on GitHub.

Installation

SkyScrapers requires Python's latest version to run.

Install the dependencies before start.

npm install <package_name>

Owner

Biplov Pokhrel

Just started. NITR 2023.

GitHub Repository

AssistScraper - program for /r/nba to use to find list of all players a player assisted and how many assists each player recieved

5 Nov 25, 2021

Extract embedded metadata from HTML markup

extruct extruct is a library for extracting embedded metadata from HTML markup. Currently, extruct supports: W3C's HTML Microdata embedded JSON-LD Mic

725 Jan 03, 2023

Discord webhook spammer with proxy support and proxy scraper

3 Feb 27, 2022

Visual scraping for Scrapy

Portia Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web pag

8.7k Jan 05, 2023

mlscraper: Scrape data from HTML pages automatically with Machine Learning

🤖 Scrape data from HTML websites automatically with Machine Learning

798 Dec 29, 2022

An introduction to free, automated web scraping with GitHub’s powerful new Actions framework.

An introduction to free, automated web scraping with GitHub’s powerful new Actions framework Published at palewi.re/docs/first-github-scraper/ Contrib

15 Nov 24, 2022

Pseudo API for Google Trends

pytrends Introduction Unofficial API for Google Trends Allows simple interface for automating downloading of reports from Google Trends. Only good unt

2.6k Dec 28, 2022

Python scraper to check for earlier appointments in Clalit Health Services

clalit-appt-checker Python scraper to check for earlier appointments in Clalit Health Services Some background If you ever needed to schedule a doctor

16 Sep 17, 2022

Shopee Scraper - A web scraper in python that extract sales, price, avaliable stock, location and more of a given seller in Brazil

Shopee Scraper A web scraper in python that extract sales, price, avaliable stock, location and more of a given seller in Brazil. The project was crea

5 Nov 29, 2022

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation This repository provides two web crawlers to label domain nam

1 Nov 05, 2021

Haphazard scripts for scraping bitcoin/bitcoin data from GitHub

This is a quick-and-dirty tool used to scrape bitcoin/bitcoin pull request and commentary data. Each output/pr number folder contains comments.json:

8 Oct 12, 2022

让中国用户使用git从github下载的速度提高1000倍!

序言 github上有很多好项目,但是国内用户连github却非常的慢.每次都要用插件或者其他工具来解决. 这次自己做一个小工具,输入github原地址后,就可以自动替换为代理地址,方便大家更快速的下载. 安装 pip install cit 主要功能与用法主要功能 change 将目标地址转换为

35 Aug 29, 2022

Dictionary - Application focused on word search through web scraping

Dictionary - Application focused on word search through web scraping, in addition to other functions such as dictation, spell and conjugation of syllables.

2 May 09, 2022

WebScrapping Project - G1 Latest News

Web Scrapping com Python Esse projeto consiste em um código para o usuário buscar as últimas nóticias sobre um termo qualquer, no site G1. Para esse p

2 Feb 13, 2022

Unja is a fast & light tool for fetching known URLs from Wayback Machine

Unja Fetch Known Urls What's Unja? Unja is a fast & light tool for fetching known URLs from Wayback Machine, Common Crawl, Virus Total & AlienVault's

10 Aug 07, 2022

A high-level distributed crawling framework.

Cola: high-level distributed crawling framework Overview Cola is a high-level distributed crawling framework, used to crawl pages and extract structur

1.5k Jan 04, 2023

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

1.1k Jan 06, 2023

Dailyiptvlist.com Scraper With Python

Dailyiptvlist.com scraper Info Made in python Linux only script Script requires to have wget installed Running script Clone repository with: git clone

1 Oct 16, 2021

京东茅台抢购最新优化版本，京东秒杀，添加误差时间调整，优化了茅台抢购进程队列

776 Jul 28, 2021

🥫 The simple, fast, and modern web scraping library

About gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with zero dependencies. I

692 Dec 22, 2022