A training task for web scraping using Python multithreading and a real-time-updated list of available proxy servers.

Last update: Feb 10, 2022

Overview

Parallel web scraping

The project is a training task for web scraping using Python multithreading and a real-time-updated list of available proxy servers.

Goal

The script extracts the names and prices of the Top-100 crypto coins and stores the data in a database.

Disclaimer

The task is quite contrived and serves mainly for study purposes. There are numerous mature sources of both real-time and historical cryptocurrency data.

Solved problems within the project

Multiple pages with one level of nesting are scraped: the script gathers internal links from the main page and then loops over them.
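The link-gathering step can be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `LinkCollector` class, the `internal_links` function, the `/coins/` path prefix, and the `example.com` URLs are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(html, base_url, path_prefix="/coins/"):
    """Return absolute, de-duplicated same-host links under path_prefix."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(url)
        if parts.netloc == host and parts.path.startswith(path_prefix) and url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Made-up main-page fragment: two internal links, one external, one duplicate.
SAMPLE = (
    '<a href="/coins/bitcoin">BTC</a>'
    '<a href="/coins/ethereum">ETH</a>'
    '<a href="https://other.example/x">external</a>'
    '<a href="/coins/bitcoin">duplicate</a>'
)
links = internal_links(SAMPLE, "https://example.com/")
for link in links:
    print(link)  # each of these would then be scraped in turn
```

Filtering by host and path keeps the crawl at one level of nesting: only pages linked from the main page are visited, and external links are ignored.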
To avoid getting banned by the remote server, requests are routed through proxy servers.
Free public proxy servers are generally unreliable in terms of availability. To overcome this issue:
- a separate scraping script extracts a list of free public proxy servers from a website.
- on each launch of the script, the list of 10 proxy servers is refreshed with currently available proxy servers.
- during script execution, some proxy servers become unavailable. Each scraping query therefore goes through this list until it finds a live proxy server to execute the query.
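The proxy handling described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the project's implementation: `pick_alive`, `fetch_with_proxy`, `fetch_via_first_alive`, the `fake_fetch` stand-in, and the proxy addresses are all hypothetical, and the real script may well use a different HTTP library.

```python
import urllib.request


def pick_alive(candidates, is_alive, limit=10):
    """Keep the first `limit` candidates for which is_alive(proxy) is true."""
    alive = []
    for proxy in candidates:
        if is_alive(proxy):
            alive.append(proxy)
            if len(alive) == limit:
                break
    return alive


def fetch_with_proxy(url, proxy, timeout=5):
    """Fetch url through one proxy given as "host:port" (standard library only)."""
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def fetch_via_first_alive(url, proxies, fetch=fetch_with_proxy):
    """Try each proxy in order; return (proxy, body) from the first that works."""
    errors = []
    for proxy in proxies:
        try:
            return proxy, fetch(url, proxy)
        except OSError as exc:  # URLError, timeouts and socket errors are OSError subclasses
            errors.append((proxy, exc))
    raise RuntimeError(f"no alive proxy among {len(proxies)} candidates: {errors}")


def fake_fetch(url, proxy):
    """Stand-in fetcher for the demo: only one made-up proxy is 'alive'."""
    if proxy != "10.0.0.3:8080":
        raise OSError("proxy down")
    return b"ok"


# Made-up proxy addresses; a real run would take them from the proxy scraper.
candidates = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
pool = pick_alive(candidates, is_alive=lambda proxy: True)
used, body = fetch_via_first_alive("http://example.com/", pool, fetch=fake_fetch)
print(used, body)
```

Passing the fetcher in as a parameter keeps the failover loop testable without network access; in a real run the default `fetch_with_proxy` would be used instead of `fake_fetch`.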
To speed up the scraping of the 101 web pages in total, multithreading is used. The work is divided among 4 threads running almost simultaneously.
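One way to divide the pages among 4 threads is a thread pool from `concurrent.futures`. The sketch below is only an assumption about how this could be done; the project may manage its threads differently. `scrape_page` is a placeholder that does no network I/O, and the URLs are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_page(url):
    """Placeholder for the real per-page scraper; returns a (url, length) pair."""
    return url, len(url)


def scrape_all(urls, worker=scrape_page, max_workers=4):
    """Run worker over urls on a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, urls))


# 101 made-up URLs standing in for the main page plus 100 coin pages.
urls = [f"https://example.com/coins/{i}" for i in range(101)]
results = scrape_all(urls)
print(len(results))
```

Threads suit this workload because scraping is I/O-bound: while one thread waits for a response, the others can proceed.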
The extracted data is written directly to a database.
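The source does not say which database is used. As a minimal sketch, assuming SQLite and a hypothetical `coins` table with `name` and `price` columns, the storage step could look like this (the coin names and prices are placeholders, not real data):

```python
import sqlite3


def save_coins(conn, rows):
    """Insert (name, price) rows, replacing the price of an already stored coin."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS coins (name TEXT PRIMARY KEY, price REAL NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO coins (name, price) VALUES (?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")  # in-memory database for the example
save_coins(conn, [("CoinA", 1.0), ("CoinB", 2.0)])
save_coins(conn, [("CoinA", 1.5)])  # a later scrape updates the price
rows = conn.execute("SELECT name, price FROM coins ORDER BY name").fetchall()
print(rows)
```

By default a `sqlite3` connection may only be used from the thread that created it, so in a multithreaded scraper one simple design is to collect the rows from the worker threads and write them from a single thread.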

Owner

Kushal Shingote

12306 ticket-grabbing script

Meme-videos - Scrapes memes and turns them into video compilations

A Web Scraping API for MDL (My Drama List) for Python.

A list of Python Bots used to extract data from several websites

Nekopoi scraper using python3

A powerful annex BUBT, BUBT Soft, and BUBT website scraping script.

The core packages of security analyzer web crawler

Google Scholar Web Scraping

Divar.ir Ads scraper

Shopee Scraper - A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil

A high-level distributed crawling framework.

Python script for crawling ResearchGate.net papers✨⭐️📎

A tool to easily scrape youtube data using the Google API

Displays market info for the LUNI token on the Terra Blockchain

Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX)

This code will be able to scrape movies from a movie website and also provide download links to newly uploaded movies.

A happy and lightweight Python package that searches the Google News RSS feed, returns a usable JSON response, and scrapes the complete article - no need to write scrapers for fetching articles anymore

A Simple Web Scraper made to Extract Download Links from Todaytvseries2.com

Scrapes proxies and saves them to a text file

A web crawler for recording posts in "sina weibo"