Screen scraping and web crawling framework

Last update: Jun 21, 2021

Overview

Pomp

Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the hard Twisted dependency.

Features:

Pure Python
Only one dependency on Python 2.x: concurrent.futures (a backport of the Python 3 standard-library package)
Supports single-file applications; Pomp doesn't force a specific project layout or other restrictions.
Pomp is a meta framework like Paste: you may use it to create your own scraping framework.
Extensible networking: you may use any sync or async method.
No parsing libraries in the core; use your preferred approach.
Pomp instances may be distributed and are designed to work with an external queue.
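
The pluggable networking and external-queue ideas in the list above can be sketched without relying on Pomp's actual API. The following is a minimal illustration, not Pomp's implementation: the names crawl, fetch, and parse are hypothetical, the in-memory deque only stands in for an external queue, and the thread pool is one possible way to run a synchronous fetch function concurrently.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def crawl(start_urls, fetch, parse, max_workers=4):
    """Crawl level by level with caller-supplied fetch and parse callables.

    fetch(url) -> body
    parse(url, body) -> (items, next_urls)
    All names here are hypothetical; this is not Pomp's API.
    """
    # The deque stands in for an external queue; duplicates are dropped up front.
    queue = deque(dict.fromkeys(start_urls))
    seen = set(queue)
    items = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while queue:
            batch = list(queue)
            queue.clear()
            # pool.map returns results in the same order as the input batch.
            for url, body in zip(batch, pool.map(fetch, batch)):
                found, next_urls = parse(url, body)
                items.extend(found)
                for nxt in next_urls:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return items
```

Because fetch and parse are plain callables, the networking layer and the parsing library can be swapped independently, which is the kind of separation the feature list describes.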

Pomp makes no attempt to accommodate:

redirects
proxies
caching
database integration
cookies
authentication
etc.

If you want proxies, redirects, or similar, you may use the excellent requests library as the Pomp downloader.
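
As a rough sketch of that idea, a thin wrapper around a requests session can supply proxies and redirect handling. The class name RequestsDownloader and its get method are hypothetical and do not reflect Pomp's real downloader interface; only the requests calls themselves (Session, Session.proxies, Session.get with allow_redirects and timeout, Response.raise_for_status) are actual library features.

```python
class RequestsDownloader:
    """Hypothetical wrapper; not Pomp's actual downloader interface."""

    def __init__(self, session=None, proxies=None, timeout=10):
        if session is None:
            # Third-party dependency, imported lazily so a stub session
            # can be injected without requests being installed.
            import requests
            session = requests.Session()
        if proxies:
            # e.g. {"http": "http://127.0.0.1:8080"} (example address only)
            session.proxies.update(proxies)
        self.session = session
        self.timeout = timeout

    def get(self, url):
        # requests follows redirects for GET by default; stated explicitly here.
        response = self.session.get(
            url, allow_redirects=True, timeout=self.timeout
        )
        response.raise_for_status()
        # response.url is the final URL after any redirects.
        return response.url, response.text
```

A session also keeps cookies between requests, so the same wrapper would cover the cookies item in the list above.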

Pomp examples

Pomp docs

Pomp is written and maintained by Evgeniy Tatarkin and is licensed under the BSD license.
