A high-performance, lightweight, and human-friendly serving engine for Scrapy

Last update: Mar 01, 2022

Overview

scrapy-x (X)

A distributed, scalable, and lightweight environment for deploying and running Scrapy spiders and projects on commodity hardware without hassle. It is also compatible with the scrapyd /schedule.json and /daemonstatus.json endpoints.

Installation

$ pip install -U git+https://github.com/speakol-ads/scrapy-x.git

Usage

Let's assume that you have a project called TestCrawler:

1. cd into TestCrawler
2. run scrapy x

That is all!

Default Settings

It reads its configuration from your project's default settings.py file. The available settings and their defaults are:

import os

# whether to enable debug mode or not
X_DEBUG = True

# the default queue name that the system will use
# actually it will be used as a prefix for its internal
# queues, currently there is only one queue called `X_QUEUE_NAME + '.BACKLOG'`
# which holds all jobs that should be crawled.
X_QUEUE_NAME = 'SCRAPY_X_QUEUE'

# the queue workers
# by default it uses the cpu cores count
# try to adjust it based on your resources & needs
X_QUEUE_WORKERS_COUNT = os.cpu_count()

# the webserver workers count:
# the number of workers uvicorn should spawn
# defaults to the available cpu count
# try to adjust it based on your resources & needs
X_SERVER_WORKERS_COUNT = os.cpu_count()

# the port the http server should listen on
X_SERVER_LISTEN_PORT = 6800

# the host used by the http server to listen on
X_SERVER_LISTEN_HOST = '0.0.0.0'

# whether to enable access log or not
X_ENABLE_ACCESS_LOG = True

# redis host
X_REDIS_HOST = 'localhost'

# redis port
X_REDIS_PORT = 6379

# redis db
X_REDIS_DB = 0

# redis password
X_REDIS_PASSWORD = ''

# the maximum time a running task is allowed to take;
# the task is killed once this time has elapsed.
X_TASK_TIMEOUT = 25
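
Because settings.py is ordinary Python, the defaults above can be overridden with plain assignments. The following is a minimal sketch and is not taken from the project: the halved worker counts are illustrative values only, and `backlog_queue_name` is a hypothetical helper that simply restates the `X_QUEUE_NAME + '.BACKLOG'` rule from the comments above.

```python
import os

# os.cpu_count() may return None when the count cannot be determined,
# so fall back to 1 before doing arithmetic on it.
CPU_COUNT = os.cpu_count() or 1

# Illustrative overrides (assumed values, tune them for your own hardware).
X_DEBUG = False
X_QUEUE_NAME = "SCRAPY_X_QUEUE"
X_QUEUE_WORKERS_COUNT = max(1, CPU_COUNT // 2)
X_SERVER_WORKERS_COUNT = max(1, CPU_COUNT // 2)


def backlog_queue_name(prefix: str) -> str:
    """Hypothetical helper: build the backlog queue name from the prefix."""
    return prefix + ".BACKLOG"


print(backlog_queue_name(X_QUEUE_NAME))  # prints SCRAPY_X_QUEUE.BACKLOG
```

Splitting the cores between queue workers and web server workers is only one possible choice; the documented default gives each of them the full CPU count.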

Available Endpoints

In addition to the scrapyd core endpoints (schedule.json, daemonstatus.json), the following endpoints are available:

GET /

Returns some information about the engine, such as the available spiders and the backlog queue length.
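
As a rough sketch, the info endpoint could be queried from Python with the standard library alone. The base URL assumes the default port on the local machine, and treating the response body as JSON is an assumption, since the response format is not documented here. `fetch_engine_info` and `fake_opener` are hypothetical names, and the stand-in payload is invented so that the example runs without a server.

```python
import io
import json
import urllib.request

BASE_URL = "http://localhost:6800"  # assumption: default port, local machine


def fetch_engine_info(base_url=BASE_URL, opener=urllib.request.urlopen):
    """GET / and decode the body, assuming (not confirmed) that it is JSON."""
    with opener(base_url + "/", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(url, timeout):
    """Offline stand-in for urlopen; the payload below is invented."""
    return io.BytesIO(b'{"example": true}')


print(fetch_engine_info(opener=fake_opener))
```

Against a running instance, calling `fetch_engine_info()` without the `opener` argument sends the real request.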

GET|POST /run/{spider_name}

Executes the spider specified in {spider_name} and waits for it to return its result. P.S: every query parameter and every field of the JSON POST data is passed to the spider as an argument, as with -a key=value.
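
The mapping from request data to spider arguments can be sketched with a small client. This is an illustration under assumptions, not the project's own client: `build_run_request` is a hypothetical helper, the spider name `quotes` and the argument `category` are made up, and the base URL assumes the default port on the local machine. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:6800"  # assumption: default port, local machine


def build_run_request(spider_name, query=None, payload=None):
    """Build a GET (or POST, when a JSON payload is given) for /run/{spider_name}."""
    url = BASE_URL + "/run/" + urllib.parse.quote(spider_name, safe="")
    if query:
        # Query parameters are documented as becoming spider arguments.
        url += "?" + urllib.parse.urlencode(query)
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Roughly comparable to running the spider with: -a category=books
request = build_run_request("quotes", query={"category": "books"})
print(request.get_method(), request.full_url)
```

To actually run the spider, pass the request to `urllib.request.urlopen(request)` and read the response. Since the endpoint waits for the spider to finish, a generous client timeout is advisable.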

GET|POST /enqueue/{spider_name}

Adds the spider specified in {spider_name} to the backlog to be executed later. P.S: every query parameter and every field of the JSON POST data is used as a spider argument.
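
Enqueuing several jobs at once could look like the sketch below, which sends the spider arguments as JSON POST data. It is an illustration only: `build_enqueue_request`, the spider name `quotes`, and the `category` argument are hypothetical, and the base URL assumes the default port on the local machine. The requests are built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6800"  # assumption: default port, local machine


def build_enqueue_request(spider_name, args):
    """Build a POST for /enqueue/{spider_name} carrying args as a JSON body."""
    body = json.dumps(args).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/enqueue/" + spider_name,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# One backlog job per category (made-up arguments).
jobs = [build_enqueue_request("quotes", {"category": c}) for c in ("books", "music")]
for job in jobs:
    print(job.get_method(), job.full_url, job.data.decode("utf-8"))
```

Each request can then be sent with `urllib.request.urlopen(job)`. Unlike /run, the spider is executed later by the queue workers rather than during the request.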

Technologies Used

Author

I'm Mohamed, a software engineer who enjoys writing code in his free time. I speak Python, PHP, Go, Rust, and JS.

My Similar Projects

P.S: star the project if you liked it ^_^

Owner

Speakol Ads

Footballmapies - Football mapies for learning webscraping and use of gmplot module in python

For those who don't want to pay $10/month for high school game footage with ads

Scrapy-based cyber security news finder

This scraper scrapes the email IDs of faculty members from a given link/page and stores them in a CSV file

PS5 bot to find a console in France for Christmas 🎄🎅🏻 NOT FOR SCALPERS

Scrapes all articles and their headlines from theonion.com

Newsscraper - A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

A simple app to scrape data from Twitter.

Web scraper built using Python.

An optimized version of the JD.com Moutai flash-purchase tool

🤖 Threaded Scraper to get discord servers from disboard.org written in python3

Use a Flask API to wrap Facebook data. Grab the wrapper of Facebook public pages without an API key.

A way to scrape sports streams for use with Jellyfin.

A module for CME that spiders hashes across the domain with a given hash.

This script is intended to crawl license information of repositories through the GitHub API.

Pelican plugin that adds site search capability

Binance Smart Chain Contract Scraper + Contract Evaluator

Iptvcrawl - A Scrapy project for crawling IPTV playlists

LSpider - A front-end crawler customized for passive scanners

Ebay Webscraper for Getting Average Product Price