a high-performance, lightweight and human friendly serving engine for scrapy

Related tags

Web Crawlingscrapy-x
Overview

scrapy-x (X)

a distributed, scalable and lightweight environment for deploying and running scrapy spiders/projects with no-hassle on commodity hardware, also it is compatible with scrapyd /schedule.json and /daemonstatus.json.

Installation

$ pip install -U git+git://github.com/speakol-ads/scrapy-x.git

Usage

let's assume that you have a project called TestCrawler

  • cd to TestCrawler
  • run scrapy x
  • that is all!

Default Settings

it utilizes your default project settings.py file

# whether to enable debug mode or not
X_DEBUG = True

# the default queue name that the system will use
# actually it will be used as a prefix for its internal
# queues, currently there is only one queue called `X_QUEUE_NAME + '.BACKLOG'`
# which holds all jobs that should be crawled.
X_QUEUE_NAME = 'SCRAPY_X_QUEUE'

# the queue workers
# by default it uses the cpu cores count
# try to adjust it based on your resources & needs
X_QUEUE_WORKERS_COUNT = os.cpu_count()

# the webserver workers count
# the workers count required from uvicorn to spwan
# defaults to the available cpu count
# try to adjust it based on your resources & needs
X_SERVER_WORKERS_COUNT = os.cpu_count()

# the port the http server should listen on
X_SERVER_LISTEN_PORT = 6800

# the host used by the http server to listen on
X_SERVER_LISTEN_HOST = '0.0.0.0'

# whether to enable access log or not
X_ENABLE_ACCESS_LOG = True

# redis host
X_REDIS_HOST = 'localhost'

# redis port
X_REDIS_PORT = 6379

# redis db
X_REDIS_DB = 0

# redis password
X_REDIS_PASSWORD = ''

# the maximum allowed wait time for a running task
# it will be killed after that time.
X_TASK_TIMEOUT = 25

Available Endpoints

as well scrapyd core endpoints like (schedule.json, daemonstatus.json), you have the following too:

GET /

returns some info about the engine like the available spiders and backlog queue length

GET|POST /run/{spider_name}

execute the specified spider in {spider_name} and wait for it to return its result, P.S: any query param and json post data will be passed to the spider as argument -a key=value

GET|POST /enqueue/{spider_name}

adding the specified spider in {spider_name} to the backlog to be executed later, P.S: any query param and json post data will be used as spider argument

Technologies Used

Author

I'm Mohamed, a software engineer who enjoys writing code in his free time, I'm speaking python, php, go, rust and js

My Similar Projects

P.S: star the project if you liked it ^_^

Owner
Speakol Ads
Speakol Ads
This is my CS 20 final assesment.

eeeeeSpider This is my CS 20 final assesment. How to use: Open program Run to your hearts content! There are no external dependancies that you will ha

1 Jan 17, 2022
Jobinja.ir jobs scraper.

Jobinja.ir Dataset Introduction This project is a simple web scraper that scraps pages of jobinja.ir concurrently and writes and update (if file gets

Iman Kermani 3 Apr 15, 2022
Twitter Claimer / Swapper / Turbo - Proxyless - Multithreading

Twitter Turbo / Auto Claimer / Swapper Version: 1.0 Last Update: 01/26/2022 Use this at your own descretion. I've only used this on test accounts and

Underscores 6 May 02, 2022
SearchifyX, predecessor to Searchify, is a fast Quizlet, Quizizz, and Brainly webscraper with various stealth features.

SearchifyX SearchifyX, predecessor to Searchify, is a fast Quizlet, Quizizz, and Brainly webscraper with various stealth features. SearchifyX lets you

28 Dec 20, 2022
A python module to parse the Open Graph Protocol

OpenGraph is a module of python for parsing the Open Graph Protocol, you can read more about the specification at http://ogp.me/ Installation $ pip in

Erik Rivera 213 Nov 12, 2022
A low-code tool that generates python crawler code based on curl or url

KKBA Intruoduction A low-code tool that generates python crawler code based on curl or url Requirement Python = 3.6 Install pip install kkba Usage Co

8 Sep 20, 2021
This Spider/Bot is developed using Python and based on Scrapy Framework to Fetch some items information from Amazon

- Hello, This Project Contains Amazon Web-bot. - I've developed this bot for fething some items information on Amazon. - Scrapy Framework in Python is

Khaled Tofailieh 4 Feb 13, 2022
A simple django-rest-framework api using web scraping

Apicell You can use this api to search in google, bing, pypi and subscene and get results Method : POST Parameter : query Example import request url =

Hesam N 1 Dec 19, 2021
Google Developer Profile Badge Scraper

Google Developer Profile Badge Scraper It is a Google Developer Profile Web Scraper which scrapes for specific badges in a user's Google Developer Pro

Hemant Sachdeva 2 Feb 22, 2022
A python script to extract answers to any question on Quora (Quora+ included)

quora-plus-bypass A python script to extract answers to any question on Quora (Quora+ included) Requirements Python 3.x

Nitin Narayanan 10 Aug 18, 2022
Displays market info for the LUNI token on the Terra Blockchain

LuniBot for Discord Displays market info for the LUNI/LUNA token on the Terra Blockchain (Webscrape method currently scraping CoinMarketCap). Will evo

0 Jan 22, 2022
Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

Scrapy Cluster This Scrapy project uses Redis and Kafka to create a distributed

Hanh Pham Van 0 Jan 06, 2022
京东抢茅台,秒杀成功很多次讨论,天猫抢购,赚钱交流等。

Jd_Seckill 特别声明: 请添加个人微信:19972009719 进群交流讨论 目前群里很多人抢到【扫描微信添加群就好,满200关闭群,有喜欢薅信用卡羊毛的也可以找我交流】 本仓库发布的jd_seckill项目中涉及的任何脚本,仅用于测试和学习研究,禁止用于商业用途,不能保证其合法性,准确性

50 Jan 05, 2023
Pyrics is a tool to scrape lyrics, get rhymes, generate relevant lyrics with rhymes.

Pyrics Pyrics is a tool to scrape lyrics, get rhymes, generate relevant lyrics with rhymes. ./test/run.py provides the full function in terminal cmd

MisterDK 1 Feb 12, 2022
Raspi-scraper is a configurable python webscraper that checks raspberry pi stocks from verified sellers

Raspi-scraper is a configurable python webscraper that checks raspberry pi stocks from verified sellers.

Louie Cai 13 Oct 15, 2022
The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.

The open-source web scrapers that feed the Los Angeles Times' California coronavirus tracker. Processed data ready for analysis is available at datade

Los Angeles Times Data and Graphics Department 51 Dec 14, 2022
A tool can scrape product in aliexpress: Title, Price, and URL Product.

Scrape-Product-Aliexpress A tool can scrape product in aliexpress: Title, Price, and URL Product. Usage: 1. Install Python 3.8 3.9 padahal halaman ins

Rahul Joshua Damanik 1 Dec 30, 2021
Simply scrape / download all the media from an fansly account.

Simply scrape / download all the media from an fansly account. Providing updates as long as its continuously gaining popularity, so hit the ⭐ button!

Mika C. 334 Jan 01, 2023
Scrapy-based cyber security news finder

Cyber-Security-News-Scraper Scrapy-based cyber security news finder Goal To keep up to date on the constant barrage of information within the field of

2 Nov 01, 2021
Scraping followers of an instagram account

ScrapInsta A script to scraping data from Instagram Install First of all you can run: pip install scrapinsta After that you need to install these requ

Matheus Kolln 1 Sep 05, 2021