Python script to check whether an application responds differently when the request comes from a search engine's crawler.

Last update: Dec 27, 2022

Overview

crawlersuseragents

This Python script checks whether an application responds differently when the request comes from a search engine's crawler. It requests the same URL once per crawler User-Agent string and compares the responses.

Features

30 crawler user agent strings.
Multithreading.
JSON export with -j outputfile.json (long form: --jsonfile).
Automatic detection of responses that stand out.
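
The first two features can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the three user agent strings are a small illustrative subset (the script's own list of 30 is not reproduced here), and `fetch_length`, `probe`, and the injectable `fetch` parameter are hypothetical names introduced for this example. It uses only the standard library; the script itself may rely on a different HTTP client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
import urllib.request

# Illustrative subset only (assumption): not the script's own list of 30 strings.
CRAWLER_USER_AGENTS: Dict[str, str] = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "DuckDuckBot": "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)",
}


def fetch_length(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request `url` with the given User-Agent and return the body length.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses; a real tool
    would probably catch that and record the status code instead.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return len(response.read())


def probe(
    url: str,
    user_agents: Dict[str, str],
    fetch: Callable[[str, str], int] = fetch_length,
    threads: int = 5,
) -> List[dict]:
    """Fetch `url` once per user agent, using a pool of worker threads."""

    def worker(item):
        name, user_agent = item
        return {"crawler": name, "user_agent": user_agent, "length": fetch(url, user_agent)}

    # pool.map returns results in the same order as the input items.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, user_agents.items()))
```

Calling `probe("https://example.com", CRAWLER_USER_AGENTS)` would return one dictionary per crawler. Passing a custom `fetch` function makes it possible to try the threading logic without sending any network traffic.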

Usage

$ ./crawlersuseragents.py -h
[~] Access web pages as web crawlers User-Agents, v1.1

usage: crawlersuseragents.py [-h] [-v] [-t THREADS] [-x PROXY] [-k] [-L] [-j JSONFILE] url

This Python script can be used to check if there is any differences in responses of an application
when the request comes from a search engine's crawler.

positional arguments:
  url                   e.g. https://example.com:port/path

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         arg1 help message
  -t THREADS, --threads THREADS
                        Number of threads (default: 5)
  -x PROXY, --proxy PROXY
                        Specify a proxy to use for requests (e.g., http://localhost:8080)
  -k, --insecure        Allow insecure server connections when using SSL (default: False)
  -L, --location        Follow redirects (default: False)
  -j JSONFILE, --jsonfile JSONFILE
                        Save results to specified JSON file.
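
For readers who want to script against the same options, the interface above could be reproduced with `argparse` roughly as shown here. This is a sketch derived only from the help output; `build_parser` is a hypothetical name, and the script's real parser may differ in details such as help texts and types.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the help output above."""
    parser = argparse.ArgumentParser(
        prog="crawlersuseragents.py",
        description=(
            "Check whether an application responds differently when the "
            "request comes from a search engine's crawler."
        ),
    )
    parser.add_argument("url", help="e.g. https://example.com:port/path")
    parser.add_argument("-v", "--verbose", action="store_true", default=False,
                        help="Verbose mode (assumed meaning; the original help text is a placeholder)")
    parser.add_argument("-t", "--threads", type=int, default=5,
                        help="Number of threads (default: 5)")
    parser.add_argument("-x", "--proxy", default=None,
                        help="Specify a proxy to use for requests (e.g., http://localhost:8080)")
    parser.add_argument("-k", "--insecure", action="store_true", default=False,
                        help="Allow insecure server connections when using SSL (default: False)")
    parser.add_argument("-L", "--location", action="store_true", default=False,
                        help="Follow redirects (default: False)")
    parser.add_argument("-j", "--jsonfile", default=None,
                        help="Save results to specified JSON file.")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because `argparse` accepts unambiguous prefixes of long options by default, a parser built this way would also accept `--json` as shorthand for `--jsonfile`.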

Auto-detecting responses that stand out

Results are sorted by how unique their response length is. Results whose response length occurs only once appear at the top, and results whose response length is shared by several other responses appear at the bottom.

Figure: example outputs with two different result lengths and with four different result lengths.
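
The ordering described above can be sketched as follows. This is one plausible way to do it, not necessarily how the script does: it assumes each result is a dictionary with a `length` key, and `sort_by_length_uniqueness` is a hypothetical name used only for this example.

```python
from collections import Counter
from typing import Dict, List


def sort_by_length_uniqueness(results: List[Dict]) -> List[Dict]:
    """Order results so that the rarest response lengths come first."""
    # Count how many responses share each length.
    counts = Counter(result["length"] for result in results)
    # Ascending by count: lengths seen once sort first, common lengths last.
    # sorted() is stable, so results with equal counts keep their input order.
    return sorted(results, key=lambda result: counts[result["length"]])


if __name__ == "__main__":
    sample = [
        {"crawler": "A", "length": 5120},
        {"crawler": "B", "length": 5120},
        {"crawler": "C", "length": 312},
        {"crawler": "D", "length": 5120},
    ]
    for row in sort_by_length_uniqueness(sample):
        print(row["crawler"], row["length"])
```

In the sample, crawler C is the only one that received a 312-byte response, so it is listed first, followed by A, B, and D in their original order.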

Contributing

Pull requests are welcome. Feel free to open an issue if you want to add other features.


Releases (1.1)

1.1 (Nov 15, 2021)

Owner

Podalirius
