A simple website crawler that asks the user for a website address, crawls it, and extracts specific data from it.

Last update: Jan 10, 2022

Related tags

Web Crawling, Website-Crawler-Python

Overview

Website-Crawler-Python

This is a simple website crawler that asks the user for a website address, then crawls that address to find specific data. After receiving the address, it asks the user for a crawling depth, which is chosen from within the number of links found at that address.

Website Crawler takes 3 inputs:

A website address
An integer value for the crawling depth
A user-specified regular expression for finding user-specific data
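
The three inputs listed above could be collected and checked along the following lines. This is a minimal sketch, not the project's own code: the function names `validate_inputs` and `prompt_for_inputs`, the prompt wording, and the rule that the depth must not be negative are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def validate_inputs(address, depth_text, pattern_text):
    """Hypothetical helper: check the three inputs and return them in usable form."""
    parsed = urlparse(address)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid http(s) address: {address!r}")

    depth = int(depth_text)  # raises ValueError for non-integer text
    if depth < 0:  # assumed rule: a negative depth makes no sense
        raise ValueError("crawling depth must not be negative")

    try:
        pattern = re.compile(pattern_text)
    except re.error as exc:
        raise ValueError(f"invalid regular expression: {exc}") from exc

    return address, depth, pattern


def prompt_for_inputs():
    """Ask for the three values on the terminal (prompt wording is assumed)."""
    return validate_inputs(
        input("Website address: ").strip(),
        input("Crawling depth: ").strip(),
        input("Regular expression: "),
    )
```

Compiling the regular expression up front means a malformed pattern is reported before any page is downloaded.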

General tasks:

Finds all Norwegian mobile numbers and saves them to a text file.
Finds all sub-links inside the given website and saves them to a text file.
Saves the website's raw HTML code to a text file.
Finds all email addresses and saves them to a text file.
Finds all comments used in the website and saves them to a text file.
Finds the five most used words and prints them to the terminal.
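
Several of the tasks above reduce to pattern matching and counting on the page text. The sketch below illustrates one possible approach using only the standard library; it is not taken from the project. The patterns are assumptions: mobile numbers are taken to be 8 digits starting with 4 or 9 with an optional +47 or 0047 prefix, and the email pattern is deliberately simple. The helper names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed format: 8 digits starting with 4 or 9, optional +47 / 0047 prefix.
MOBILE_RE = re.compile(r"(?<!\d)(?:\+47|0047)?[ ]?([49]\d{7})(?!\d)")
# Simplified email pattern (assumption); real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
WORD_RE = re.compile(r"[a-z']+")


def find_mobile_numbers(text):
    """Return the 8-digit numbers matched by the assumed mobile pattern."""
    return MOBILE_RE.findall(text)


def find_emails(text):
    """Return every substring that looks like an email address."""
    return EMAIL_RE.findall(text)


def top_words(text, n=5):
    """Return the n most frequent words as (word, count) pairs."""
    return Counter(WORD_RE.findall(text.lower())).most_common(n)


def save_lines(path, items):
    """Write one item per line to a text file."""
    Path(path).write_text("\n".join(items) + "\n", encoding="utf-8")
```

For example, `save_lines("emails.txt", find_emails(page_text))` would write the matches to a file, where `page_text` and the file name are placeholders.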

This is a Python-based project that depends on the following libraries:

RegEx (Python's re module)
urllib3
BeautifulSoup 4
Counter from collections
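
To show the kind of parsing these libraries are used for, the sketch below collects sub-links and HTML comments from a page. It deliberately swaps in the standard library's `html.parser` as a stand-in for BeautifulSoup 4 so that it runs without installing anything; the project itself lists BeautifulSoup 4 for this job. The class name `LinkAndCommentCollector` and the function `collect` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndCommentCollector(HTMLParser):
    """Hypothetical helper: gathers link targets and HTML comments."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page address.
                    self.links.append(urljoin(self.base_url, value))

    def handle_comment(self, data):
        self.comments.append(data.strip())


def collect(html, base_url):
    """Return (links, comments) found in the given HTML string."""
    parser = LinkAndCommentCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.comments
```

The HTML string passed to `collect` would come from whatever fetch step the crawler uses; downloading is left out here so the sketch stays offline.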


Owner

Faisal Ahmed

Web and PDF Scraper Refactoring

Poolbooru gelscraper - a simple python script for scraping images off gelbooru pools.

A Web Scraping Program.

Web scraper for Zillow

Web Scraping Instagram photos with Selenium by only using a hashtag.

A tool for scraping and organizing data from NewsBank API searches

Script used to download data for stocks.

An experiment to deploy a serverless infrastructure for a scrapy project.

Telegram Group Scrapper

Grab the changelog from releases on Github

A modern CSS selector implementation for BeautifulSoup

Creating Scrapy scrapers via the Django admin interface

Twitter Claimer / Swapper / Turbo - Proxyless - Multithreading

Searching info from Google using Python Scrapy

This is a sport analytics project that combines the knowledge of OOP and Webscraping

An introduction to free, automated web scraping with GitHub’s powerful new Actions framework.

Batch download of all watermark-free videos from a Douyin user

Demonstration on how to use async python to control multiple playwright browsers for web-scraping

A Pixiv web crawler module

Crawls the daily announcements of major SRCs | A small tool that sends notifications via WeChat | Bounty tool