Crawler in Python 3.7, 3.8, 3.9, and PyPy3

Last update: Mar 12, 2022

Overview

Description

A Python crawler written in Python 3. (Supports the major Python releases Python 3.6, Python 3.7 and Python 3.8.)

Installation and Use

Setup VirtualEnv

which python3  # this will output the path of your python3
# now set up a python3 virtualenv
mkvirtualenv crawl3 -p $(which python3)

workon crawl3
python main.py -d5 http://gotchacode.com  # -d5 means crawl to a depth of 5
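
The `-d5` form above attaches the depth value directly to the short option. As a minimal sketch of how such a command line could be parsed, the example below uses the standard library's `argparse`. The parser, the `--depth` long name, and the default of 1 are assumptions made for illustration; the project's own `main.py` may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser for a 'main.py -d5 <url>' style command line."""
    parser = argparse.ArgumentParser(
        description="Crawl a site to a fixed depth (illustrative sketch)."
    )
    # argparse accepts an attached value, so "-d5" and "-d 5" are equivalent.
    parser.add_argument(
        "-d", "--depth", type=int, default=1,
        help="maximum link depth to follow (default is an assumption)",
    )
    parser.add_argument("url", help="start URL for the crawl")
    return parser


# Parse the same arguments used in the command above.
args = build_parser().parse_args(["-d5", "http://gotchacode.com"])
print(args.depth, args.url)
```

Declaring the depth as `type=int` means a non-numeric value is rejected by the parser before any crawling starts.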

Results:

Example output (from a depth-2 crawl of https://vinitkumar.me):

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 29200.11it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 22563.50it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 21375.28it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 22227.37it/s]
CRAWLER STARTED:
https://vinitkumar.me, will crawl upto depth 2
https://vinitkumar.me/
http://changer.nl
https://twitter.com/vinitkme
https://vinitkumar.me/about
https://vinitkumar.github.io/vinit_kumar.pdf
https://vinitkumar.me/values
https://github.com/vinitkumar
https://vinitkumar.me/2013-03-24-life-has-changed/
https://vinitkumar.me/2013-03-24-my-javascript-love/
https://vinitkumar.me/2013-03-27-twitter-like-app-in-nodejs/
http://twitter.com/vinitkme
https://vinitkumar.me/2013-04-07-first-flight-and-vacation-after-months/
====================================================================================================
Crawler Statistics
====================================================================================================
No of links Found: 12
No of followed:     3
Found all links after 0.54s
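
The statistics above report how many links were found and how many were followed. The sketch below shows one way a depth-limited, breadth-first crawl could produce such counts. It is not the project's implementation: the `PAGES` dictionary is a hypothetical in-memory site used so the example runs without network access, and the meaning of "followed" (pages whose links were actually extracted) is an assumption.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "https://example.test/": '<a href="/about">About</a> <a href="/values">Values</a>',
    "https://example.test/about": '<a href="/">Home</a> <a href="https://other.test/">Other</a>',
    "https://example.test/values": "",
}


class LinkParser(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(start, max_depth):
    """Breadth-first crawl; returns (set of found URLs, number of pages followed)."""
    found = {start}
    followed = 0
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        # Stop at the depth limit and skip pages we cannot "fetch".
        if depth >= max_depth or url not in PAGES:
            continue
        followed += 1
        parser = LinkParser(url)
        parser.feed(PAGES[url])
        for link in parser.links:
            if link not in found:
                found.add(link)
                queue.append((link, depth + 1))
    return found, followed


found, followed = crawl("https://example.test/", 2)
print(f"No of links Found: {len(found)}")
print(f"No of followed:    {followed}")
```

Tracking visited URLs in a set keeps pages that link back to each other from being queued twice.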

Issues

Create an issue here if you encounter a bug: create-issue


Comments

The following changes are made in this PR:
The code is modified to use async and await, so that coroutines run concurrently. Since this is a crawler, using async makes sense.

The following steps were taken:

All print statements are now replaced with loggers.

Some methods are further refactored to improve readability.

The version is bumped.

The code is refactored so that, in case of an error, it fails early and fails fast.
Opened by vinitkumar
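
As a rough illustration of the async approach this PR describes, the sketch below fetches every page at the current depth concurrently with `asyncio.gather`. It is not the code from the PR: the `LINKS` graph and the `fetch_links` coroutine are hypothetical stand-ins for real HTTP requests, chosen so the example runs offline.

```python
import asyncio

# Hypothetical link graph standing in for real HTTP responses.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "a"],
    "d": [],
}


async def fetch_links(url):
    """Pretend to download a page and return the links it contains."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return LINKS.get(url, [])


async def crawl(start, max_depth):
    """Crawl level by level, fetching each level's pages concurrently."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        if not frontier:
            break
        # Schedule all fetches for this depth at once and wait for all of them.
        results = await asyncio.gather(*(fetch_links(u) for u in frontier))
        next_frontier = []
        for links in results:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen


result = asyncio.run(crawl("a", 2))
print(sorted(result))
```

By default `asyncio.gather` propagates the first exception raised by any coroutine to the caller, which suits a fail-early design; passing `return_exceptions=True` would collect errors instead.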

Releases (v1.0.0)

v1.0.0 (Apr 11, 2015)

This release ports pycrawler to Python 3. Enjoy!


Owner

Vinit Kumar
