A web scraper for nomadlist.com, made to avoid website restrictions.


Gypsylist

gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions.

nomadlist.com is a website with a wealth of information for digital nomads, helping location-independent remote workers find the best places to live and work. Unfortunately, most of this content is restricted unless you are a member of the site.

This script doesn't cover all of the information available on the website; it is just an entry point for evaluating the site without signing up.

Installation

Before using gypsylist, install its requirements:

pip3 install -r requirements.txt

Additionally, because selenium is a dependency, you also have to set up the browser driver. To install it, see https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/.

Now you should be ready to run the script.
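If the script fails to start the browser, a quick check like the one below may help confirm that a driver executable is reachable. This is only a sketch: the driver names `geckodriver` and `chromedriver` are assumptions (they are the usual executables for Firefox and Chrome), and which one gypsylist actually needs depends on the browser it launches.

```python
import shutil

# Executable names of two common Selenium drivers (assumed, not taken
# from gypsylist itself): geckodriver for Firefox, chromedriver for Chrome.
DRIVER_NAMES = ("geckodriver", "chromedriver")


def find_drivers(names=DRIVER_NAMES):
    """Map each driver name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_drivers().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

If the driver you installed is reported as not found, add its directory to your PATH as described in the selenium documentation linked above.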

Usage

To use gypsylist, first browse the nomadlist.com website and apply the filters you need for your research. Then copy the URL path (everything after the domain name, including the query string) from the address bar of your browser.
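If you copied the whole URL rather than just the path, a small helper can trim it. The sketch below assumes that `--path` expects everything after the domain name, without the leading slash, as in the example command in this section; `url_to_path` is a hypothetical helper and not part of gypsylist.

```python
from urllib.parse import urlsplit


def url_to_path(url):
    """Return the path and query string of a URL, without the leading slash."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # Re-attach the query string only when the URL has one.
    return f"{path}?{parts.query}" if parts.query else path


if __name__ == "__main__":
    copied = (
        "https://nomadlist.com/safe-places-for-remote-workers-to-live"
        "?sort=cost_for_nomad_in_usd&order=asc"
    )
    print(url_to_path(copied))
```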

Use this path to scrape with gypsylist:

./gypsylist.py --path "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc" --emoji

The expected output looks like this:

#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

...

#440
🏙️  city: Zurich
🌎 country: Switzerland
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#441
🏙️  city: Leiden
🌎 country: Netherlands
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#442
🏙️  city: Honolulu, Hawaii
🌎 country: United States
⭐️ overall: 4/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

#443
🏙️  city: Lake Tahoe, CA
🌎 country: United States
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

(Always remember --emoji). Have fun!
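If you want to process the results further, the text output can be turned into Python dictionaries. The sketch below is not part of gypsylist; it assumes the output has exactly the layout shown above (a `#N` line followed by `key: value` lines, with or without an emoji prefix) and that you have captured it as a string, for example by redirecting the script's output to a file.

```python
import re

RANK_RE = re.compile(r"^#(\d+)$")          # e.g. "#442"
FIELD_RE = re.compile(r"(\w+):\s*(.+)$")   # e.g. "city: Lisbon", emoji prefix ignored


def parse_results(text):
    """Parse gypsylist-style output into a list of dicts, one per city."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        rank = RANK_RE.match(line)
        if rank:
            # A "#N" line starts a new record.
            current = {"rank": int(rank.group(1))}
            records.append(current)
            continue
        field = FIELD_RE.search(line)
        if field and current is not None:
            current[field.group(1)] = field.group(2)
    return records


if __name__ == "__main__":
    sample = "#1\n🏙️  city: Lisbon\n🌎 country: Portugal\n💵 cost: 4/5\n"
    for record in parse_results(sample):
        print(record["rank"], record["city"], record["cost"])
```

Scores are kept as strings such as `4/5`; convert them to numbers yourself if you need to sort or filter on them.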

Known Issues

This is not what you would call well-written code (sorry, Gods of programming). As a result, there are several code smells and bugs that have not been reviewed, due to the short time I dedicated to writing the script.

  • When the --headless / -H parameter is used to run the browser in headless mode, only the first page of results is retrieved from the website.
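
On a Linux machine without a display, one possible workaround (untested with gypsylist, so treat it as an assumption) is to skip --headless and run the browser under a virtual X display with `xvfb-run`, which must be installed separately. The sketch below only builds the command line; `build_command` is a hypothetical helper, and whether this actually retrieves all pages depends on why headless mode stops after the first one.

```python
import shlex


def build_command(path, emoji=True, virtual_display=False):
    """Build the gypsylist command line as a list of arguments."""
    cmd = ["./gypsylist.py", "--path", path]
    if emoji:
        cmd.append("--emoji")
    if virtual_display:
        # xvfb-run -a picks a free display number automatically.
        cmd = ["xvfb-run", "-a"] + cmd
    return cmd


if __name__ == "__main__":
    path = "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc"
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(build_command(path, virtual_display=True)))
```

The resulting list can also be passed directly to `subprocess.run` if you prefer to launch the script from Python.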
Owner
Alessio Greggi
Computer Scientist graduated at the University of Rome, Tor Vergata. Currently working as Linux Engineer. CTF Player during free time.