A web scraper for nomadlist.com, made to avoid website restrictions.

Gypsylist

gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions.

nomadlist.com is a website with a wealth of information for digital nomads who want to find the best places to live and work remotely. Unfortunately, most of this content is restricted unless you are a member of the site.

This script doesn't cover all of the information retrievable from the website; it is just an entry point for evaluating the site without signing up.

Installation

Before using gypsylist, install its requirements:

pip3 install -r requirements.txt

Because selenium is a dependency, you also have to set up the browser driver. For installation instructions, see https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/.

Now you should be ready to run the script.
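
Before the first run, it can help to confirm that a browser driver is actually visible on your PATH. The sketch below is not part of gypsylist; it assumes the driver is one of the common binaries (chromedriver or geckodriver), which may not match the browser the script really uses.

```python
import shutil

# Assumption: these are the usual driver binary names for Chrome and Firefox.
# gypsylist itself may rely on a different browser or driver.
DRIVER_NAMES = ("chromedriver", "geckodriver")


def find_drivers(names=DRIVER_NAMES):
    """Map each driver name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_drivers().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

If every entry reports "not found on PATH", revisit the Selenium driver instructions linked above before running the scraper.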

Usage

To use gypsylist, first browse nomadlist.com and apply the filters you need for your search. Then copy the URL path (the part of the address after nomadlist.com/) from your browser's address bar.

Pass that path to gypsylist with --path:

./gypsylist.py --path "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc" --emoji
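
To make the --path value less opaque, here is a minimal sketch of how such a path could be joined to the site's base URL and split into its filter parameters. The BASE_URL constant and both helper names are assumptions for illustration; gypsylist.py may build its URL differently.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# Assumption: the scraper targets the site root over HTTPS.
BASE_URL = "https://nomadlist.com/"


def build_url(path):
    """Join a --path value (with or without a leading slash) to the base URL."""
    return urljoin(BASE_URL, path.lstrip("/"))


def describe_filters(path):
    """Return the page name and the query parameters encoded in the path."""
    parts = urlsplit(build_url(path))
    return parts.path.lstrip("/"), parse_qs(parts.query)


if __name__ == "__main__":
    example = "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc"
    print(build_url(example))
    print(describe_filters(example))
```

In the example path, sort=cost_for_nomad_in_usd and order=asc are the filters chosen in the browser, so the results come back ordered by ascending cost.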

Running the gypsylist.py command should produce output like this:

#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

...

#440
🏙️  city: Zurich
🌎 country: Switzerland
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#441
🏙️  city: Leiden
🌎 country: Netherlands
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#442
🏙️  city: Honolulu, Hawaii
🌎 country: United States
⭐️ overall: 4/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

#443
🏙️  city: Lake Tahoe, CA
🌎 country: United States
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

(Always remember --emoji). Have fun!
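
If you want to post-process the results, for example to filter cities by score, the text output can be turned back into records. The sketch below is a hypothetical helper, not part of gypsylist; it assumes the "#rank" and "label: value" layout shown above and works with or without the emoji prefix.

```python
import re

SAMPLE = """\
#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5

...

#442
city: Honolulu, Hawaii
country: United States
overall: 4/5
"""


def parse_results(text):
    """Parse gypsylist-style output into a list of dicts, one per ranked city."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        rank = re.fullmatch(r"#(\d+)", line)
        if rank:
            # A "#N" line starts a new record.
            current = {"rank": int(rank.group(1))}
            records.append(current)
        elif current is not None and ": " in line:
            label, value = line.split(": ", 1)
            # The key is the last word of the label, which drops any emoji prefix.
            current[label.split()[-1]] = value
    return records


if __name__ == "__main__":
    for record in parse_results(SAMPLE):
        print(record)
```

Splitting on the first ": " only keeps values such as "Honolulu, Hawaii" intact, and lines like "..." are skipped because they contain no label.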

Known Issues

This is not what you would call well-written code (sorry, Gods of programming). There are several code smells and bugs that have not been reviewed, because I spent little time writing the script.

  • When you use the --headless / -H parameter to run the browser in headless mode, only the first page of results is retrieved from the website.
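
For readers unfamiliar with headless mode, the following sketch shows how a headless switch is commonly mapped onto Selenium browser options. It assumes Chrome and uses hypothetical helper names; it is not taken from gypsylist.py. The explicit window size is only something to experiment with, and whether it affects the first-page limitation is untested.

```python
def chrome_arguments(headless):
    """Build the Chrome command-line arguments for a scraping session."""
    # Assumption: a fixed, desktop-sized viewport; headless browsers often
    # start with a smaller default window.
    args = ["--window-size=1920,1080"]
    if headless:
        args.append("--headless=new")
    return args


def make_driver(headless=False):
    """Create a Chrome WebDriver; requires selenium and a working Chrome driver."""
    from selenium import webdriver  # imported lazily so the helper above needs no dependencies

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    print(chrome_arguments(headless=True))
```

Until the issue is understood, running without --headless remains the reliable way to get all pages.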
Owner
Alessio Greggi
Computer Scientist graduated at the University of Rome, Tor Vergata. Currently working as Linux Engineer. CTF Player during free time.