Extract countries, regions and cities from a URL or text

Overview

This project is no longer being maintained and has been archived. Please check the Forks list for newer versions.

Forks

We are aware of two third-party forks of this library:

Geograpy

Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.

Install & Setup

Grab the package using pip (this will take a few minutes)

pip install geograpy

Geograpy uses NLTK for entity recognition, so you'll also need to download the models we're using. Fortunately there's a command that'll take care of this for you.

geograpy-nltk

Basic Usage

Import the module, give some text or a URL, and presto.

import geograpy
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

Now you have access to information about all the places mentioned in the linked article.

  • places.countries contains a list of country names
  • places.regions contains a list of region names
  • places.cities contains a list of city names
  • places.other lists everything that wasn't clearly a country, region or city

Note that the other list can be useful for shorter texts, to pull out information such as street names and points of interest, but at the moment it is a bit messy when scanning longer texts that contain adjectival forms of place names (like "Russian" instead of "Russia").

But Wait, There's More

In addition to listing the names of discovered places, you'll also get some information about the relationships between places.

  • places.country_regions contains regions broken down by country
  • places.country_cities contains cities broken down by country
  • places.address_strings contains "city, region, country" strings useful for geocoding

Last But Not Least

While a text might mention many places, it's probably focused on one or two, so Geograpy also breaks down countries, regions and cities by number of mentions.

  • places.country_mentions
  • places.region_mentions
  • places.city_mentions

Each of these returns a list of tuples. The first item in the tuple is the place name and the second item is the number of mentions. For example:

[('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]
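
As a minimal sketch of working with this format, the snippet below ranks a mentions list and picks out the most-mentioned places. The data is hard-coded from the example above rather than fetched with Geograpy, and top_places is a hypothetical helper for illustration, not part of the library.

```python
# Hard-coded sample in the same (name, mentions) shape shown above;
# in real use this would come from places.country_mentions.
country_mentions = [('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]


def top_places(mentions, n=1):
    """Return the names of the n most-mentioned places (hypothetical helper)."""
    # Sort explicitly by count so the result does not depend on input order.
    ranked = sorted(mentions, key=lambda pair: pair[1], reverse=True)
    return [name for name, _count in ranked[:n]]


print(top_places(country_mentions))     # ['Russian Federation']
print(top_places(country_mentions, 2))  # ['Russian Federation', 'Ukraine']
```

The same helper would apply to region_mentions and city_mentions, since all three share the tuple layout described above.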

If You're Really Serious

You can of course use each of Geograpy's modules on their own. For example:

from geograpy import extraction

e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928')
e.find_entities()

# You can now access all of the places found by the Extractor
print(e.places)

Place context is handled in the places module. For example:

from geograpy import places

pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])

pc.set_countries()
print(pc.countries)  # ['United States']

pc.set_regions()
print(pc.regions)  # ['Ohio']

pc.set_cities()
print(pc.cities)  # ['Cleveland']

print(pc.address_strings)  # ['Cleveland, Ohio, United States']

And of course all of the other information shown above (country_regions, etc.) is available after the corresponding set_ method is called.

Credits

Geograpy uses the following excellent libraries:

Geograpy uses the following data sources:

Hat tip to Chris Albon for the name.

Owner
Ushahidi
Building open sourced software to change the flow of information in the world.