Geocode rows in a SQLite database table

Overview

geocode-sqlite

PyPI Changelog Tests License

Geocode rows from a SQLite table

Installation

Install this tool using pip or pipx:

# install inside a virtualenv
pip install geocode-sqlite

# install globally
pipx install geocode-sqlite

Usage

Let's say you have a spreadsheet with addresses in it, and you'd like to map those locations. First, create a SQLite database and insert rows from that spreadsheet using sqlite-utils.

sqlite-utils insert data.db data data.csv --csv

Now, geocode it using OpenStreetMap's Nominatum geocoder.

geocode-sqlite nominatum data.db data \
 --location="{address}, {city}, {state} {zip}" \
 --delay=1 \
 --user-agent="this-is-me"

In the command above, you're using Nominatum, which is free and only asks for a unique user agent (--user-agent).

This will connect to a database (data.db) and read all rows from the table data (skipping any that already have both a latitude and longitude column filled).

You're also telling the geocoder how to extract a location query (--location) from a row of data, using Python's built-in string formatting, and setting a rate limit (--delay) of one request per second.

For each row where geocoding succeeds, latitude and longitude will be populated. If you hit an error, or a rate limit, run the same query and pick up where you left off.

Under the hood, this package uses the excellent geopy library, which is stable and thoroughly road-tested. If you need help understanding a particular geocoder's options, consult geopy's documentation.

Supported Geocoders

The CLI currently supports these geocoders:

  • bing
  • googlev3
  • mapquest (and open-mapquest)
  • nominatum

More will be added soon.

Common arguments and options

Each geocoder needs to know where to find the data it's working with. These are the first two arguments:

  • database: a path to a SQLite file, which must already exist
  • table: the name of a table, in that database, which exists and has data to geocode

From there, we have a set of options passed to every geocoder:

  • location: a string format that will be expanded with each row to build a full query, to be geocoded
  • delay: a delay between each call (some services require this)
  • latitude: latitude column name
  • longitude: longitude column name

Each geocoder takes additional, specific arguments beyond these, such as API keys. Again, geopy's documentation is an excellent resource.

Python API

The command line interface aims to support the most common options for each geocoder. For more find-grained control, use the Python API.

As with the CLI, this assumes you already have a SQLite database and a table of location data.

from geocode_sqlite import geocode_table
from geopy.geocoders import Nominatum

# create a geocoder instance, with some extra options
nominatum = Nominatum(user_agent="this-is-me", domain="nominatum.local.dev", scheme="http")

# assuming our database is in the same directory
count = geocode_table("data.db", "data", query_template="{address}, {city}, {state} {zip}")

# when it's done
print(f"Geocoded {count} rows")

Any geopy geocoder can be used with the Python API.

Development

To contribute to this tool, first checkout the code. Then create a new virtual environment:

cd geocode-sqlite
python -m venv .venv
source .venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies and tests:

pip install -e '.[test]'

To run the tests:

pytest

Please remember that this library is mainly glue code between other well-tested projects, specifically: click, geopy and sqlite-utils. Tests should focus on making sure those parts fit together correctly. We can assume the parts themselves already work.

To that end, there is a test geocoder included: geocode_sqlite.testing.DummyGeocoder. That geocoder works with an included dataset of In-N-Out Burger locations provided by AllThePlaces. It works like a normal GeoPy geocoder, except it will only return results for In-N-Out locations using the included database.

Comments
  • fixing

    fixing "ImportError" in the 6 lines and 7

    hi Chris Amico. I Using GNU/Linux os And in GNU/Linux, I can not run the cli.py file. But with a small change I made, this problem was solved. If you like, please merging.

    opened by EmadDeve20 4
  • Add OpenCage option to CLI

    Add OpenCage option to CLI

    I'm a contractor for OpenCage geocoder. We'd like to add an opencage option to the CLI. OpenCage is already available in geopy so I have just copied the existing geocoder code with the relevant options.

    • Added OpenCage as an option to CLI
    • Added OpenCage to README
    • Added OpenCage test to Makefile
    opened by sbscully 2
  • Better argument ergonomics

    Better argument ergonomics

    Looking at this with fresh eyes, this whole bit is dumb:

    Note the order of options: There are two sets of options we need to pass.

    The first concerns the data we're geocoding. We need to say where our database is and what table we're using, and optionally, how to extract a location query.

    Then, we need to say what geocoder we're using, and pass in any options needed to initalize it. This will be different for each geocoder we want to use.

    Ideally, it should be something like this:

    geocode-sqlite nominatum  data.db data \
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me"
    

    Maybe this would be ok:

    geocode-sqlite
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me" \
    nominatum data.db data
    

    But the first way is definitely better.

    opened by eyeseast 2
  • Spatialite helper?

    Spatialite helper?

    Would it be worth helping convert lat/lng fields to a spatial index? https://docs.datasette.io/en/stable/spatialite.html#spatial-indexing-latitude-longitude-columns

    Here's the whole code block:

    import sqlite3
    conn = sqlite3.connect('museums.db')
    # Lead the spatialite extension:
    conn.enable_load_extension(True)
    conn.load_extension('/usr/local/lib/mod_spatialite.dylib')
    # Initialize spatial metadata for this database:
    conn.execute('select InitSpatialMetadata(1)')
    # Add a geometry column called point_geom to our museums table:
    conn.execute("SELECT AddGeometryColumn('museums', 'point_geom', 4326, 'POINT', 2);")
    # Now update that geometry column with the lat/lon points
    conn.execute('''
        UPDATE events SET
        point_geom = GeomFromText('POINT('||"longitude"||' '||"latitude"||')',4326);
    ''')
    # Now add a spatial index to that column
    conn.execute('select CreateSpatialIndex("museums", "point_geom");')
    # If you don't commit your changes will not be persisted:
    conn.commit()
    conn.close()
    

    I don't know if it belongs in this package or not.

    question 
    opened by eyeseast 2
  • Use rate limiting by default?

    Use rate limiting by default?

    It's generally bad practice to throw a few hundred geocoding requests at a free service. Given that, is it worth setting a default delay of one second, and let people explicitly turn it off if needed?

    opened by eyeseast 2
  • Geocoder CLI supprt

    Geocoder CLI supprt

    Here are the geocoders currently supported by the CLI. I want to add the most common ones (which I use) first:

    • [x] Google
    • [x] MapBox
    • [x] OpenStreetMap
    • [x] MapQuest
    • [x] Bing
    • [ ] Geocode.Earth
    • [ ] Geocodio
    • [ ] Pelias
    • [ ] MapTiler

    What else?

    help wanted good first issue 
    opened by eyeseast 2
  • fix failing tests that exepect an envvar

    fix failing tests that exepect an envvar

    if MAPBOX_API_KEY is not set in the environment, test_pass_kwargs fails.

    noticed this when trying to run tests locally. the D prefix defers to values already set in the environment.

    you may not want to introduce a new dependency to your project (namely pytest-env), and I would understand that, but just thought I'd bring this to your attention with a suggested fix. take it or leave it! this is a very useful library. thanks!

    opened by noslouch 1
  • GeoJSON and SpatiaLite support

    GeoJSON and SpatiaLite support

    Closes #22 Closes #24 Closes #26

    Passing a --geojson flag will store results as a GeoJSON geometry, instead of in latitude and longitude columns.

    Using --spatialite will add a geometry column and store results as a SpatiaLite binary.

    This should make it easier to work with datasette-geojson and datasette-geojson-map.

    opened by eyeseast 1
  • Save results as geojson?

    Save results as geojson?

    Right now, results are saved in two columns: latitude and longitude.

    Now that datasette-geojson and datasette-geojson-map exist, it would be nice to have this plugin feed into those.

    Maybe it's an option: sqlite-geocode data.db table ... --geojson

    That would be backwards compatible at least.

    question 
    opened by eyeseast 1
  • Capture full geocoding results?

    Capture full geocoding results?

    opened by eyeseast 1
  • Need a progress bar

    Need a progress bar

    Especially when rate limiting, which is now the default, geocoding a table is slow. With a one-second delay, a 300 row table takes five minutes to finish, at best.

    opened by eyeseast 1
  • Consolidate code to geocode a list

    Consolidate code to geocode a list

    I do this in three places:

    • In geocode_table, which I wrote first
    • In the CLI
    • In geocode_list, which I needed for the progress bar

    This whole block is basically repeated: https://github.com/eyeseast/geocode-sqlite/blob/main/geocode_sqlite/utils.py#L50-L68. Part of the issue is that in the Python API, I'm using log.info and in the CLI I'm using click.echo. Maybe I can abstract that and pass in a print function.

    opened by eyeseast 0
  • Async support

    Async support

    Geopy has a set of tools to help with async here: https://geopy.readthedocs.io/en/latest/#async-mode

    Should definitely use those, especially when thinking about Datasette integration.

    opened by eyeseast 2
Releases(v0.8.2)
  • v0.8.2(Nov 7, 2022)

    What's Changed

    • Require requests. More forgiving timeout. by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/43

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.1...v0.8.2

    Source code(tar.gz)
    Source code(zip)
  • v0.8.1(Nov 6, 2022)

    What's Changed

    • Raw results in geocode_table, too by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/42

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.0...v0.8.1

    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Nov 6, 2022)

    What's Changed

    • Just a typo! by @shaver in https://github.com/eyeseast/geocode-sqlite/pull/40
    • Capture raw geocoding results by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/41

    New Contributors

    • @shaver made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/40

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.7.0...v0.8.0

    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Oct 31, 2022)

    The OpenCage geocoding service is now supported. Thanks @sbscully. This library is now tested on Python 3.11.

    What's Changed

    • Add OpenCage option to CLI by @sbscully in https://github.com/eyeseast/geocode-sqlite/pull/39

    New Contributors

    • @sbscully made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/39

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.1...v0.7.0

    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Mar 16, 2022)

    What's Changed

    • if no bbox options are passed, value is None by @noslouch in https://github.com/eyeseast/geocode-sqlite/pull/30
    • Pass --api-key to mapbox test by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/33

    New Contributors

    • @noslouch made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/30

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.0...v0.6.1

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Mar 11, 2022)

    What's Changed

    • GeoJSON and SpatiaLite support by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/23

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.2...v0.6.0

    Source code(tar.gz)
    Source code(zip)
  • v0.5.2(Mar 1, 2022)

    What's Changed

    • Require click >= 7.0 to hide commands by @1-Byte in https://github.com/eyeseast/geocode-sqlite/pull/20
    • Fix error with rowid tables by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/27

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.1...v0.5.2

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Apr 20, 2021)

  • v0.3.0(Apr 4, 2021)

  • v0.2.0(Apr 3, 2021)

    Arguments and options now follow a much more sensible order, following the example of other thing-to-sqlite tools. For example:

    geocode-sqlite nominatum geo.db innout_test \
    	--location "{full}, {city}, {state} {postcode}" \
    	--delay 1 \
    	--user-agent "geocode-sqlite"
    

    The order is now:

    1. geocode-sqlite
    2. geocoder, such as nominatum
    3. database path
    4. table name
    5. any additional options
    Source code(tar.gz)
    Source code(zip)
  • 0.1.2(Sep 8, 2020)

Owner
Chris Amico
Journalist / Developer in Boston
Chris Amico
Asynchronous Client for the worlds fastest in-memory geo-database Tile38

This is an asynchonous Python client for Tile38 that allows for fast and easy interaction with the worlds fastest in-memory geodatabase Tile38.

Ben 53 Dec 29, 2022
Hapi is a Python library for building Conceptual Distributed Model using HBV96 lumped model & Muskingum routing method

Current build status All platforms: Current release info Name Downloads Version Platforms Hapi - Hydrological library for Python Hapi is an open-sourc

Mostafa Farrag 15 Dec 26, 2022
This repository contains the scripts to derivate the ENU and ECEF coordinates from the longitude, latitude, and altitude values encoded in the NAD83 coordinates.

This repository contains the scripts to derivate the ENU and ECEF coordinates from the longitude, latitude, and altitude values encoded in the NAD83 coordinates.

Luigi Cruz 1 Feb 07, 2022
Calculate & view the trajectory and live position of any earth-orbiting satellite

satellite-visualization A cross-platform application to calculate & view the trajectory and live position of any earth-orbiting satellite in 3D. This

Space Technology and Astronomy Cell - Open Source Society 3 Jan 08, 2022
A Django application that provides country choices for use with forms, flag icons static files, and a country field for models.

Django Countries A Django application that provides country choices for use with forms, flag icons static files, and a country field for models. Insta

Chris Beaven 1.2k Jan 03, 2023
Minimum Bounding Box of Geospatial data

BBOX Problem definition: The spatial data users often are required to obtain the coordinates of the minimum bounding box of vector and raster data in

Ali Khosravi Kazazi 1 Sep 08, 2022
Python bindings and utilities for GeoJSON

geojson This Python library contains: Functions for encoding and decoding GeoJSON formatted data Classes for all GeoJSON Objects An implementation of

Jazzband 765 Jan 06, 2023
Mmdb-server - An open source fast API server to lookup IP addresses for their geographic location

mmdb-server mmdb-server is an open source fast API server to lookup IP addresses

Alexandre Dulaunoy 67 Nov 25, 2022
Simple, concise geographical visualization in Python

Geographic visualizations for HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it? GeoViews is a Python library that ma

HoloViz 445 Jan 02, 2023
Software for Advanced Spatial Econometrics

GeoDaSpace Software for Advanced Spatial Econometrics GeoDaSpace current version 1.0 (32-bit) Development environment: Mac OSX 10.5.x (32-bit) wxPytho

GeoDa Center 38 Jan 03, 2023
Python bindings to libpostal for fast international address parsing/normalization

pypostal These are the official Python bindings to https://github.com/openvenues/libpostal, a fast statistical parser/normalizer for street addresses

openvenues 651 Dec 16, 2022
Geographic add-ons for Django REST Framework. Maintained by the OpenWISP Project.

django-rest-framework-gis Geographic add-ons for Django Rest Framework - Mailing List. Install last stable version from pypi pip install djangorestfra

OpenWISP 981 Jan 03, 2023
A ninja python package that unifies the Google Earth Engine ecosystem.

A Python package that unifies the Google Earth Engine ecosystem. EarthEngine.jl | rgee | rgee+ | eemont GitHub: https://github.com/r-earthengine/ee_ex

47 Dec 27, 2022
Raster processing benchmarks for Python and R packages

Raster processing benchmarks This repository contains a collection of raster processing benchmarks for Python and R packages. The tests cover the most

Krzysztof Dyba 13 Oct 24, 2022
Introduction to Geospatial Analysis in Python

Introduction to Geospatial Analysis in Python This repository is in support of a talk on geospatial data. Data To recreate all of the examples, the da

Dillon Gardner 6 Oct 19, 2022
A service to auto provision devices in Aruba Central based on the Geo-IP location

Location Based Provisioning Service for Aruba Central A service to auto provision devices in Aruba Central based on the Geo-IP location Geo-IP auto pr

Will Smith 3 Mar 22, 2022
🌐 Local tile server for viewing geospatial raster files with ipyleaflet or folium

🌐 Local Tile Server for Geospatial Rasters Need to visualize a rather large (gigabytes) raster you have locally? This is for you. A Flask application

Bane Sullivan 192 Jan 04, 2023
Geographic add-ons for Django REST Framework. Maintained by the OpenWISP Project.

Geographic add-ons for Django REST Framework. Maintained by the OpenWISP Project.

OpenWISP 982 Jan 06, 2023
Evaluation of file formats in the context of geo-referenced 3D geometries.

Geo-referenced Geometry File Formats Classic geometry file formats as .obj, .off, .ply, .stl or .dae do not support the utilization of coordinate syst

Advanced Information Systems and Technology 11 Mar 02, 2022
geemap - A Python package for interactive mapping with Google Earth Engine, ipyleaflet, and ipywidgets.

A Python package for interactive mapping with Google Earth Engine, ipyleaflet, and folium

Qiusheng Wu 2.4k Dec 30, 2022