How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

Related tags

Geolocationcog_how_to
Overview

How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

According to Cogeo.org:

A Cloud Opdtimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing ​HTTP GET range requests to ask for just the parts of a file they need.

Think about the following case: You want to analyze the NDVI of your local 1km² park by using Sentinel 2 geoTIFF imaginery. Sentinel 2 satellite images cover very big regions. In the past, you had to download the whole file (100mb +) for band 4 (red) and the whole file for band 8 (near infrared) even that in fact, you need only a small portion of the data. That's why COG's (cloud optimized geoTIFFs) have been invented. With them, we ask the server to only send specific bytes of the image.

Cloud optimized geoTIFFs offer:

  • efficient imaginery data access
  • reduced duplication of data
  • legacy compatibility

COG's can be read just like normal geoTIFFs. In our example, we will use an AOI (area of interest), that is described in a geoJSON. We will also use sat-search to query the latest available Sentinel-2 satellite imaginery for our specific location. Then we will use Rasterio to perform a range request to download only the parts of the files we need. We will also use Pyproj to perform neccessary coordinate transformations. The cloud optimized Sentinel 2 imaginery is hosted in a AWS S3 repository.

Install libraries (matplotlib optional)

pip install rasterio pyproj sat-search matplotlib

Import libraries

from satsearch import Search
from datetime import datetime, timedelta
from pyproj import Transformer
from json import load

import rasterio
from rasterio.features import bounds

First, we need to open our geoJSON file and extract the geometry. To create a geoJSON, you can go to geojson.io. Do not make a very large geoJSON (a good size is 1x1km²), otherwise you might get an error later.

file_path = "path/to/your/file.geojson"
with open(file_path,"r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

We will query for images not older than 60 days that contain less than 20% clouds.

# search last 60 days
current_date = datetime.now()
date_60_days_ago = current_date - timedelta(days=60)
current_date = current_date.strftime("%Y-%m-%d")
date_60_days_ago = date_60_days_ago.strftime("%Y-%m-%d")

# only request images with cloudcover less than 20%
query = {
    "eo:cloud_cover": {
        "lt": 20
        }
    }
search = Search(
    url='https://earth-search.aws.element84.com/v0',
    intersects=geometry,
    datetime=date_60_days_ago + "/" + current_date,
    collections=['sentinel-s2-l2a-cogs'],
    query=query
    )        
# grep latest red && nir
items = search.items()
latest_data = items.dates()[-1]
red = items[0].asset('red')["href"]
nir = items[0].asset('nir')["href"]
print(f"Latest data found that intersects geometry: {latest_data}")
print(f"Url red band: {red}")
print(f"Url nir band: {nir}")

Now we got the URLs of the most recent Sentinel 2 imaginery for our region. In the next step, we need to calculate which pixels to query from our geoTIFF server. The satellite image comes with 10980 x 10980 pixels. Every pixel represents 10 meter ground resolution. In order to calculate which pixels fall into our area of interest, we need to reproject our geoJSON coordinates into pixel row/col. With the recent Rasterio versions, we can read COGs by passing a rasterio.windows.Window (that specifies which row/col to query) to the read function. Before we can query, we need to open a virtual file(urls of a hosted file):

for geotiff_file in [red, nir]:
    with rasterio.open(geotiff_file) as geo_fp:

Then, we calculate the bounding box around our geometry and use the pyproj.Transformer to transform our geoJSON coordinates (EPSG 4326) into Sentinel Sat's EPSG 32633 projection.

        bbox = bounds(geometry)
        coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs) 
        # calculate pixels to be streamed in cog 
        coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
        coord_lower_right = coord_transformer.transform(bbox[1], bbox[2]) 

Now that we have the right coordinates, we can calculate from coordinates to pixels in our geoTIFF file using rasterio.

        pixel_upper_left = geo_fp.index(
            coord_upper_left[0], 
            coord_upper_left[1]
            )
        pixel_lower_right = geo_fp.index(
            coord_lower_right[0], 
            coord_lower_right[1]
            )
        
        for pixel in pixel_upper_left + pixel_lower_right:
            # If the pixel value is below 0, that means that
            # the bounds are not inside of our available dataset.
            if pixel < 0:
                print("Provided geometry extends available datafile.")
                print("Provide a smaller area of interest to get a result.")
                exit()

Now we are ready for the desired range request.

        # make http range request only for bytes in window
        window = rasterio.windows.Window.from_slices(
            (
            pixel_upper_left[0], 
            pixel_lower_right[0]
            ), 
            (
            pixel_upper_left[1], 
            pixel_lower_right[1]
            )
        )
        subset = geo_fp.read(1, window=window)

The subset object contains the desired data. We can access and vizualize it with:

        import matplotlib.pyplot as plt
        plt.imshow(subset, cmap="seismic")
        plt.colorbar()

red nir

I hope, I was able to show you how COG's work and that you are ready now to access your cloud optimized geoTIFF images in seconds compared to minutes in the past. Have a great day!

All together:

from satsearch import Search
from datetime import datetime, timedelta
from pyproj import Transformer
from json import load

import rasterio
from rasterio.features import bounds

file_path = "path/to/your/file.geojson"
with open(file_path,"r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

# search last 60 days
current_date = datetime.now()
date_60_days_ago = current_date - timedelta(days=60)
current_date = current_date.strftime("%Y-%m-%d")
date_60_days_ago = date_60_days_ago.strftime("%Y-%m-%d")

# only request images with cloudcover less than 20%
query = {
    "eo:cloud_cover": {
        "lt": 20
        }
    }
search = Search(
    url='https://earth-search.aws.element84.com/v0',
    intersects=geometry,
    datetime=date_60_days_ago + "/" + current_date,
    collections=['sentinel-s2-l2a-cogs'],
    query=query
    )        
# grep latest red && nir
items = search.items()
latest_data = items.dates()[-1]
red = items[0].asset('red')["href"]
nir = items[0].asset('nir')["href"]
print(f"Latest data found that intersects geometry: {latest_data}")
print(f"Url red band: {red}")
print(f"Url nir band: {nir}")

for geotiff_file in [red, nir]:
    with rasterio.open(geotiff_file) as geo_fp:
        bbox = bounds(geometry)
        coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs) 
        # calculate pixels to be streamed in cog 
        coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
        coord_lower_right = coord_transformer.transform(bbox[1], bbox[2]) 
        pixel_upper_left = geo_fp.index(
            coord_upper_left[0], 
            coord_upper_left[1]
            )
        pixel_lower_right = geo_fp.index(
            coord_lower_right[0], 
            coord_lower_right[1]
            )
        
        for pixel in pixel_upper_left + pixel_lower_right:
            # If the pixel value is below 0, that means that
            # the bounds are not inside of our available dataset.
            if pixel < 0:
                print("Provided geometry extends available datafile.")
                print("Provide a smaller area of interest to get a result.")
                exit()
        
        # make http range request only for bytes in window
        window = rasterio.windows.Window.from_slices(
            (
            pixel_upper_left[0], 
            pixel_lower_right[0]
            ), 
            (
            pixel_upper_left[1], 
            pixel_lower_right[1]
            )
        )
        subset = geo_fp.read(1, window=window)

        # vizualize
        import matplotlib.pyplot as plt
        plt.imshow(subset, cmap="seismic")
        plt.colorbar()
        plt.show()
Owner
Marvin Gabler
specialized in climate, data & risk | interested in nature, rockets and outer space | The earth's data for our world's future
Marvin Gabler
ArcGIS Python Toolbox for WhiteboxTools

WhiteboxTools-ArcGIS ArcGIS Python Toolbox for WhiteboxTools. This repository is related to the ArcGIS Python Toolbox for WhiteboxTools, which is an A

Qiusheng Wu 190 Dec 30, 2022
leafmap - A Python package for geospatial analysis and interactive mapping in a Jupyter environment.

A Python package for geospatial analysis and interactive mapping with minimal coding in a Jupyter environment

Qiusheng Wu 1.4k Jan 02, 2023
FDTD simulator that generates s-parameters from OFF geometry files using a GPU

Emport Overview This repo provides a FDTD (Finite Differences Time Domain) simulator called emport for solving RF circuits. Emport outputs its simulat

4 Dec 15, 2022
Summary statistics of geospatial raster datasets based on vector geometries.

rasterstats rasterstats is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal stat

Matthew Perry 437 Dec 23, 2022
Map Ookla server locations as a Kernel Density Estimation (KDE) geographic map plot.

Ookla Server KDE Plotting This notebook was created to map Ookla server locations as a Kernel Density Estimation (KDE) geographic map plot. Currently,

Jonathan Lo 1 Feb 12, 2022
User friendly Rasterio plugin to read raster datasets.

rio-tiler User friendly Rasterio plugin to read raster datasets. Documentation: https://cogeotiff.github.io/rio-tiler/ Source Code: https://github.com

372 Dec 23, 2022
Streamlit Component for rendering Folium maps

streamlit-folium This Streamlit Component is a work-in-progress to determine what functionality is desirable for a Folium and Streamlit integration. C

Randy Zwitch 224 Dec 30, 2022
Starlite-tile38 - Showcase using Tile38 via pyle38 in a Starlite application

Starlite-Tile38 Showcase using Tile38 via pyle38 in a Starlite application. Repo

Ben 8 Aug 07, 2022
Documentation and samples for ArcGIS API for Python

ArcGIS API for Python ArcGIS API for Python is a Python library for working with maps and geospatial data, powered by web GIS. It provides simple and

Esri 1.4k Dec 30, 2022
Blender addons to make the bridge between Blender and geographic data

Blender GIS Blender minimal version : 2.8 Mac users warning : currently the addon does not work on Mac with Blender 2.80 to 2.82. Please do not report

5.9k Jan 02, 2023
This program analizes films database with adresses, and creates a folium map with closest films to the coordinates

Films-map-project UCU CS lab 1.2, 1st year This program analizes films database with adresses, and creates a folium map with closest films to the coor

Artem Moskovets 1 Feb 09, 2022
QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios

QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios, that I wrote in Python 3 for my own research and visualisation. It is currently unfinis

Samuel Low 9 Aug 23, 2022
PySAL: Python Spatial Analysis Library Meta-Package

Python Spatial Analysis Library PySAL, the Python spatial analysis library, is an open source cross-platform library for geospatial data science with

Python Spatial Analysis Library 1.1k Dec 18, 2022
Daily social mapping project in November 2021. Maps made using PyGMT whenever possible.

Daily social mapping project in November 2021. Maps made using PyGMT whenever possible.

Wei Ji 20 Nov 24, 2022
The geospatial toolkit for redistricting data.

maup maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts

Metric Geometry and Gerrymandering Group 60 Dec 05, 2022
Python library to decrypt Airtag reports, as well as a InfluxDB/Grafana self-hosted dashboard example

Openhaystack-python This python daemon will allow you to gather your Openhaystack-based airtag reports and display them on a Grafana dashboard. You ca

Bezmenov Denys 19 Jan 03, 2023
Computer Vision in Python

Mahotas Python Computer Vision Library Mahotas is a library of fast computer vision algorithms (all implemented in C++ for speed) operating over numpy

Luis Pedro Coelho 792 Dec 20, 2022
Ingest and query genomic intervals from multiple BED files

Ingest and query genomic intervals from multiple BED files.

4 May 29, 2021
Color correction plugin for rasterio

rio-color A rasterio plugin for applying basic color-oriented image operations to geospatial rasters. Goals No heavy dependencies: rio-color is purpos

Mapbox 111 Nov 15, 2022
h3-js provides a JavaScript version of H3, a hexagon-based geospatial indexing system.

h3-js The h3-js library provides a pure-JavaScript version of the H3 Core Library, a hexagon-based geographic grid system. It can be used either in No

Uber Open Source 648 Jan 07, 2023