Python wrapper for the Synoptic Data API. Retrieve data from thousands of mesonet stations and networks. Returns JSON from Synoptic as a Pandas DataFrame.

Overview

โ˜ Synoptic API for Python (unofficial)

The Synoptic Mesonet API (formerly MesoWest) gives you access to real-time and historical surface-based weather and environmental observations for thousands of stations.

SynopticPy Documentation

Synoptic is free up to 5,000 API requests and 5 million service units each month. That's a generous amount, but if you need even more data, a paid tier is available (through Synoptic, not me).

๐ŸŒ Register for a free account at the Synoptic API Webpage

https://developers.synopticdata.com

You will need to obtain an API token before using this python package.

I wrote these functions to conveniently access data from the Synoptic API and convert the JSON data to a Pandas DataFrame. This may be helpful to others who are getting started with the Synoptic API and Python. The idea is loosely based on the obsolete MesoPy Python wrapper, but it returns the data as a Pandas DataFrame instead of a simple dictionary, making the retrieved data more ready-to-use.

Contributing Guidelines (and disclaimer)

Since this package is a work in progress, it is distributed "as is." I do not make any guarantee it will work for you out of the box. In fact, this is my first experience publishing a package to PyPI. Any revisions I make are purely for my benefit. Sorry if I break something, but I usually only push updates to GitHub if the code is in a reasonably functional state (at least, in the way I use it).

With that said, I am happy to share this project with you. You are welcome to open issues and submit pull requests, but know that I may or may not get around to doing anything about it. If this is helpful to you in any way, I'm glad.


๐Ÿ Installation and Conda Environment

Option 1: pip

Install the last published version from PyPI. This requires that the following packages are already installed:
numpy, pandas, requests. They are optional, but you will want matplotlib and cartopy, too.

pip install SynopticPy

Option 2: conda

If conda environments are new to you, I suggest you become familiar with managing conda environments.

I have provided a sample Anaconda environment.yml file that lists the minimum packages required plus some extras that might be useful when working with other types of weather data. Look at the bottom lines of that yaml file...there are two ways to install SynopticPy with pip. Comment out the lines you don't want.

For the latest development code:

- pip:
    - git+https://github.com/blaylockbk/SynopticPy.git

For the latest published version:

- pip:
    - SynopticPy

First, create the virtual environment with

conda env create -f environment.yml

Then, activate the synoptic environment. Don't confuse this environment name with the package name.

conda activate synoptic

Occasionally, you might want to update all the packages in the environment.

conda env update -f environment.yml

Alternative "Install" Method

There are several other ways to "install" a Python package so you can import it. One alternative is to git clone https://github.com/blaylockbk/SynopticPy.git this repository to any directory. To import the package, you will need to update your PYTHONPATH environment variable so it includes the directory where you put this package, or add the line sys.path.append("/path/to/SynopticPy") at the top of your Python script.

Setup

Before you can retrieve data from the Synoptic API, you need to register as a Synoptic user and obtain a token. Follow the instructions at the Getting Started Page. When you have a token, edit synoptic/config.cfg with your personal API token, not your API key. The config file should look something like this:

[Synoptic]
token = 1234567890abcdefg

If you don't do this step, don't worry. When you import synoptic.services, a quick check will make sure the token in the config file is valid. If not, you will be prompted to update the token in the config file.
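As a rough illustration of the config format above, here is a minimal sketch of reading the token with Python's standard configparser. The helper names token_from_text and read_token are hypothetical, and the package's own loading logic may differ.

```python
import configparser
from pathlib import Path


def token_from_text(text):
    """Parse config text and return the token in the [Synoptic] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Raises KeyError if the section or the option is missing.
    return parser["Synoptic"]["token"].strip()


def read_token(config_path):
    """Read the token from a config file on disk (hypothetical helper)."""
    return token_from_text(Path(config_path).read_text())


example = "[Synoptic]\ntoken = 1234567890abcdefg\n"
print(token_from_text(example))
```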

๐Ÿ“ Jupyter Notebook Examples

I have provided a number of notebooks on GitHub that contain some practical examples for importing and using these functions to get and show station data.



synoptic/

get_token.py

This function performs a test on the token in the config.cfg file. If the token is valid, you are good to go. If not, then you will be prompted to edit the config.cfg file when you import any function from synoptic.services.

services.py

This is the main module you will interact with. It contains functions that make API requests and return the data as a Pandas DataFrame.

# Import all functions
import synoptic.services as ss

or

# Import a single function (preferred)
from synoptic.services import stations_timeseries

Available Functions

There is a separate function for each of the Synoptic Mesonet API services as described in the Synoptic documentation.

  1. synoptic_api - A generalized wrapper that makes an API request and returns a requests object. You could access the raw JSON from this object, but the other functions will convert that JSON to a Pandas DataFrame. Generally, you won't use this function directly. Its primary role is to format the parameter arguments into the strings the request URL needs to retrieve data.

    • Converts datetime input to a string
      • datetime(2020,1,1) >>> "202001010000"
    • Converts timedelta input to a string
      • timedelta(hours=1) >>> "60"
    • Converts lists (station IDs and variable names) to comma separated strings
      • ["WBB", "KSLC"] >>> "WBB,KSLC"
      • ["air_temp", "wind_speed"] >>> "air_temp,wind_speed"
  2. stations_metadata - Return metadata (information) about stations. Synoptic Docs

  3. stations_timeseries - Return data for a period of time. Synoptic Docs

  4. stations_nearesttime - Return the observation closest to the requested time. Synoptic Docs

  5. stations_latest - Return the most recent observations. Synoptic Docs

  6. stations_precipitation (work in progress) - Return precipitation data (with derived quantities). Synoptic Docs

  7. networks - Return information about networks of stations. Synoptic Docs

  8. networktypes - Return network category information. Synoptic Docs

  9. variables - Return available variables. Synoptic Docs

  10. qctypes - Return quality control information. Synoptic Docs

  11. auth - Manage tokens (you are better off doing this in the browser in your Synoptic profile). Synoptic Docs

  12. stations_latency (work in progress) - Latency information for a station. Synoptic Docs

  13. stations_qcsegments (work in progress) - Quality control for a period. Synoptic Docs



Function Parameters

Function arguments are stitched together to create a web query. The parameters you can use to filter the data depend on the API service. Synoptic's API Explorer can help you determine what parameters can be used for each service.
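To make the "stitched together" idea concrete, here is a minimal sketch that assembles a query URL from keyword arguments. The base URL and the build_url helper are assumptions for illustration; they are not taken from this package, and the token shown is a placeholder.

```python
from urllib.parse import urlencode

# Assumed base URL for the Synoptic API; check Synoptic's documentation.
BASE_URL = "https://api.synopticdata.com/v2/"


def build_url(service, token, **params):
    """Join a service name and already-formatted parameters into a URL."""
    query = {"token": token, **params}
    # Keep commas unescaped so comma-separated lists stay readable.
    return f"{BASE_URL}{service}?{urlencode(query, safe=',')}"


print(build_url("stations/latest", token="demo", stid="WBB,KMRY", within=60))
```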

If the Synoptic API is new to you, I recommend you become familiar with the Station Selector arguments first. These parameters key in on specific stations or a set of stations within an area of interest (stid, radius, vars, state, etc.).

Examples

Some things you should know first:

  1. All lists are joined together into a comma separated string. For instance, if you are requesting three stations, you could do stid=['WBB', 'KSLC', 'KMRY'], and that will be converted to the comma separated string stid='WBB,KSLC,KMRY' required for the API request URL.
  2. Any input that is a datetime object (any datetime that can be formatted with an f-string, f'{DATE:%Y%m%d%H%M}') will be converted to the string required by the API (e.g., start=datetime(2020,1,1) will be converted to start='202001010000', the YYYYmmddHHMM format, when the query is made).
  3. For services that require the within or recent arguments, these must be given in minutes. You may give integers for those arguments, but converting time to minutes is done automatically if you input a datetime.timedelta or a pandas Timedelta. For example, if you set within=timedelta(hours=1) or recent=pd.to_timedelta('1d'), the function will convert the value to minutes for you.
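The three conversions above can be sketched in a few lines. This is only an illustration of the rules as described here, not the package's actual code; format_params is a hypothetical name.

```python
from datetime import datetime, timedelta


def format_params(**params):
    """Convert Python objects to the strings used in the request URL."""
    formatted = {}
    for key, value in params.items():
        if isinstance(value, datetime):
            # datetime -> 'YYYYmmddHHMM'
            formatted[key] = f"{value:%Y%m%d%H%M}"
        elif isinstance(value, timedelta):
            # timedelta -> whole minutes, as a string
            formatted[key] = str(int(value.total_seconds() // 60))
        elif isinstance(value, (list, tuple)):
            # list -> comma separated string
            formatted[key] = ",".join(str(item) for item in value)
        else:
            # strings and integers pass through unchanged
            formatted[key] = str(value)
    return formatted


print(format_params(stid=["WBB", "KSLC"],
                    start=datetime(2020, 1, 1),
                    within=timedelta(hours=1)))
```

Because a pandas Timedelta is a subclass of datetime.timedelta, the same branch would handle pd.to_timedelta('1d') as well.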

โ“ What if I don't know a station's ID?

MesoWest is your friend if you don't know what stations are available or what they are named: https://mesowest.utah.edu/.

To get a time series of air temperature and wind speed for the last 10 hours for the William Browning Building (WBB) you can do...

from datetime import timedelta
from synoptic.services import stations_timeseries

df = stations_timeseries(stid='WBB', 
                        vars=['air_temp', 'wind_speed'],
                        recent=timedelta(hours=10))

To get the latest air temperature and wind speed data for WBB (University of Utah) and KMRY (Monterey, CA airport) within one hour, we can also set the minutes as an integer instead of as a timedelta.

from synoptic.services import stations_latest

df = stations_latest(stid=['WBB', 'KMRY'],
                    vars=['air_temp', 'wind_speed'],
                    within=60)

Note: Parameters may be given as a list/datetime/timedelta, or as a string/integer interpreted by the Synoptic API. Thus,

stations_latest(stid='WBB,KMRY',
                vars='air_temp,wind_speed',
                within=60)

is equivalent to the above example.

To get the air temperature and wind speed for WBB and KMRY nearest 00:00 UTC Jan 1, 2020 within one hour...

from datetime import datetime
from synoptic.services import stations_nearesttime

df = stations_nearesttime(stid=['WBB', 'KMRY'],
                          vars=['air_temp', 'wind_speed'],
                          attime=datetime(2020,1,1),
                          within=60)

Note: the string/integer alternative to the above example is

stations_nearesttime(stid='WBB,KMRY',
                     vars='air_temp,wind_speed',
                     attime='202001010000',
                     within=60)

Use whichever is more convenient for you. I often use both methods. It depends on what I am doing.

Returned Data: Variable Names

The raw data retrieved from the Synoptic API is converted from JSON to a Pandas DataFrame.

If you look at the raw JSON returned, you will see that the observation values are returned as "sets" and "values" (e.g., air_temp_set_1, pressure_set_1d, wind_speed_value_1, etc.). This is because some stations have more than one sensor for a variable (e.g., wind at more than one level at a single site) or report a variable at more than one interval (e.g., ozone at 1 hr and 15 min intervals). Time series requests return "sets" and nearest time requests return "values".

I don't really like dealing with the set and value labels. Almost always, I want the set or value with the most data or the most recent observation. My functions, by default, will strip the set_1 and value_1 from the labels on the returned data. If there is more than one set or value, however, then the "set" and "value" labels will be retained for those extra sets.

  • If a query returns air_temp_set_1 and air_temp_set_2, then the labels are renamed air_temp and air_temp_set_2.
  • If a query returns pressure_set_1 and pressure_set_1d, then the labels are renamed pressure_set_1 and pressure if set_1d has more observations than set_1.
  • If a query returns both dew_point_temperature_value_1 at 00:00 UTC and dew_point_temperature_value_1d at 00:15 UTC, then the labels are renamed dew_point_temperature_value_1 and dew_point_temperature because the derived quantity is the most recent observation available.

In short, all sets and values are always returned, but column labels are simplified for the columns that I am most likely to use.
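Here is a minimal sketch of the time series relabeling rule described above, assuming the only inputs are the raw labels and how many observations each one holds. It is an illustration, not the package's code: simplify_labels is a hypothetical name, and the "most recent observation" rule for value labels is not covered.

```python
import re

# Matches labels such as 'air_temp_set_1' or 'pressure_set_1d'.
SET_PATTERN = re.compile(r"^(?P<var>.+)_set_(?P<num>\d+)(?P<derived>d?)$")


def simplify_labels(obs_counts):
    """Map raw labels to simplified labels.

    obs_counts maps each raw label to its number of observations.
    """
    # Group the first-set candidates (set_1 and set_1d) by variable name.
    candidates = {}
    for label in obs_counts:
        match = SET_PATTERN.match(label)
        if match and match["num"] == "1":
            candidates.setdefault(match["var"], []).append(label)

    # By default every label keeps its raw name.
    renames = {label: label for label in obs_counts}
    for var, labels in candidates.items():
        # The candidate with the most observations gets the bare name.
        best = max(labels, key=lambda lab: obs_counts[lab])
        renames[best] = var
    return renames


print(simplify_labels({"air_temp_set_1": 100, "air_temp_set_2": 40}))
print(simplify_labels({"pressure_set_1": 10, "pressure_set_1d": 90}))
```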

For the renamed columns, it is up to the user to know if the data is a derived quantity and which set/value it is. To find out, look for attributes "SENSOR_VARIABLES" and "RENAME" in the DataFrame attributes (df.attrs), or look at the raw JSON.

Doing this makes sense to me, but if you are confused and don't trust what I'm doing, you can turn this "relabeling" off with rename_set_1=False and rename_value_1=False (for the appropriate function).

๐ŸŒ Latitude and Longitude

I should mention that LATITUDE and LONGITUDE in the raw JSON are renamed to latitude and longitude (lowercase) to match CF convention.

U and V Wind Components

If the returned data contains both wind_speed and wind_direction, then the U and V wind components are computed and added to the DataFrame as wind_u and wind_v.
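For reference, here is a small sketch of the standard meteorological conversion from speed and direction to U and V. It assumes the direction is the compass bearing the wind blows from, in degrees. The exact formula used inside the package is not shown in this README, so treat this as the usual convention and not as the package's implementation; wind_components is a hypothetical name.

```python
import math


def wind_components(speed, direction_deg):
    """Return (u, v) for a wind speed and a 'blowing from' direction."""
    direction = math.radians(direction_deg)
    u = -speed * math.sin(direction)  # positive u is toward the east
    v = -speed * math.cos(direction)  # positive v is toward the north
    return u, v


# A 10 m/s wind from the west (270 degrees) blows toward the east.
print(wind_components(10, 270))
```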

โฒ Timezone

By default, the data is returned in UTC. You may change the time to local time with the parameter obtimezone='local'. Pandas will return the data with a timezone-aware index. However, I found that matplotlib plotting functions convert this time back to UTC. To plot by local time, you need to use the tz_localize(None) method to make the index unaware of timezone so that it plots in local time. For example, compare the two plots created with the following:

import matplotlib.pyplot as plt
from synoptic.services import stations_timeseries

df = stations_timeseries(stid='KSLC',
                        recent=1000,
                        obtimezone='local',
                        vars='air_temp')

plt.plot(df.index, df.air_temp, label='tz aware (plots in UTC)')
plt.plot(df.index.tz_localize(None), df.air_temp, label='tz unaware (as local time)')
plt.legend()

How to set Synoptic's quality control checks

By default, only basic QC range checks are applied to the data before it is returned by the API. These basic checks remove physically implausible data; for example, a 300 degree temperature is removed instead of being returned.

You can add additional QC checks that more stringently remove "bad" data that might not be representative of the area or caused by a faulty sensor. However, you can't expect every bad observation will be removed (or every good observation will be retained).

Some tips:

  • You can turn on more QC checks by Synoptic with the parameter qc_checks='synopticlabs'
  • You can turn all QC checks on (includes synopticlabs, mesowest, and madis checks) with the parameter qc_checks='all'.
  • You can see the number of data points removed in the QC summary in the DataFrame attributes df.attrs['QC_SUMMARY'].
  • Specific checks can be turned on (read the docs for more details).

For example:

df = stations_timeseries(stid='UKBKB', recent=60, qc_checks='synopticlabs')

or

df = stations_timeseries(stid='UKBKB', recent=60, qc_checks='all')

Look at the QC_SUMMARY in the DataFrame attributes to see some info about what each QC check means and how many are flagged...

df.attrs['QC_SUMMARY']
>>>{'QC_SHORTNAMES': {'18': 'ma_stat_cons_check', '16': 'ma_temp_cons_check'},
    'QC_CHECKS_APPLIED': ['all'],
    'PERCENT_OF_TOTAL_OBSERVATIONS_FLAGGED': 2.03,
    'QC_SOURCENAMES': {'18': 'MADIS', '16': 'MADIS'},
    'TOTAL_OBSERVATIONS_FLAGGED': 750.0,
    'QC_NAMES': {'18': 'MADIS Spatial Consistency Check', '16': 'MADIS Temporal Consistency Check'}}

You might be able to find a better way to mask out those QC'ed values, but here is one method for the QC check for wind_speed_set_1:

import numpy as np

# Identify which ones "passed" the QC checks (these have None in the QC array)
qc_mask = np.array([x is None for x in df.attrs['QC']['wind_speed_set_1']])
df = df.loc[qc_mask]

plots.py

These are a work in progress

Some helpers for plotting data from the Synoptic API.

# Import all functions
import synoptic.plots as sp

or

# Import individual functions
from synoptic.plots import plot_timeseries

If you have stumbled across this package, I hope it is useful to you or at least gives you some ideas.

Best of Luck
-Brian

Comments
  • synoptic spelled incorrectly in documentation

    Hi. I am just getting started with the package. I was copy/pasting your code and noticed a typo in the documentation. You spelled synoptic incorrectly.

    Get a timeseries of air temperature and wind speed at the station WBB for the last 10 hours:

    from datetime import timedelta
    from **synotpic**.services import stations_timeseries
    
    df = stations_timeseries(
        stid='WBB', 
        vars=['air_temp', 'wind_speed'],
        recent=timedelta(hours=10)
    )
    
    opened by hayesfj 3
  • config token after update

    After installing a package update, the user needs to add their token to the config file again. This is a bit annoying.

    Need to come up with a way to preserve the users config file. Maybe write it to the home directory instead of the package directory.

    opened by blaylockbk 2
  • Unclear requirement for `../synoptic` directory

    Setup docs say to create ../synoptic/ and add a cfg file there, this appears obsolete. It works fine in the CWD though, and that's much more convenient for deployable scripts. Also, that means that the inelegant suggestion to sys.path.append('../') is no longer needed.

    Also: it appears there's an undocumented dependency on toml.

    opened by dsjstc 1
  • Add a Gitter chat badge to README.md

    blaylockbk/SynopticPy now has a Chat Room on Gitter

    @blaylockbk has just created a chat room. You can visit it here: https://gitter.im/blaylockbk/SynopticPy.

    This pull-request adds this badge to your README.md:

    Gitter

    If my aim is a little off, please let me know.

    Happy chatting.

    PS: Click here if you would prefer not to receive automatic pull-requests from Gitter in future.

    opened by gitter-badger 0
  • Allow passing public token via Env

    I am trying to get SynopticPy to run on machines where I do not have full control over the file system and I cannot create the config.toml file as needed.

    Is there a way to set the public token via environment or similar?

    opened by j0nes2k 2
  • Add argument to automatically apply mask to QC'd data

    Currently, if you apply additional Quality Controls on the data (``) that data is stored in the DataFrame attributes. The README file describes how you can mask the flagged data, but it would be nice if this were done automatically, as a Boolean argument when the DataFrame is created.

    (See https://github.com/blaylockbk/SynopticPy#-how-to-set-synoptics-quality-control-checks)

    opened by blaylockbk 0
Releases(0.0.7)
  • 0.0.7(Dec 4, 2021)

    What's Changed

    • Fixed issue when creating initial config file

    Full Changelog: https://github.com/blaylockbk/SynopticPy/compare/0.0.6...0.0.7

  • 0.0.6(Aug 30, 2021)

  • 0.0.5(Feb 26, 2021)

    Be aware, this is v0.0.5, meaning it is subject to change at my leisure. The purpose of this repository is to serve as an example of how you can access data from the Synoptic API, but I try to keep this package in a workable state that might be useful for you.

    Documentation

Owner
Brian Blaylock
Atmospheric scientist. Post-doc at Naval Research Laboratory