Python client for the Socrata Open Data API

Overview


sodapy

sodapy is a Python client for the Socrata Open Data API.

Installation

You can install with pip install sodapy.

If you want to install from source, then clone this repository and run python setup.py install from the project root.

Requirements

At its core, this library depends heavily on the Requests package. All other requirements can be found in requirements.txt. sodapy is currently compatible with Python 3.5, 3.6, 3.7 and 3.8.

Documentation

The official Socrata Open Data API docs provide thorough documentation of the available methods, as well as other client libraries. A quick list of eligible domains to use with this API is available via the Socrata Discovery API or Socrata's Open Data Network.

This library supports writing directly to datasets with the Socrata Open Data API. For write operations that use data transformations in the Socrata Data Management Experience (the user interface for creating datasets), use the Socrata Data Management API. For more details on when to use SODA vs the Data Management API, see the Data Management API documentation. A Python SDK for the Socrata Data Management API can be found at socrata-py.

Examples

The examples directory contains Jupyter notebooks that show sodapy in action.

Interface


client

Import the library and set up a connection to get started.

>>> from sodapy import Socrata
>>> client = Socrata(
...     "sandbox.demo.socrata.com",
...     "FakeAppToken",
...     username="user@example.com",
...     password="mypassword",
...     timeout=10
... )

username and password are only required for creating or modifying data. An application token isn't strictly required (it can be None), but queries made from a client without one are subject to strict throttling limits. You may want to increase timeout (in seconds) when making large requests. To create a bare-bones client:

>>> client = Socrata("sandbox.demo.socrata.com", None)
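A minimal sketch, assuming you keep the application token in an environment variable. SOCRATA_APP_TOKEN is a hypothetical name chosen for this example, not one sodapy reads by itself. An unset variable yields None, which gives the throttled bare-bones client described above.

```python
import os

def socrata_args(domain, timeout=10, env_var="SOCRATA_APP_TOKEN"):
    """Return (args, kwargs) suitable for Socrata(*args, **kwargs).

    SOCRATA_APP_TOKEN is a hypothetical variable name for this sketch.
    A missing token becomes None, which is accepted but throttled.
    """
    app_token = os.environ.get(env_var)  # None when the variable is unset
    return (domain, app_token), {"timeout": timeout}

# Usage (requires sodapy and network access):
# from sodapy import Socrata
# args, kwargs = socrata_args("sandbox.demo.socrata.com", timeout=30)
# client = Socrata(*args, **kwargs)
```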

A client can also be created with a context manager, which closes the session automatically when the block exits:

>>> with Socrata("sandbox.demo.socrata.com", None) as client:
...     # do some stuff, for example:
...     results = client.get("nimj-3ivp", limit=2)
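If you need the same close-on-exit guarantee for an object used outside a with statement, the standard library's contextlib.closing wraps anything that has a close() method. The sketch below uses a hypothetical FakeClient stand-in so it runs offline, and shows that close() still runs when the block raises.

```python
from contextlib import closing

class FakeClient:
    """Hypothetical stand-in for a client that exposes close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

client = FakeClient()
try:
    with closing(client):
        # Simulate a failure in the middle of some requests.
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(client.closed)  # True: close() ran despite the exception
```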

The client, by default, makes requests over HTTPS. To modify this behavior, or to make requests through a proxy, take a look here.

datasets(limit=0, offset=0)

Retrieve datasets associated with a particular domain. The optional limit and offset keyword args can be used to retrieve a subset of the datasets. By default, all datasets are returned.

>>> client.datasets()
[{'resource': {'name': 'Approved Building Permits', 'id': 'msk6-43c6', 'parent_fxf': None, 'description': 'Data of approved building/construction permits', ...}}, {'resource': {...}}, ...]
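Each entry returned by datasets() nests its details under a "resource" key, as in the sample output. A minimal sketch that builds an id-to-name index from that shape; the helper name is hypothetical, and it assumes every entry carries "id" and "name" as in the sample.

```python
def dataset_index(datasets):
    """Map dataset id -> dataset name from a datasets()-style list.

    Assumes each entry looks like {"resource": {"id": ..., "name": ...}}.
    """
    return {d["resource"]["id"]: d["resource"]["name"] for d in datasets}

sample = [
    {"resource": {"name": "Approved Building Permits", "id": "msk6-43c6"}},
]
print(dataset_index(sample))  # {'msk6-43c6': 'Approved Building Permits'}

# With a real client: dataset_index(client.datasets(limit=10))
```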

get(dataset_identifier, content_type="json", **kwargs)

Retrieve data from the requested resource. Filter and query data by field name, by row id, or with SoQL keywords.
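At the HTTP level, SODA expresses SoQL clauses as query-string parameters prefixed with $ ($where, $order, $limit, and so on), while plain field names act as equality filters. The sketch below builds such a URL by hand to show roughly what a call like get("nimj-3ivp", where=..., limit=...) corresponds to. The helper and its keyword set are assumptions for illustration, not sodapy's internal code.

```python
from urllib.parse import urlencode

# SoQL clause names that take a "$" prefix in a SODA query string.
SOQL_KEYWORDS = {"select", "where", "order", "group", "having", "limit", "offset", "q"}

def soda_url(domain, dataset_identifier, **kwargs):
    """Build a SODA resource URL (hypothetical helper for illustration)."""
    params = {}
    for key, value in kwargs.items():
        # SoQL keywords get a "$" prefix; anything else is a field filter.
        name = f"${key}" if key in SOQL_KEYWORDS else key
        params[name] = value
    return f"https://{domain}/resource/{dataset_identifier}.json?{urlencode(params)}"

print(soda_url("sandbox.demo.socrata.com", "nimj-3ivp",
               where="depth > 300", order="magnitude DESC", limit=2))
```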

>>> client.get("nimj-3ivp", limit=2)
[{'geolocation': {'latitude': '41.1085', 'needs_recoding': False, 'longitude': '-117.6135'}, 'version': '9', 'source': 'nn', 'region': 'Nevada', 'occurred_at': '2012-09-14T22:38:01', 'number_of_stations': '15', 'depth': '7.60', 'magnitude': '2.7', 'earthquake_id': '00388610'}, {...}]

>>> client.get("nimj-3ivp", where="depth > 300", order="magnitude DESC", exclude_system_fields=False)
[{'geolocation': {'latitude': '-15.563', 'needs_recoding': False, 'longitude': '-175.6104'}, 'version': '9', ':updated_at': 1348778988, 'number_of_stations': '275', 'region': 'Tonga', ':created_meta': '21484', 'occurred_at': '2012-09-13T21:16:43', ':id': 132, 'source': 'us', 'depth': '328.30', 'magnitude': '4.8', ':meta': '{\n}', ':updated_meta': '21484', 'earthquake_id': 'c000cnb5', ':created_at': 1348778988}, {...}]

>>> client.get("nimj-3ivp/193", exclude_system_fields=False)
{'geolocation': {'latitude': '21.6711', 'needs_recoding': False, 'longitude': '142.9236'}, 'version': 'C', ':updated_at': 1348778988, 'number_of_stations': '136', 'region': 'Mariana Islands region', ':created_meta': '21484', 'occurred_at': '2012-09-13T11:19:07', ':id': 193, 'source': 'us', 'depth': '300.70', 'magnitude': '4.4', ':meta': '{\n}', ':updated_meta': '21484', ':position': 193, 'earthquake_id': 'c000cmsq', ':created_at': 1348778988}

>>> client.get("nimj-3ivp", region="Kansas")
[{'geolocation': {'latitude': '38.10', 'needs_recoding': False, 'longitude': '-100.6135'}, 'version': '9', 'source': 'nn', 'region': 'Kansas', 'occurred_at': '2010-09-19T20:52:09', 'number_of_stations': '15', 'depth': '300.0', 'magnitude': '1.9', 'earthquake_id': '00189621'}, {...}]
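In the sample rows above, numeric columns such as depth and magnitude come back as strings ('300.0', '1.9'). A minimal sketch for converting selected fields to floats; the default field names are taken from this earthquake dataset's sample output and are an assumption you would adjust per dataset.

```python
def coerce_numeric(rows, fields=("depth", "magnitude")):
    """Return copies of rows with the given string fields converted to float.

    Fields that are missing or None are left as they are.
    """
    converted = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the input rows are untouched
        for field in fields:
            if new_row.get(field) is not None:
                new_row[field] = float(new_row[field])
        converted.append(new_row)
    return converted

rows = [{"region": "Kansas", "depth": "300.0", "magnitude": "1.9"}]
print(coerce_numeric(rows))
# [{'region': 'Kansas', 'depth': 300.0, 'magnitude': 1.9}]
```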

get_all(dataset_identifier, content_type="json", **kwargs)

Read data from the requested resource, paginating over all results. Accepts the same arguments as get(). Returns a generator.

>>> client.get_all("nimj-3ivp")
<generator object Socrata.get_all at 0x7fa0dc8be7b0>

>>> for item in client.get_all("nimj-3ivp"):
...     print(item)
...
{'geolocation': {'latitude': '-15.563', 'needs_recoding': False, 'longitude': '-175.6104'}, 'version': '9', ':updated_at': 1348778988, 'number_of_stations': '275', 'region': 'Tonga', ':created_meta': '21484', 'occurred_at': '2012-09-13T21:16:43', ':id': 132, 'source': 'us', 'depth': '328.30', 'magnitude': '4.8', ':meta': '{\n}', ':updated_meta': '21484', 'earthquake_id': 'c000cnb5', ':created_at': 1348778988}
...

>>> import itertools
>>> items = client.get_all("nimj-3ivp")
>>> first_five = list(itertools.islice(items, 5))
>>> len(first_five)
5
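get_all() hides pagination behind a generator. The sketch below shows the general limit/offset pattern such a generator can follow; fetch_page is a hypothetical callable standing in for a get(..., limit=..., offset=...) call, and the page size is arbitrary. It illustrates the technique and is not sodapy's implementation.

```python
from itertools import islice

def paginate(fetch_page, page_size=1000):
    """Yield items from fetch_page(limit=..., offset=...) until a short page arrives."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:
            return  # a short or empty page means there is nothing left
        offset += page_size

# Hypothetical in-memory data source standing in for the API.
data = list(range(7))

def fake_fetch(limit, offset):
    return data[offset:offset + limit]

print(list(paginate(fake_fetch, page_size=3)))             # [0, 1, 2, 3, 4, 5, 6]
print(list(islice(paginate(fake_fetch, page_size=3), 4)))  # [0, 1, 2, 3]
```

Because pages are requested lazily, taking only the first few items with islice fetches only the pages it needs.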

get_metadata(dataset_identifier, content_type="json")

Retrieve the metadata associated with a particular dataset.

>>> client.get_metadata("nimj-3ivp")
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "http://foo.bar.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}
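Several fields in the sample metadata (createdAt, publicationDate, viewLastModified) are integers that look like Unix timestamps in seconds; that interpretation is an assumption, not something this README states. A minimal sketch for turning them into timezone-aware datetimes:

```python
from datetime import datetime, timezone

TIMESTAMP_KEYS = ("createdAt", "publicationDate", "viewLastModified")

def metadata_times(metadata, keys=TIMESTAMP_KEYS):
    """Convert epoch-second fields to UTC datetimes (assumes seconds, not ms)."""
    return {
        key: datetime.fromtimestamp(metadata[key], tz=timezone.utc)
        for key in keys
        if key in metadata
    }

sample = {"createdAt": 1398014181, "publicationDate": 1436655117}
for key, when in metadata_times(sample).items():
    print(key, when.isoformat())
```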

update_metadata(dataset_identifier, update_fields, content_type="json")

Update the metadata for a particular dataset. update_fields should be a dictionary containing only the metadata keys that you wish to overwrite.

Note: Invalid payloads to this method could corrupt the dataset or visualization. See this comment for more information.

>>> client.update_metadata("nimj-3ivp", {"attributionLink": "https://anothertest.com"})
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "https://anothertest.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}
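Because update_fields should hold only the keys you want to overwrite, one cautious approach is to compare the desired values against the current metadata and send only what differs. A minimal sketch with a hypothetical helper name; it compares top-level keys only, and whether a given key is safe to change is still your call.

```python
def changed_fields(current, desired):
    """Return only the top-level keys whose desired value differs from current."""
    return {key: value for key, value in desired.items() if current.get(key) != value}

current = {"attributionLink": "http://foo.bar.com", "category": "Public Safety"}
desired = {"attributionLink": "https://anothertest.com", "category": "Public Safety"}
print(changed_fields(current, desired))
# {'attributionLink': 'https://anothertest.com'}

# With a real client (current = client.get_metadata("nimj-3ivp")):
# client.update_metadata("nimj-3ivp", changed_fields(current, desired))
```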

download_attachments(dataset_identifier, content_type="json", download_dir="~/sodapy_downloads")

Download all attachments associated with a dataset. Returns a list of paths to the downloaded files.

>>> client.download_attachments("nimj-3ivp", download_dir="~/Desktop")
['/Users/xmunoz/Desktop/nimj-3ivp/FireIncident_Codes.PDF', '/Users/xmunoz/Desktop/nimj-3ivp/AccidentReport.jpg']
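The sample output suggests that attachments land in a subdirectory named after the dataset identifier inside download_dir, with ~ expanded to the home directory. Assuming that layout (inferred from the example, not documented here), the sketch below computes where to look for the files afterwards.

```python
from pathlib import Path

def attachment_dir(dataset_identifier, download_dir="~/sodapy_downloads"):
    """Expected attachment directory, assuming the layout shown in the example."""
    return Path(download_dir).expanduser() / dataset_identifier

print(attachment_dir("nimj-3ivp", download_dir="~/Desktop"))
```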

create(name, **kwargs)

Create a new dataset. Optionally, specify keyword args such as:

  • description: description of the dataset
  • columns: list of fields
  • category: dataset category (must exist in /admin/metadata)
  • tags: list of tag strings
  • row_identifier: field name of primary key
  • new_backend: whether to create the dataset in the new backend

Example usage:

>>> columns = [{"fieldName": "delegation", "name": "Delegation", "dataTypeName": "text"}, {"fieldName": "members", "name": "Members", "dataTypeName": "number"}]
>>> tags = ["politics", "geography"]
>>> client.create("Delegates", description="List of delegates", columns=columns, row_identifier="delegation", tags=tags, category="Transparency")
{'id': '2frc-hyvj', 'name': 'Foo Bar', 'description': 'test dataset', 'publicationStage': 'unpublished', 'columns': [ { 'name': 'Foo', 'dataTypeName': 'text', 'fieldName': 'foo', ... }, { 'name': 'Bar', 'dataTypeName': 'number', 'fieldName': 'bar', ... } ], 'metadata': { 'rowIdentifier': 230641051 }, ... }
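The columns argument is a list of dicts with fieldName, name and dataTypeName, as in the example. When a dataset has many columns, a small helper can build that list from a mapping of display names to types. Deriving fieldName by lowercasing and replacing spaces with underscores is a hypothetical convention of this sketch, not a Socrata rule.

```python
def make_columns(spec):
    """Build create()-style column dicts from {display name: data type}."""
    columns = []
    for name, data_type in spec.items():
        columns.append({
            # Hypothetical convention: "Delegation Size" -> "delegation_size"
            "fieldName": name.lower().replace(" ", "_"),
            "name": name,
            "dataTypeName": data_type,
        })
    return columns

columns = make_columns({"Delegation": "text", "Members": "number"})
print(columns)
# Then: client.create("Delegates", columns=columns, row_identifier="delegation")
```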

publish(dataset_identifier, content_type="json")

Publish a dataset after creating it, i.e. take it out of 'working copy' mode. Pass the dataset id returned by create.

>>> client.publish("2frc-hyvj")
{'id': '2frc-hyvj', 'name': 'Foo Bar', 'description': 'test dataset', 'publicationStage': 'unpublished', 'columns': [ { 'name': 'Foo', 'dataTypeName': 'text', 'fieldName': 'foo', ... }, { 'name': 'Bar', 'dataTypeName': 'number', 'fieldName': 'bar', ... } ], 'metadata': { 'rowIdentifier': 230641051 }, ... }
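Since publish() takes the id found in create()'s response, the two calls chain naturally. A minimal sketch that works with any object exposing those two methods; create_and_publish and StubClient are hypothetical names, and the stub exists only so the example runs offline.

```python
def create_and_publish(client, name, **kwargs):
    """Create a dataset, publish it, and return its id."""
    dataset = client.create(name, **kwargs)
    dataset_id = dataset["id"]  # e.g. "2frc-hyvj" in the sample output
    client.publish(dataset_id)
    return dataset_id

class StubClient:
    """Hypothetical offline stand-in that records publish() calls."""

    def __init__(self):
        self.published = []

    def create(self, name, **kwargs):
        return {"id": "2frc-hyvj", "name": name}

    def publish(self, dataset_identifier):
        self.published.append(dataset_identifier)
        return {"id": dataset_identifier}

stub = StubClient()
print(create_and_publish(stub, "Delegates", description="List of delegates"))  # 2frc-hyvj
```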

set_permission(dataset_identifier, permission="private", content_type="json")

Set the permissions of a dataset to public or private.

>>> client.set_permission("2frc-hyvj", "public")
<Response [200]>

upsert(dataset_identifier, payload, content_type="json")

Create a new row in an existing dataset.

>>> data = [{'Delegation': 'AJU', 'Name': 'Alaska', 'Key': 'AL', 'Entity': 'Juneau'}]
>>> client.upsert("eb9n-hr43", data)
{'Errors': 0, 'Rows Deleted': 0, 'Rows Updated': 0, 'By SID': 0, 'Rows Created': 1, 'By RowIdentifier': 0}

Update/Delete rows in a dataset.

>>> data = [{'Delegation': 'sfa', ':id': 8, 'Name': 'bar', 'Key': 'doo', 'Entity': 'dsfsd'}, {':id': 7, ':deleted': True}]
>>> client.upsert("eb9n-hr43", data)
{'Errors': 0, 'Rows Deleted': 1, 'Rows Updated': 1, 'By SID': 2, 'Rows Created': 0, 'By RowIdentifier': 0}
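As the example shows, a row is deleted through upsert by sending its :id together with ':deleted': True. A small hypothetical helper for building such a payload for many rows; it only constructs the list and does not call the API.

```python
def deletion_payload(row_ids):
    """Build upsert rows that mark each :id as deleted."""
    return [{":id": row_id, ":deleted": True} for row_id in row_ids]

payload = deletion_payload([7, 8])
print(payload)  # [{':id': 7, ':deleted': True}, {':id': 8, ':deleted': True}]
# client.upsert("eb9n-hr43", payload)
```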

Upserts can also be performed with a CSV file:

>>> with open("upsert_test.csv") as data:
...     client.upsert("eb9n-hr43", data)
...
{'Errors': 0, 'Rows Deleted': 0, 'Rows Updated': 1, 'By SID': 1, 'Rows Created': 0, 'By RowIdentifier': 0}
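If you want to inspect or filter CSV rows before sending them, one option is to parse the file yourself with the standard library's csv.DictReader and pass the resulting list of dicts to upsert instead of the file object. Every value is read as a string, so check that this suits your column types. The CSV contents below are hypothetical and inlined so the sketch runs on its own.

```python
import csv
import io

# Inline stand-in for upsert_test.csv (hypothetical contents).
csv_text = "Delegation,Name,Key,Entity\nAJU,Alaska,AL,Juneau\n"

# dict(...) gives plain dicts on every Python 3 version.
rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
print(rows)  # [{'Delegation': 'AJU', 'Name': 'Alaska', 'Key': 'AL', 'Entity': 'Juneau'}]
# client.upsert("eb9n-hr43", rows)
```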

replace(dataset_identifier, payload, content_type="json")

Similar in usage to upsert, but overwrites existing data.

>>> with open("replace_test.csv") as data:
...     client.replace("eb9n-hr43", data)
...
{'Errors': 0, 'Rows Deleted': 0, 'Rows Updated': 0, 'By SID': 0, 'Rows Created': 12, 'By RowIdentifier': 0}

create_non_data_file(params, file_obj)

Creates a new file-based dataset with the name provided in the (filename, file object) pair of the files dict. A valid files argument would be:

files = {'file': ("gtfs2", open('myfile.zip', 'rb'))}

>>> with open(nondatafile_path, 'rb') as f:
...     files = {'file': ("nondatafile.zip", f)}
...     response = client.create_non_data_file(params, files)
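The files argument is a dict whose 'file' value is a (filename, file object) pair, the multipart form used by the Requests library. A minimal sketch that builds it from a path; files_payload is a hypothetical helper, and a temporary file stands in for a real nondatafile.zip so the example runs anywhere.

```python
import os
import tempfile

def files_payload(path, file_obj):
    """Build a Requests-style multipart mapping: {'file': (filename, file object)}."""
    return {"file": (os.path.basename(path), file_obj)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nondatafile.zip")
    with open(path, "wb") as fh:
        fh.write(b"placeholder bytes")
    with open(path, "rb") as f:
        files = files_payload(path, f)
        print(files["file"][0])  # nondatafile.zip
        # response = client.create_non_data_file(params, files)
```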

replace_non_data_file(dataset_identifier, params, file_obj)

Same as create_non_data_file, but replaces a file that already exists in a file-based dataset.

Note: a table-based dataset cannot be replaced by a file-based dataset. Use create_non_data_file in order to replace.

>>> with open(nondatafile_path, 'rb') as f:
...     files = {'file': ("nondatafile.zip", f)}
...     response = client.replace_non_data_file(DATASET_IDENTIFIER, {}, files)

delete(dataset_identifier, row_id=None, content_type="json")

Delete an individual row.

>>> client.delete("nimj-3ivp", row_id=2)
<Response [200]>

Delete the entire dataset.

>>> client.delete("nimj-3ivp")
<Response [200]>

close()

Close the session when you're finished.

>>> client.close()

Run tests

$ pytest

Contributing

See CONTRIBUTING.md.

Meta

This package uses semantic versioning.

Source and wheel distributions are available on PyPI. Here is how I create those releases.

python3 setup.py bdist_wheel
python3 setup.py sdist
twine upload dist/*
Owner
Cristina