Python function to query SQLite files stored on S3

Overview

sqlite-s3-query CircleCI Test Coverage

Python function to query a SQLite file stored on S3. It uses multiple HTTP range requests per query to avoid downloading the entire file, and so is suitable for large databases.

All the HTTP requests for a query request the same version of the database object in S3, so queries should complete succesfully even if the database is replaced concurrently by another S3 client. Versioning must be enabled on the S3 bucket.

Operations that write to the database are not supported.

Inspired by phiresky's sql.js-httpvfs, and dacort's Stack Overflow answer.

Installation

sqlite-s3-query depends on APSW, which is not available on PyPI, but can be installed directly from GitHub.

pip install sqlite_s3_query
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--all --global-option=build --global-option=--enable-all-extensions

Usage

from sqlite_s3_query import sqlite_s3_query

results_iter = sqlite_s3_query(
    'SELECT * FROM my_table WHERE my_column = ?', params=('my-value',),
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)

for row in results_iter:
    print(row)

If in your project you use multiple queries to the same file, functools.partial can be used to make an interface with less duplication.

from functools import partial
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query,
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)

for row in query_my_db('SELECT * FROM my_table WHERE my_col = ?', params=('my-value',)):
    print(row)

for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
    print(row)

The AWS region and the credentials are taken from environment variables, but this can be changed using the get_credentials parameter. Below shows the default implementation of this that can be overriden.

import os
from functools import partial
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
    get_credentials=lambda: (
        os.environ['AWS_DEFAULT_REGION'],
        os.environ['AWS_ACCESS_KEY_ID'],
        os.environ['AWS_SECRET_ACCESS_KEY'],
        os.environ.get('AWS_SESSION_TOKEN'),  # Only needed for temporary credentials
    ),
)

for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
    print(row)
Owner
Michal Charemza
Michal Charemza
Oh-My-PickleDB is an open source key-value store using Python's json module.

OH-MY-PICKLEDB oh-my-pickleDB is a lightweight, fast, and intuitive data manager written in python 📝 Table of Contents About Getting Started Deployme

Adrián Toral 6 Feb 20, 2022
Decentralised graph database management system

Decentralised graph database management system To get started clone the repo, and run the command below. python3 database.py Now, create a new termina

Omkar Patil 2 Apr 18, 2022
ChaozzDBPy - A python implementation based on the original ChaozzDB from Chaozznl with some new features

ChaozzDBPy About ChaozzDBPy is a python implementation based on the original Cha

Igor Iglesias 1 May 25, 2022
EmployeeDB - Advanced Redis search functionalities on Python applied on an Employee management backend app

EmployeeDB - Advanced Redis search functionalities on Python applied on an Employee management backend app

Ahmad Bazzi 58 Oct 10, 2022
LightDB is a lightweight JSON Database for Python

LightDB What is this? LightDB is a lightweight JSON Database for Python that allows you to quickly and easily write data to a file Installing pip3 ins

Stanislaw 14 Oct 01, 2022
A fast ordered NoSQL database.

MerkavaDB Note This is still in active development. Things will change. If you are interested in helping out, or would like to see any particular feat

Adam Hopkins 6 Sep 29, 2022
Simple embedded in memory json database

dbj dbj is a simple embedded in memory json database. It is easy to use, fast and has a simple query language. The code is fully documented, tested an

Pedro Gonring 25 Aug 12, 2022
A Persistent Embedded Graph Database for Python

Cog - Embedded Graph Database for Python cogdb.io New release: 2.0.5! Installing Cog pip install cogdb Cog is a persistent embedded graph database im

Arun Mahendra 214 Dec 30, 2022
pickleDB is an open source key-value store using Python's json module.

pickleDB pickleDB is lightweight, fast, and simple database based on the json module. And it's BSD licensed! pickleDB is Fun import pickledb

Harrison Erd 738 Jan 04, 2023
HTTP graph database built in Python 3

KiwiDB HTTP graph database built in Python 3. Reference Format References are strings in the format: { JanCraft 1 Dec 17, 2021

A Modular MWDB Utility to Collect Fresh Malware Samples

MWDB Feeds A Modular MWDB Utility to Collect Fresh Malware Samples This project is FREE as in FREE 🍺 , use it commercially, privately or however you

c3rb3ru5 32 Jul 07, 2022
Metrics-advisor - Analyze reshaped metrics from TiDB cluster Prometheus and give some advice about anomalies and correlation.

metrics-advisor Analyze reshaped metrics from TiDB cluster Prometheus and give some advice about anomalies and correlation. Team freedeaths mashenjun

3 Jan 07, 2022
Simpledb-py: Simple JSON database

Simpledb-py: Simple JSON database

тейлс 2 Feb 09, 2022
Shelf DB is a tiny document database for Python to stores documents or JSON-like data

Shelf DB Introduction Shelf DB is a tiny document database for Python to stores documents or JSON-like data. Get it $ pip install shelfdb shelfquery S

Um Nontasuwan 35 Nov 03, 2022
This is a simple graph database in SQLite, inspired by

This is a simple graph database in SQLite, inspired by "SQLite as a document database".

Denis Papathanasiou 1.2k Jan 03, 2023
Connect Django Project to PostgreSQL

An application for learning things with creating quizzes and flashcards.Django, PostgresSQL are used for this project.

Cena Ashoori 1 Jan 25, 2022
Given a metadata file with relevant schema, an SQL Engine can be run for a subset of SQL queries.

Mini-SQL-Engine Given a metadata file with relevant schema, an SQL Engine can be run for a subset of SQL queries. The query engine supports Project, A

Prashant Raj 1 Dec 03, 2021
Manage your sqlite database very easy (like django) ...

Manage your sqlite database very easy (like django) ...

aWolver 1 Feb 09, 2022
Makes google's political ad database actually useful

Making Google's political ad transparency library suck less This is a series of scripts that takes Google's political ad transparency data and makes t

The Guardian 7 Apr 28, 2022
TinyDB is a lightweight document oriented database optimized for your happiness :)

Quick Links Example Code Supported Python Versions Documentation Changelog Extensions Contributing Introduction TinyDB is a lightweight document orien

Markus Siemens 5.6k Dec 30, 2022