Python virtual filesystem for SQLite to read from and write to S3

Overview

sqlite-s3vfs CircleCI Test Coverage

Python virtual filesystem for SQLite to read from and write to S3.

No locking is performed, so client code must ensure that writes do not overlap with other writes or reads. If multiple writes happen at the same time, the database will probably become corrupt and data be lost.

Inspired by phiresky's sql.js-httpvfs, dacort's Stack Overflow answer, and sqlite-s3-query.

How does it work?

sqlite-s3vfs stores the SQLite database in fixed-sized blocks, and each is stored as a separate object in S3. SQLite stores its data in fixed-size pages, and always writes exactly a page at a time. This virtual filesystem translates page reads and writes to block reads and writes. In the case of SQLite pages being the same size as blocks, which is the case by default, each page write results in exactly one block write.

Separate objects are required since S3 does not support the partial replace of an object; to change even 1 byte, it must be re-uploaded in full.

Installation

sqlite-s3vfs depends on APSW, which is not officially available on PyPI, but can be installed directly from GitHub.

pip install sqlite-s3vfs
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--sqlite --global-option=build --global-option=--enable-all-extensions

Usage

sqlite-s3vfs is an APSW virtual filesystem that requires boto3 for its communication with S3.

import apsw
import boto3
import sqlite_s3vfs

# A boto3 bucket resource
bucket = boto3.Session().resource('s3').Bucket('my-bucket')

# An S3VFS for that bucket
s3vfs = sqlite_s3vfs.S3VFS(bucket=bucket)

# sqlite-s3vfs stores many objects under this prefix
# Note that it's not typical to start a key prefix with '/'
key_prefix = 'my/path/cool.sqlite'

# Connect, insert data, and query
with apsw.Connection(key_prefix, vfs=s3vfs.name) as db:
    cursor = db.cursor()
    cursor.execute(f'''
        CREATE TABLE foo(x,y);
        INSERT INTO foo VALUES(1,2);
    ''')
    cursor.execute('SELECT * FROM foo;')
    print(cursor.fetchall())

See the APSW documentation for more examples.

Serializing (getting a regular SQLite file out of the VFS)

The bytes corresponding to a regular SQLite file can be extracted with the serialize_iter function, which returns an iterable,

for chunk in s3vfs.serialize_iter(key_prefix=key_prefix):
    print(chunk)

or with serialize_fileobj, which returns a non-seekable file-like object. This can be passed to Boto3's upload_fileobj method to upload a regular SQLite file to S3.

target_obj = boto3.Session().resource('s3').Bucket('my-target-bucket').Object('target/cool.sqlite')
target_obj.upload_fileobj(s3vfs.serialize_fileobj(key_prefix=key_prefix))

Deserializing (getting a regular SQLite file into the VFS)

# Any iterable that yields bytes can be used. In this example, bytes come from
# a regular SQLite file already in S3
source_obj = boto3.Session().resource('s3').Bucket('my-source-bucket').Object('source/cool.sqlite')
bytes_iter = source_obj.get()['Body'].iter_chunks()

s3vfs.deserialize_iter(key_prefix='my/path/cool.sqlite', bytes_iter=bytes_iter)

Tests

The tests require the dev dependencies and APSW to installed, and MinIO started

pip install -r requirements-dev.txt
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--all --global-option=build --global-option=--enable-all-extensions
./start-minio.sh

can be run with pytest

pytest

and finally Minio stopped

./stop-minio.sh
Owner
Department for International Trade
Department for International Trade
This simple python script pcopy reads a list of file names and copies them to a separate folder

pCopy This simple python script pcopy reads a list of file names and copies them to a separate folder. Pre-requisites Python 3 (ver. 3.6) How to use

Madhuranga Rathnayake 0 Sep 03, 2021
This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it

This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it. In the current state, it outputs a PrettyTable to txt file as

Joshua Wren 1 Nov 09, 2021
gitfs is a FUSE file system that fully integrates with git - Version controlled file system

gitfs is a FUSE file system that fully integrates with git. You can mount a remote repository's branch locally, and any subsequent changes made to the files will be automatically committed to the rem

Presslabs 2.3k Jan 08, 2023
Python virtual filesystem for SQLite to read from and write to S3

Python virtual filesystem for SQLite to read from and write to S3

Department for International Trade 70 Jan 04, 2023
PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

phithon 53 Nov 07, 2022
A simple library for temporary storage of small files

TemporaryStorage An simple library for temporary storage of small files. Navigation Install Usage In Python console As a standalone application List o

2 Apr 17, 2022
OneDriveExplorer - A command line and GUI based application for reconstructing the folder strucure of OneDrive from the UserCid.dat file

OneDriveExplorer - A command line and GUI based application for reconstructing the folder strucure of OneDrive from the UserCid.dat file

Brian Maloney 100 Dec 13, 2022
Simple, convenient and cross-platform file date changing library. ๐Ÿ“๐Ÿ“…

Simple, convenient and cross-platform file date changing library.

kubinka0505 15 Dec 18, 2022
Some-tasks - Files for some of the tasks for the group sessions

Files for some of the tasks for the group sessions Here you can find some of the

<a href=[email protected] Computer Networks"> 0 Aug 25, 2022
Simple archive format designed for quickly reading some files without extracting the entire archive

Simple archive format designed for quickly reading some files without extracting the entire archive

Jarred Sumner 336 Dec 30, 2022
Test app for importing contact information in CSV files.

Contact Import TestApp Test app for importing contact information in CSV files. Explore the docs ยป ยท Report Bug ยท Request Feature Table of Contents Ab

1 Feb 06, 2022
LightCSV - This CSV reader is implemented in just pure Python.

LightCSV Simple light CSV reader This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles

Jose Rodriguez 6 Mar 05, 2022
This is a junk file creator tool which creates junk files in Internal Storage

This is a junk file creator tool which creates junk files in Internal Storage

KiLL3R_xRO 3 Jun 20, 2021
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Department for International Trade 206 Jan 02, 2023
Get Your TXT File Length !.

TXTLen Get Your TXT File Length !. Hi ๐Ÿ‘‹ , I'm Alireza A Python Developer Boy ๐Ÿ”ญ Iโ€™m currently working on my C# projects ๐ŸŒฑ Iโ€™m currently Learning CSh

Alireza Hasanzadeh 1 Jan 06, 2022
This is just a GUI that detects your file's real extension using the filetype module.

Real-file.extnsn This is just a GUI that detects your file's real extension using the filetype module. Requirements Python 3.4 and above filetype modu

1 Aug 08, 2021
Python library for reading and writing tabular data via streams.

tabulator-py A library for reading and writing tabular data (csv/xls/json/etc). [Important Notice] We have released Frictionless Framework. This frame

Frictionless Data 231 Dec 09, 2022
Utils for streaming large files (S3, HDFS, gzip, bz2...)

smart_open โ€” utils for streaming large files in Python What? smart_open is a Python 3 library for efficient streaming of very large files from/to stor

RARE Technologies 2.7k Jan 06, 2023
Uncompress DEFLATE streams in pure Python

stream-inflate Uncompress DEFLATE streams in pure Python. Installation pip install stream-inflate Usage from stream_inflate import stream_inflate impo

Michal Charemza 7 Oct 13, 2022
Publicly Open Amazon AWS S3 Bucket Viewer

S3Viewer Publicly open storage viewer (Amazon S3 Bucket, Azure Blob, FTP server, HTTP Index Of/) s3viewer is a free tool for security researchers that

Sharon Brizinov 377 Dec 02, 2022