Simple spill-to-disk dictionary

Related tags

Data Structureschest
Overview

Chest

Build Status Coverage Status Version Status Downloads

A dictionary that spills to disk.

Chest acts likes a dictionary but it can write its contents to disk. This is useful in the following two occasions:

  1. Chest can hold datasets that are larger than memory
  2. Chest persists and so can be saved and loaded for later use

Related Projects

The standard library shelve is an alternative out-of-core dictionary. Chest offers the following benefits over shelve:

  1. Chest supports any hashable key (not just strings)
  2. Chest supports pluggable serialization and file saving schemes

Alternatively one might consider a traditional key-value store database like Redis.

Shove is another excellent alternative with support for a variety of stores.

How it works

Chest stores data in two locations

  1. An in-memory dictionary
  2. On the filesystem in a directory owned by the chest

As a user adds contents to the chest the in-memory dictionary fills up. When a chest stores more data in memory than desired (see available_memory= keyword argument) it writes the larger contents of the chest to disk as pickle files (the choice of pickle is configurable). When a user asks for a value chest checks the in-memory store, then checks on-disk and loads the value into memory if necessary, pushing other values to disk.

Chest is a simple project. It was intended to provide a simple interface to assist in the storage and retrieval of numpy arrays. However it's design and implementation are agnostic to this case and so could be used in a variety of other situations.

With minimal work chest could be extended to serve as a communication point between multiple processes.

Known Failings

Chest was designed to hold a moderate amount of largish numpy arrays. It doesn't handle the very many small key-value pairs usecase (though could with small effort). In particular chest has the following deficiencies

  1. Chest is not multi-process safe. We should institute a file lock at least around the .keys file.
  2. Chest does not support mutation of variables on disk.

LICENSE

New BSD. See License

Install

chest is available through conda:

conda install chest

chest is on the Python Package Index (PyPI):

pip install chest

Example

>>> from chest import Chest
>>> c = Chest()

>>> # Acts like a normal dictionary
>>> c['x'] = [1, 2, 3]
>>> c['x']
[1, 2, 3]

>>> # Data persists to local files
>>> c.flush()
>>> import os
>>> os.listdir(c.path)
['.keys', 'x']

>>> # These files hold pickled results
>>> import pickle
>>> pickle.load(open(c.key_to_filename('x')))
[1, 2, 3]

>>> # Though one normally accesses these files with chest itself
>>> c2 = Chest(path=c.path)
>>> c2.keys()
['x']
>>> c2['x']
[1, 2, 3]

>>> # Chest is configurable, so one can use json, etc. instead of pickle
>>> import json
>>> c = Chest(path='my-chest', dump=json.dump, load=json.load)
>>> c['x'] = [1, 2, 3]
>>> c.flush()

>>> json.load(open(c.key_to_filename('x')))
[1, 2, 3]

Dependencies

Chest supports Python 2.6+ and Python 3.2+ with a common codebase.

It currently depends on the heapdict library.

It is a light weight dependency.

Owner
Blaze
Blaze
Simple spill-to-disk dictionary

Chest A dictionary that spills to disk. Chest acts likes a dictionary but it can write its contents to disk. This is useful in the following two occas

Blaze 59 Dec 19, 2022
Data Structures and algorithms package implementation

Documentation Simple and Easy Package --This is package for enabling basic linear and non-linear data structures and algos-- Data Structures Array Sta

1 Oct 30, 2021
dict subclass with keylist/keypath support, normalized I/O operations (base64, csv, ini, json, pickle, plist, query-string, toml, xml, yaml) and many utilities.

python-benedict python-benedict is a dict subclass with keylist/keypath support, I/O shortcuts (base64, csv, ini, json, pickle, plist, query-string, t

Fabio Caccamo 799 Jan 09, 2023
Decided to include my solutions for leetcode problems.

LeetCode_Solutions Decided to include my solutions for leetcode problems. LeetCode # 1 TwoSum First leetcode problem and it was kind of a struggle. Th

DandaIT04 0 Jan 01, 2022
Google, Facebook, Amazon, Microsoft, Netflix tech interview questions

Algorithm and Data Structures Interview Questions HackerRank | Practice, Tutorials & Interview Preparation Solutions This repository consists of solut

Quan Le 8 Oct 04, 2022
A mutable set that remembers the order of its entries. One of Python's missing data types.

An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index number that can be looked up.

Elia Robyn Lake (Robyn Speer) 173 Nov 28, 2022
RLStructures is a library to facilitate the implementation of new reinforcement learning algorithms.

RLStructures is a lightweight Python library that provides simple APIs as well as data structures that make as few assumptions as possibl

Facebook Research 262 Nov 18, 2022
This repository is for adding codes of data structures and algorithms, leetCode, hackerrank etc solutions in different languages

DSA-Code-Snippet This repository is for adding codes of data structures and algorithms, leetCode, hackerrank etc solutions in different languages Cont

DSCSRMNCR 3 Oct 22, 2021
Datastructures such as linked list, trees, graphs etc

datastructures datastructures such as linked list, trees, graphs etc Made a public repository for coding enthusiasts. Those who want to collaborate on

0 Dec 01, 2021
🔬 Fixed struct serialization system, using Python 3.9 annotated type hints

py-struct Fixed-size struct serialization, using Python 3.9 annotated type hints This was originally uploaded as a Gist because it's not intended as a

Alba Mendez 4 Jan 14, 2022
Multidict is dict-like collection of key-value pairs where key might be occurred more than once in the container.

multidict Multidict is dict-like collection of key-value pairs where key might be occurred more than once in the container. Introduction HTTP Headers

aio-libs 325 Dec 27, 2022
A Munch is a Python dictionary that provides attribute-style access (a la JavaScript objects).

munch munch is a fork of David Schoonover's Bunch package, providing similar functionality. 99% of the work was done by him, and the fork was made mai

Infinidat Ltd. 643 Jan 07, 2023
Python collections that are backended by sqlite3 DB and are compatible with the built-in collections

sqlitecollections Python collections that are backended by sqlite3 DB and are compatible with the built-in collections Installation $ pip install git+

Takeshi OSOEKAWA 11 Feb 03, 2022
Persistent dict, backed by sqlite3 and pickle, multithread-safe.

sqlitedict -- persistent dict, backed-up by SQLite and pickle A lightweight wrapper around Python's sqlite3 database with a simple, Pythonic dict-like

RARE Technologies 954 Dec 23, 2022
One-Stop Destination for codes of all Data Structures & Algorithms

CodingSimplified_GK This repository is aimed at creating a One stop Destination of codes of all Data structures and Algorithms along with basic explai

Geetika Kaushik 21 Sep 26, 2022
My solutions to the competitive programming problems on LeetCode, USACO, LintCode, etc.

This repository holds my solutions to the competitive programming problems on LeetCode, USACO, LintCode, CCC, UVa, SPOJ, and Codeforces. The LeetCode

Yu Shen 32 Sep 17, 2022
IADS 2021-22 Algorithm and Data structure collection

A collection of algorithms and datastructures introduced during UoE's Introduction to Datastructures and Algorithms class.

Artemis Livingstone 20 Nov 07, 2022
schemasheets - structuring your data using spreadsheets

schemasheets - structuring your data using spreadsheets Create a data dictionary / schema for your data using simple spreadsheets - no coding required

Linked data Modeling Language 23 Dec 01, 2022
A DSA repository but everything is in python.

DSA Status Contents A: Mathematics B: Bit Magic C: Recursion D: Arrays E: Searching F: Sorting G: Matrix H: Hashing I: String J: Linked List K: Stack

Shubhashish Dixit 63 Dec 23, 2022
nocasedict - A case-insensitive ordered dictionary for Python

nocasedict - A case-insensitive ordered dictionary for Python Overview Class NocaseDict is a case-insensitive ordered dictionary that preserves the or

PyWBEM Projects 2 Dec 12, 2021