Simple integer-valued time series bit packing

Overview

Smahat Time Series Encoding

Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit is needed, for a series of integer percentages 7 bits are needed, etc.

Smahat is useful when:

  • Time series is integer-valued. (It doesn't work with floats :))
  • The range of the data is known in advance (if not streaming, this is not necessary).
  • The data range is relatively small.
  • The data does not have properties that would make other compression algorithms useful, or these other algorithms have an unacceptable cost for the use case.

Smahat can also be used as a baseline to calculate the true compression ratio of a compression algorithm on data of a certain nature.

Installation

To install the latest release:

$ pip install smahat

You can also build a local package and install it:

$ make build
$ pip install dist/*.whl

Usage

Import smahat module.

>>> import smahat

Data to encode.

>>> values = [12, 0, 17, 15, 78, 10]

Encoding

You can use encode_next to encode one value by one:

>>> encoder = smahat.Encoder(min_value=0, max_value=100, strategy='saturate')
>>> for v in values:
...     encoder.encode_next(v)
>>> content = encoder.get_encoded()
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Or you can use Encoder.encode_all to encode all values (range min and max will be inferred from values if not provided):

>>> content = smahat.Encoder.encode_all(values, min_value=0, max_value=100, strategy='saturate')
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Decoding

To decode use Decoder.decode_all.

>>> smahat.Decoder.decode_all(content)
[12, 0, 17, 15, 78, 10]

Encoding result

The result of the encoding of a sequence of values using Smahat is a SmahatContent dictionary containing the encoded data, plus three fields : shift is used to bring the data range to start from zero (values are shifted and encoded in pure binary), n_bits_per_value indicates the number of bits used to encode each value, n_padding_bits (between 0 and 7) indicates the number of unused padding bits within the last byte.

class SmahatContent(TypedDict):
    encoded: bytes
    shift: int
    n_bits_per_value: int
    n_padding_bits: int

If you want to use this library for message exchanges, you can serialize the result of the encoding as you like (JSON, protobuf, etc.)

Contribute

$ git clone https://github.com/ghilesmeddour/smahat-time-series-encoding.git
$ cd smahat-time-series-compression
make format
make dead-code-check
make test
make type-check
make coverage
make build

TODOs

  • Add unit tests.
  • Improve doc.
You might also like...
Simple yet flexible natural sorting in Python.

natsort Simple yet flexible natural sorting in Python. Source Code: https://github.com/SethMMorton/natsort Downloads: https://pypi.org/project/natsort

A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!
A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that all behav

Simple python module to get the information regarding battery in python.
Simple python module to get the information regarding battery in python.

Battery Stats A python3 module created for easily reading the current parameters of Battery in realtime. It reads battery stats from /sys/class/power_

A simple Python app that generates semi-random chord progressions.

chords-generator A simple Python app that generates semi-random chord progressions.

A simple and easy to use Spam Bot made in Python!

This is a simple spam bot made in python. You can use to to spam anyone with anything on any platform.

Runes - Simple Cookies You Can Extend (similar to Macaroons)

Runes - Simple Cookies You Can Extend (similar to Macaroons) is a paper called "Macaroons: Cookies with Context

Simple collection of GTPS Flood in Python.

GTPS Flood Simple collection of GTPS Flood in Python. NOTE Give me credit if you use this source, don't trade/sell this tool, And USE AT YOUR OWN RISK

Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method.

Astvuln Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method. Some search methods ar

A set of Python scripts to surpass human limits in accomplishing simple tasks.

Human benchmark fooler Summary A set of Python scripts with Selenium designed to surpass human limits in accomplishing simple tasks available on https

Releases(v0.0.1)
Owner
Ghiles Meddour
Data Analyst at Munic
Ghiles Meddour
async parser for JET

This project is mainly aims to provide an async parsing option for NTDS.dit database file for obtaining user secrets.

15 Mar 08, 2022
Collection of code auto-generation utility scripts for the Horizon `Boot` system module

boot-scripts This is a collection of code auto-generation utility scripts for the Horizon Boot system module, intended for use in Atmosphère. Usage Us

4 Oct 11, 2022
glip is a module for retrieve ip address like local-ip, global-ip, external-ip as string.

gle_ip_info glip is a module for retrieve ip address like local-ip, global-ip, external-ip as string.

Fatin Shadab 3 Nov 21, 2021
one_click_kag_server is a program which tries to fully automate the creation of a King Arthur's Gold server.

one_click_kag_server is a program which tries to fully automate the creation of a King Arthur's Gold server.

Benjamin Gorman 4 Jan 05, 2022
We provide useful util functions. When adding a util function, please add a description of the util function.

Utils Collection Motivation When we implement codes, we often search for util functions that are already implemented. Here, we are going to share util

6 Sep 09, 2021
Know your customer pipeline in apache air flow

KYC_pipline Know your customer pipeline in apache air flow For a successful pipeline run take these steps: Run you Airflow server Admin - connection

saeed 4 Aug 01, 2022
EVE-NG tools, A Utility to make operations with EVE-NG more friendly.

EVE-NG tools, A Utility to make operations with EVE-NG more friendly. Also it support different snapshot operations with same style as Libvirt/KVM

Bassem Aly 8 Jan 05, 2023
A simple example for calling C++ functions in Python by `ctypes`.

ctypes-example A simple example for calling C++ functions in Python by ctypes. Features call C++ function int bar(int* value, char* msg) with argumene

Yusu Pan 3 Nov 23, 2022
JavaScript to Python Translator & JavaScript interpreter written in 100% pure Python🚀

Pure Python JavaScript Translator/Interpreter Everything is done in 100% pure Python so it's extremely easy to install and use. Supports Python 2 & 3.

Piotr Dabkowski 2.1k Dec 30, 2022
Entropy-controlled contexts in Python

Python module ordered ordered module is the opposite to random - it maintains order in the program. import random x = 5 def increase(): global x

HyperC 36 Nov 03, 2022
This repository contains scripts that help you validate QR codes.

Validation tools This repository contains scripts that help you validate QR codes. It's hacky, and a warning for Apple Silicon users: the dependencies

Ryan Barrett 8 Mar 01, 2022
Nmap script to guess* a GitLab version.

gitlab-version-nse Nmap script to guess* a GitLab version. Usage https://github.com/righel/gitlab-version-nse cd gitlab-version-nse nmap target --s

Luciano Righetti 120 Dec 05, 2022
A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that all behav

Jon Crall 638 Dec 13, 2022
Compute the fair market value (FMV) of staking rewards at time of receipt.

tendermint-tax A tool to help calculate the tax liability of staking rewards on Tendermint chains. Specifically, this tool calculates the fair market

5 Jan 07, 2022
This is Cool Utility tools that you can use in python.

This is Cool Utility tools that you can use in python. There are a few tools that you might find very useful, you can use this on pretty much any project and some utils might help you a lot and save

Senarc Studios 6 Apr 18, 2022
Simple integer-valued time series bit packing

Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit

Ghiles Meddour 7 Aug 27, 2021
A quick username checker to see if a username is available on a list of assorted websites.

A quick username checker to see if a username is available on a list of assorted websites.

Maddie 4 Jan 04, 2022
PyResToolbox - A collection of Reservoir Engineering Utilities

pyrestoolbox A collection of Reservoir Engineering Utilities This set of functio

Mark W. Burgoyne 39 Oct 17, 2022
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark - Parsing Library & Toolkit 3.5k Jan 05, 2023
Handy Tool to check the availability of onion site and to extract the title of submitted onion links.

This tool helps is to quickly investigate a huge set of onion sites based by checking its availability which helps to filter out the inactive sites and collect the site title that might helps us to c

Balaji 13 Nov 25, 2022