Simple integer-valued time series bit packing

Overview

Smahat Time Series Encoding

Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit is needed, for a series of integer percentages 7 bits are needed, etc.

Smahat is useful when:

  • Time series is integer-valued. (It doesn't work with floats :))
  • The range of the data is known in advance (if not streaming, this is not necessary).
  • The data range is relatively small.
  • The data does not have properties that would make other compression algorithms useful, or these other algorithms have an unacceptable cost for the use case.

Smahat can also be used as a baseline to calculate the true compression ratio of a compression algorithm on data of a certain nature.

Installation

To install the latest release:

$ pip install smahat

You can also build a local package and install it:

$ make build
$ pip install dist/*.whl

Usage

Import smahat module.

>>> import smahat

Data to encode.

>>> values = [12, 0, 17, 15, 78, 10]

Encoding

You can use encode_next to encode one value by one:

>>> encoder = smahat.Encoder(min_value=0, max_value=100, strategy='saturate')
>>> for v in values:
...     encoder.encode_next(v)
>>> content = encoder.get_encoded()
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Or you can use Encoder.encode_all to encode all values (range min and max will be inferred from values if not provided):

>>> content = smahat.Encoder.encode_all(values, min_value=0, max_value=100, strategy='saturate')
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Decoding

To decode use Decoder.decode_all.

>>> smahat.Decoder.decode_all(content)
[12, 0, 17, 15, 78, 10]

Encoding result

The result of the encoding of a sequence of values using Smahat is a SmahatContent dictionary containing the encoded data, plus three fields : shift is used to bring the data range to start from zero (values are shifted and encoded in pure binary), n_bits_per_value indicates the number of bits used to encode each value, n_padding_bits (between 0 and 7) indicates the number of unused padding bits within the last byte.

class SmahatContent(TypedDict):
    encoded: bytes
    shift: int
    n_bits_per_value: int
    n_padding_bits: int

If you want to use this library for message exchanges, you can serialize the result of the encoding as you like (JSON, protobuf, etc.)

Contribute

$ git clone https://github.com/ghilesmeddour/smahat-time-series-encoding.git
$ cd smahat-time-series-compression
make format
make dead-code-check
make test
make type-check
make coverage
make build

TODOs

  • Add unit tests.
  • Improve doc.
You might also like...
Simple yet flexible natural sorting in Python.

natsort Simple yet flexible natural sorting in Python. Source Code: https://github.com/SethMMorton/natsort Downloads: https://pypi.org/project/natsort

A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!
A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that all behav

Simple python module to get the information regarding battery in python.
Simple python module to get the information regarding battery in python.

Battery Stats A python3 module created for easily reading the current parameters of Battery in realtime. It reads battery stats from /sys/class/power_

A simple Python app that generates semi-random chord progressions.

chords-generator A simple Python app that generates semi-random chord progressions.

A simple and easy to use Spam Bot made in Python!

This is a simple spam bot made in python. You can use to to spam anyone with anything on any platform.

Runes - Simple Cookies You Can Extend (similar to Macaroons)

Runes - Simple Cookies You Can Extend (similar to Macaroons) is a paper called "Macaroons: Cookies with Context

Simple collection of GTPS Flood in Python.

GTPS Flood Simple collection of GTPS Flood in Python. NOTE Give me credit if you use this source, don't trade/sell this tool, And USE AT YOUR OWN RISK

Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method.

Astvuln Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method. Some search methods ar

A set of Python scripts to surpass human limits in accomplishing simple tasks.

Human benchmark fooler Summary A set of Python scripts with Selenium designed to surpass human limits in accomplishing simple tasks available on https

Releases(v0.0.1)
Owner
Ghiles Meddour
Data Analyst at Munic
Ghiles Meddour
A library from RCTI+ to handle RabbitMQ tasks (connect, send, receive, etc) in Python.

Introduction A library from RCTI+ to handle RabbitMQ tasks (connect, send, receive, etc) in Python. Requirements Python =3.7.3 Pika ==1.2.0 Aio-pika

Dali Kewara 1 Feb 05, 2022
A fancy and practical functional tools

Funcy A collection of fancy functional tools focused on practicality. Inspired by clojure, underscore and my own abstractions. Keep reading to get an

Alexander Schepanovski 2.9k Jan 07, 2023
✨ Un code pour voir les disponibilités des vaccins contre le covid totalement fait en Python par moi, et en français.

Vaccine Notifier ❗ Un chois aléatoire d'un article sur Wikipedia totalement fait en Python par moi, et en français. 🔮 Grâce a une requète API, on peu

MrGabin 3 Jun 06, 2021
WindowsDebloat - Windows Debloat with python

Windows Debloat 🗑️ Quickly and easily configure Windows 10. Disclaimer I am NOT

1 Mar 26, 2022
A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Madison May 1.5k Dec 31, 2022
Factoral Methods using two different method

Factoral-Methods-using-two-different-method Here, I am finding the factorial of a number by using two different method. The first method is by using f

Sachin Vinayak Dabhade 4 Sep 24, 2021
A workflow management tool for numerical models on the NCI computing systems

Payu Payu is a climate model workflow management tool for supercomputing environments. Payu is currently only configured for use on computing clusters

The Payu Organization 11 Aug 25, 2022
Hot reloading for Python

Hot reloading for Python

Olivier Breuleux 769 Jan 03, 2023
A script copies movie and TV files to your GD drive, or create Hard Link in a seperate dir, in Emby-happy struct.

torcp A script copies movie and TV files to your GD drive, or create Hard Link in a seperate dir, in Emby-happy struct. Usage: python3 torcp.py -h Exa

ccf2012 105 Dec 22, 2022
Grank is a feature-rich script that automatically grinds Dank Memer for you

Grank Inspired by this repository. This is a WIP and there will be more functions added in the future. What is Grank? Grank is a feature-rich script t

42 Jul 20, 2022
A library to easily convert climbing route grades between different grading systems.

pyclimb A library to easily convert climbing route grades between different grading systems. In rock climbing, mountaineering, and other climbing disc

Ilias Antonopoulos 4 Jan 26, 2022
PyHook is an offensive API hooking tool written in python designed to catch various credentials within the API call.

PyHook is the python implementation of my SharpHook project, It uses various API hooks in order to give us the desired credentials. PyHook Uses

Ilan Kalendarov 158 Dec 22, 2022
Search, generate & deliver Msfvenom payloads in an quick and easy way

Goal Search, generate & deliver payloads in an quick and easy way Be as simple as possible BUT with all msfvenom payloads. Ever lost time searching th

2 Mar 03, 2022
Small Python script to parse endlessh's output and print some neat statistics

endlessh_parser endlessh_parser is a small Python script that parses endlessh's output and prints some neat statistics about it Usage Install all the

ManicRobot 1 Oct 18, 2021
Allows you to canibalize methods from classes effectively implementing trait-oriented programming

About This package enables code reuse in non-inheritance way from existing classes, effectively implementing traits-oriented programming pattern. Stor

1 Dec 13, 2021
Create a Web Component (a Custom Element) from a python file

wyc Create a Web Component (a Custom Element) from a python file (transpile python code to javascript (es2015)). Features Use python to define your cu

7 Oct 09, 2022
Abby's Left Hand Modifiers Dictionary

Abby's Left Hand Modifiers Dictionary Design This dictionary is inspired by and

12 Dec 08, 2022
A functional standard library for Python.

Toolz A set of utility functions for iterators, functions, and dictionaries. See the PyToolz documentation at https://toolz.readthedocs.io LICENSE New

4.1k Dec 30, 2022
Find dependent python scripts of a python script in a project directory.

Find dependent python scripts of a python script in a project directory.

2 Dec 05, 2021
Pampy: The Pattern Matching for Python you always dreamed of.

Pampy: Pattern Matching for Python Pampy is pretty small (150 lines), reasonably fast, and often makes your code more readable and hence easier to rea

Claudio Santini 3.5k Jan 06, 2023