🔬 Fixed struct serialization system, using Python 3.9 annotated type hints

Overview

py-struct

Fixed-size struct serialization, using Python 3.9 annotated type hints

This was originally uploaded as a Gist because it's not intended as a serious project, but I decided to take it a bit further and add some features.

Features:

  • One file, zero dependencies
  • Easy to use: just annotate your fields and derive from Struct
  • Overridable (just define __load__, __save__, __size__, __align__)
  • Compatible with dataclasses
  • Integer / float primitives
  • Fixed size arrays
  • Raw chunks (bytes)
  • Static checking / size calculation
  • Packed or aligned structs, with 3 padding handling modes

Getting started

from .serialization import *
from io import BytesIO
from dataclasses import dataclass

# C ALIASES (for LP64)

Bool = U8
Char = S8; UChar = U8
Short = S16; UShort = U16
Int = S32; UInt = U32
Long = S64; ULong = U64
IntPtr = S64; UIntPtr = U64
Ptr = Size = U64

# Define some structs

@dataclass
class Foo(Struct):
    yeet: Bool
    ping: Bool

@dataclass
class MyStruct(Struct):
    foo: Foo
    bar: UInt
    three_bazs: tuple[Long, Long, Long]

# Decode with __load__(), passing an IO

data = b'\x01\x00\x00\x00\x00\x05\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
parsed = MyStruct.__load__(BytesIO(data))

assert parsed == MyStruct(foo=Foo(yeet=1, ping=0), bar=1280, three_bazs=(1, 2, 3))

# Encode back with __save__()

parsed.__save__(st := BytesIO())
assert data == st.getvalue()
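
As a cross-check of the layout, the same bytes can be decoded with the standard library's struct module. This is only a sketch and does not use py-struct at all: it assumes little-endian byte order (consistent with bar decoding to 1280 above) and spells out the two padding bytes after foo by hand.

```python
import struct

data = (
    b'\x01\x00'            # foo.yeet, foo.ping
    b'\x00\x00'            # padding so that bar starts at offset 4
    b'\x00\x05\x00\x00'    # bar = 0x0500 = 1280
    + (1).to_bytes(8, 'little')
    + (2).to_bytes(8, 'little')
    + (3).to_bytes(8, 'little')
)

# '<' = little-endian, no implicit alignment; 'x' = one padding byte
LAYOUT = '<BBxxI3q'

fields = struct.unpack(LAYOUT, data)
print(fields)                    # (yeet, ping, bar, baz0, baz1, baz2)
print(struct.calcsize(LAYOUT))   # total size in bytes
```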

Usage

Serializable protocol

A serializable class is one that implements the FixedSerializable protocol:

  • __load__(cls, st: BinaryIO) -> cls: class method that deserializes a readable binary stream into an instance
  • __save__(self, st: BinaryIO): instance method that serializes the instance into a writeable binary stream
  • __size__: int: class attribute that indicates total serialized size
  • __align__: int: align factor (1 if no alignment)

__size__ is expected to be a multiple of __align__, i.e. it already includes any trailing padding needed for alignment.

from typing import NamedTuple

class Color(NamedTuple):
    r: int
    g: int
    b: int

    __size__ = 3
    __align__ = 1
    @classmethod
    def __load__(cls, st):
        return cls(*st.read(3))
    def __save__(self, st):
        st.write(bytes(self))
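
To see the protocol in action, here is a small self-contained round trip. The roundtrip helper is hypothetical (it is not part of py-struct); it just checks that __save__ writes exactly __size__ bytes and that __load__ gives the same value back.

```python
from io import BytesIO
from typing import NamedTuple

class Color(NamedTuple):
    r: int
    g: int
    b: int

    __size__ = 3
    __align__ = 1

    @classmethod
    def __load__(cls, st):
        return cls(*st.read(cls.__size__))

    def __save__(self, st):
        st.write(bytes(self))

def roundtrip(value):
    """Hypothetical helper: save a value, check its size, load it back."""
    st = BytesIO()
    value.__save__(st)
    raw = st.getvalue()
    assert len(raw) == type(value).__size__
    return raw, type(value).__load__(BytesIO(raw))

raw, copy = roundtrip(Color(0, 21, 10))
print(raw, copy)
```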

Structs

Most of the time you'll just derive from Struct, which implements the serializable protocol for you based on the class's field annotations. As in C, fields are serialized in order of declaration.

@dataclass
class Color(Struct):
    r: U8
    g: U8
    b: U8

The generated __load__ constructs an instance of the class, passing in keyword arguments according to the field annotations. In this example, the class will be constructed like Color(r=0, g=21, b=10). To avoid implementing the constructor and other methods yourself, Python's dataclass decorator can be used.
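
Roughly, the idea can be sketched like this. This is not the actual Struct implementation, just an illustration under simple assumptions: Byte is a stand-in field type, load_fields is a hypothetical helper, and alignment is ignored.

```python
from dataclasses import dataclass
from io import BytesIO

class Byte:
    """Stand-in one-byte field type implementing the serializable protocol."""
    __size__ = 1
    __align__ = 1

    @classmethod
    def __load__(cls, st):
        return st.read(1)[0]

def load_fields(cls, st):
    """Read every annotated field in declaration order, then call cls(**kwargs)."""
    kwargs = {name: ftype.__load__(st) for name, ftype in cls.__annotations__.items()}
    return cls(**kwargs)

@dataclass
class Color:
    r: Byte
    g: Byte
    b: Byte

color = load_fields(Color, BytesIO(bytes([0, 21, 10])))
print(color)
```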

For the annotated properties, the following types are allowed:

  • A class implementing the serializable protocol, such as another struct.
  • One of the provided integer / float primitives: U8, S8, U16, S16, U32, S32, U64, S64, F32, F64. These are aliases of int with custom metadata consumed by Struct.
  • bytes, list[T] or tuple[T, ...] (where T is itself an allowed type). These must be annotated with FixedSize metadata, like so: Annotated[list[U8], FixedSize(20)]
  • A tuple with allowed types as elements, such as tuple[Long, Long, Long] in the Getting started example.
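
For the curious, this is how metadata attached through Annotated can be read back with the standard typing helpers. It is a sketch only: IntSpec is a made-up metadata class, and FixedSize and U8 are redefined here as stand-ins so the snippet runs on its own; the library's real definitions may look different.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints

@dataclass(frozen=True)
class IntSpec:      # hypothetical metadata carried by integer primitives
    size: int
    signed: bool

@dataclass(frozen=True)
class FixedSize:    # stand-in for the library's FixedSize
    length: int

U8 = Annotated[int, IntSpec(1, False)]   # still just an int to type checkers

class Packet:
    tag: U8
    payload: Annotated[list[U8], FixedSize(20)]

# include_extras=True keeps the Annotated wrappers instead of stripping them
hints = get_type_hints(Packet, include_extras=True)

base, *metadata = get_args(hints['payload'])   # list[U8], [FixedSize(20)]
(elem,) = get_args(base)                       # U8
_, spec = get_args(elem)                       # int, IntSpec(1, False)

total = metadata[0].length * spec.size
print(total)   # serialized size of payload in bytes
```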

Struct is a metaclass, so it must be the first parent. Inheritance (subclassing the struct class, or more parents in addition to Struct) is discouraged and will probably not work correctly.

Alignment

Struct can automatically insert padding to align fields according to their __align__ attribute (or for primitives, their size). In this case, the struct itself is aligned to the LCM of the field alignments (and its __size__ is padded accordingly).
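
The rule above can be sketched as a small offset calculation. The layout function is hypothetical (it is not the library's code) and takes each field as a (size, align) pair.

```python
from math import lcm   # Python 3.9+

def layout(fields):
    """fields: list of (size, align). Returns (offsets, total_size, struct_align)."""
    offset, offsets, struct_align = 0, [], 1
    for size, align in fields:
        offset += -offset % align          # pad up to the next multiple of align
        offsets.append(offset)
        offset += size
        struct_align = lcm(struct_align, align)
    offset += -offset % struct_align       # trailing padding
    return offsets, offset, struct_align

# MyStruct from "Getting started":
# Foo (2 bytes, align 1), UInt (4 bytes, align 4), three Longs (24 bytes, align 8)
offsets, size, align = layout([(2, 1), (4, 4), (24, 8)])
print(offsets, size, align)
```

The result matches the 32-byte example in Getting started: bar lands at offset 4 (after 2 bytes of padding) and three_bazs at offset 8.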

The align metaclass attribute controls how alignment is handled:

  • discard (default): When decoding, any bytes are accepted as padding (and discarded). When encoding, zeroes are inserted. (This is the only alignment mode that introduces malleability.)
  • zeros: Like discard, but only zeroes are accepted as padding when decoding.
  • explicit: Don't actually insert any padding, just check that all fields are aligned and that __size__ is aligned too. This mode expects you to explicitly declare padding as (for example) bytes.
  • no: No alignment at all. Field alignments are ignored and __align__ is set to 1. This is equivalent to a packed / unaligned struct.
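
The difference between the two modes that actually insert padding (discard and zeros) can be sketched as follows. read_padding and write_padding are hypothetical helpers written for this README, not functions exported by the library.

```python
from io import BytesIO

def read_padding(st, count, mode):
    """Consume `count` padding bytes following the 'discard' or 'zeros' rule."""
    padding = st.read(count)
    if len(padding) != count:
        raise EOFError('stream ended inside padding')
    if mode == 'zeros' and any(padding):
        raise ValueError(f'non-zero padding: {padding!r}')
    # mode == 'discard': any bytes are accepted and thrown away

def write_padding(st, count):
    """Both modes encode padding as zero bytes."""
    st.write(bytes(count))

read_padding(BytesIO(b'\xde\xad'), 2, 'discard')    # accepted
read_padding(BytesIO(b'\x00\x00'), 2, 'zeros')      # accepted
try:
    read_padding(BytesIO(b'\xde\xad'), 2, 'zeros')  # rejected
except ValueError as exc:
    print(exc)
```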

For example, to create a packed struct:

@dataclass
class Address(Struct, align='no'):
    port: U16
    # without align='no', 2 bytes of padding would be inserted here
    host: U32
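
To make the difference concrete, here are both layouts of Address built with the standard library's struct module instead of py-struct. This is a sketch assuming little-endian byte order; the port and host values are arbitrary.

```python
import struct

port, host = 8080, 0x7F000001

packed = struct.pack('<HI', port, host)      # like align='no': fields back to back
aligned = struct.pack('<HxxI', port, host)   # like the default: 2 zero bytes before host

print(len(packed), packed.hex())
print(len(aligned), aligned.hex())
```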

Caveat: tuple (last case in allowed types) will not verify or insert alignment between its elements. Its alignment will be the GCD of the alignments of its elements, and the size will be the sum of the sizes. This means tuple[U64, U64] will probably do what you want (align to 8 bytes), but tuple[U64, U32] will only align to 4 bytes. If you need alignment, use a nested Struct instead of a tuple.
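
A quick way to see the caveat in numbers. tuple_layout is a hypothetical helper that mirrors the rule described above (sum of sizes, GCD of alignments); elements are given as (size, align) pairs.

```python
from functools import reduce
from math import gcd

def tuple_layout(elems):
    """elems: list of (size, align). Returns (size, align) of the tuple."""
    size = sum(s for s, _ in elems)
    align = reduce(gcd, (a for _, a in elems))
    return size, align

U64 = (8, 8)
U32 = (4, 4)

print(tuple_layout([U64, U64]))   # aligned to 8 bytes, as you'd expect
print(tuple_layout([U64, U32]))   # only aligned to 4 bytes, no trailing padding
```

A nested Struct with the same two fields would instead align to 8 bytes and pad its size to 16.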

Malleability

Serialization is non-malleable (that is, there's a bijection between serialized and unserialized values) if all the following conditions are met:

  • Structs use an alignment setting other than the default align='discard'.

Of course, if serialization is implemented manually at some point, malleability has to be checked there as well.
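
Here is what malleability looks like in practice, again sketched with the standard library's struct module rather than py-struct. The 'x' format code ignores padding bytes when decoding, which behaves like align='discard'; strict_unpack is a hypothetical helper that behaves more like align='zeros'.

```python
import struct

LAYOUT = '<HxxI'   # U16, two padding bytes, U32 (little-endian assumed)

a = bytes.fromhex('901f0000' '0100007f')   # zero padding
b = bytes.fromhex('901fdead' '0100007f')   # garbage in the padding

# Two different encodings decode to the same value: that's malleability.
assert struct.unpack(LAYOUT, a) == struct.unpack(LAYOUT, b)

def strict_unpack(layout, raw):
    """Reject inputs that don't re-encode to exactly the same bytes."""
    values = struct.unpack(layout, raw)
    if struct.pack(layout, *values) != raw:
        raise ValueError('non-canonical encoding')
    return values

print(strict_unpack(LAYOUT, a))
try:
    strict_unpack(LAYOUT, b)
except ValueError as exc:
    print(exc)
```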

Wishlist

  • Bit fields
  • Post validation / transform (enums, booleans, string buffers, sets)
  • Endianness control
    • At annotation time, or at runtime?
  • Unions (? not clear how I'd implement those)
  • Optimization and laziness

Higher level:

  • Pointer newtype / wrapper class
    • Ideally annotated with target hint
    • Call to dereference, optionally passing index
  • Generics in struct classes