Fixed-size struct serialization, using Python 3.9 annotated type hints
Features:
- One file, zero dependencies
- Easy to use, just annotate your fields and derive from
Struct
- Overridable (just define
__load__
,__save__
,__size__
,__align__
) - Compatible with dataclasses
- Integer / float primitives
- Fixed size arrays
- Raw chunks (
bytes
) - Static checking / size calculation
- Packed or aligned structs, with 3 padding handling modes
This was originally uploaded as a Gist because it's not intended as a serious project, but I decided to take it a bit further and add some features.
from .serialization import *
from io import BytesIO
from dataclasses import dataclass
# C ALIASES (for LP64)
Bool = U8
Char = S8; UChar = U8
Short = S16; UShort = U16
Int = S32; UInt = U32
Long = S64; ULong = U64
IntPtr = S64; UIntPtr = U64
Ptr = Size = U64
# Define some structs
@dataclass
class Foo(Struct):
yeet: Bool
ping: Bool
@dataclass
class MyStruct(Struct):
foo: Foo
bar: UInt
three_bazs: tuple[Long, Long, Long]
# Decode with __load__(), passing an IO
data = bytes.fromhex('01 00 0000 00050000 0100000000000000 0200000000000000 0300000000000000')
parsed = MyStruct.__load__(BytesIO(data))
assert parsed == MyStruct(foo=Foo(yeet=1, ping=0), bar=1280, three_bazs=(1, 2, 3))
# Encode back with __save__()
parsed.__save__(st := BytesIO())
assert data == st.getvalue()
A serializable class is one that implements the FixedSerializable
protocol:
__load__(cls, st: BinaryIO) -> cls
: class method that deserializes a readable binary stream into an instance__save__(self, st: BinaryIO)
: instance method that serializes the instance into a writeable binary stream__size__: int
: class attribute that indicates total serialized size__align__: int
: align factor (1 if no alignment)
__size__
is expected to be a multiple of __align__
i.e. it includes any trailing alignment as needed.
class Color(NamedTuple):
r: int
g: int
b: int
__size__ = 3
__align__ = 1
@classmethod
def __load__(cls, st):
return cls(*st.read(3))
def __save__(self, st):
st.write(bytes(self))
Most times you'll just derive from Struct
, which implements the serializable protocol for you, based on the property annotations of the class. As in C, fields are serialized in order of declaration.
@dataclass
class Color(Struct):
r: U8
g: U8
b: U8
The implemented __load__
will construct an instance of the class passsing in keyword arguments according to the property annotations. In this example, the class will be constructed like Color(r=0, g=21, b=10)
. To avoid implementing the constructor and other methods yourself, Python's dataclass
decorator can be used.
For the annotated properties, the following types are allowed:
- A class implementing the serializable protocol, such as another struct.
- One of the provided integer / float primitives:
U8
,S8
,U16
,S16
,U32
,S32
,U64
,S64
,F32
,F64
. These are aliases ofint
orfloat
, with custom metadata consumed byStruct
. bytes
,bytearray
,list[T]
ortuple[T, ...]
(whereT
is itself an allowed type). However, it must be annotated withFixedSize
metadata like so:Annotated[list[U8], FixedSize(20)]
- A
tuple
with allowed types as elements.
Struct
is a metaclass, so it must be the first parent. Inheritace (subclassing the struct class, or more parents in addition to Struct
) is discouraged and will probably not work correctly.
Struct
can automatically insert padding to align fields according to their __align__
attribute (or for primitives, their size). In this case, the struct itself is aligned to the LCM of the field alignments (and its __size__
is padded accordingly).
The align
metaclass attribute controls how alignment is handled:
discard
(default): When decoding, any bytes are accepted as padding (and discarded). When encoding, zeroes are inserted. (This is the only alignment mode that introduces malleability.)zeros
: Likediscard
, but only zeroes are accepted as padding when decoding.explicit
: Don't actually insert any padding, just check that all fields are aligned and that__size__
is aligned too. This mode expects you to explicitly declare padding as (for example)bytes
.no
: No alignment at all. Field alignments are ignored and__align__
is set to 1. This is equivalent to a packed / unaligned struct.
For example, to create a packed struct:
@dataclass
class Address(Struct, align='no'):
port: U16
# without align='no', 2 bytes of padding would be inserted here
host: U32
Caveat: tuple
(last case in allowed types) will not verify or insert alignment between its elements; its size will be the sum of the field sizes. Its alignment will be gcd(__size__, lcm(field alignments...))
, which means it'll have the right alignment if its elements are aligned. If you need proper alignment handling, use a nested Struct instead of a tuple.
Note: align
must be specified on each class; it doesn't get inherited from the struct that used it.
Note: You can override the __align__
field calculated by Struct
, just make sure it's a divisor of the resulting __size__
:
@dataclass
class Address(Struct, align='no'):
__align__ = 2 # would be 1 by default, since we're using align='no'
port: U16
host: U32
Serialization is non-malleable (that is, there's a bijection between serialized and unserialized values) if all of the following conditions are met:
- Structs use an alignment setting other than the default
align='discard'
.
Of course, if serialization is implemented manually at some point, malleability has to be checked there as well.
- Bit fields
- Post validation / transform (enums, booleans, string buffers, sets)
- Endianness control
- At annotation time, or at runtime?
- Unions (? not clear how I'd implement those)
- Optimization and laziness
Higher level:
- Pointer newtype / wrapper class
- Ideally annotated with target hint
- Call to dereference, optionally passing index
- Generics in struct classes