An HDF5-based Python pickle replacement

Overview

Hickle

Hickle is an HDF5-based clone of pickle, with a twist: instead of serializing to a pickle file, Hickle dumps to an HDF5 file (Hierarchical Data Format). It is designed to be a "drop-in" replacement for pickle (for common data objects), but is really an amalgam of h5py and dill/pickle with extended functionality.

That is: hickle is a neat little way of dumping python variables to HDF5 files that can be read in most programming languages, not just Python. Hickle is fast, and allows for transparent compression of your data (LZF / GZIP).
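
Because a hickle file is just an HDF5 file, it can be opened by any HDF5 reader. As a minimal sketch in Python itself, using h5py directly (the internal group and dataset names depend on the hickle version, so explore the layout with visit() rather than relying on any particular path):

import h5py

with h5py.File('test.hkl', 'r') as f:
    f.visit(print)  # print the names of all groups/datasets hickle created
    # then read a dataset back as a plain array, e.g. data = f['data'][()]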

Why use Hickle?

While hickle is designed to be a drop-in replacement for pickle (or something like json), it works very differently. Instead of serializing / json-izing your data, it stores it using the excellent h5py module.

The main reasons to use hickle are:

  1. It's faster than pickle and cPickle.
  2. It stores data in HDF5.
  3. You can easily compress your data.

The main reasons not to use hickle are:

  1. You don't want to store your data in HDF5. While hickle can serialize arbitrary python objects, this functionality is provided only for convenience, and you're probably better off just using the pickle module.
  2. You want to convert your data to human-readable JSON/YAML, in which case you should do that instead.

So, if you want your data in HDF5, or if your pickling is taking too long, give hickle a try. Hickle is particularly good at storing large numpy arrays, thanks to h5py running under the hood.

Documentation

Documentation for hickle can be found at telegraphic.github.io/hickle/.

Usage example

Hickle is nice and easy to use, and should look very familiar to those of you who have pickled before.

In short, hickle provides two methods: a hickle.load method, for loading hickle files, and a hickle.dump method, for dumping data into HDF5. Here's a complete example:

import os
import hickle as hkl
import numpy as np

# Create a numpy array of data
array_obj = np.ones(32768, dtype='float32')

# Dump to file
hkl.dump(array_obj, 'test.hkl', mode='w')

# Dump data, with compression
hkl.dump(array_obj, 'test_gzip.hkl', mode='w', compression='gzip')

# Compare filesizes
print('uncompressed: %i bytes' % os.path.getsize('test.hkl'))
print('compressed:   %i bytes' % os.path.getsize('test_gzip.hkl'))

# Load data
array_hkl = hkl.load('test_gzip.hkl')

# Check the two arrays are identical
assert array_hkl.dtype == array_obj.dtype
assert np.allclose(array_hkl, array_obj)
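
Common Python containers (dicts, lists, tuples) can be dumped the same way. A short sketch continuing the example above (the keys and values here are illustrative):

# Dump and reload a dict of mixed types
data_dict = {'name': 'test', 'values': np.arange(10), 'flag': True}
hkl.dump(data_dict, 'test_dict.hkl', mode='w')
data_loaded = hkl.load('test_dict.hkl')
assert np.all(data_loaded['values'] == data_dict['values'])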

HDF5 compression options

A major benefit of hickle over pickle is that it lets you use fancy HDF5 features, by passing keyword arguments on to h5py. So, you can do things like:

# Note: the chunk shape must match the rank of the array (array_obj is 1-D
# here), and scaleoffset for float data must be a positive integer
hkl.dump(array_obj, 'test_lzf.hkl', mode='w', compression='lzf',
         chunks=(4096,), shuffle=True, fletcher32=True)

A detailed explanation of these keywords is given at http://docs.h5py.org/en/latest/high/dataset.html, but we give a quick rundown below.

In HDF5, datasets are split into chunks that are indexed by a B-tree, a tree data structure that has speed benefits over a single contiguous block of data. Chunking is what allows datasets to be resized and compressed via filter pipelines. Filters such as shuffle and scaleoffset rearrange your data to improve compression ratios, and fletcher32 computes a checksum to detect corruption. These file-level options are abstracted away from the data model.
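
As a rough rule of thumb, choose a chunk shape that matches how the data will be read back, since a chunk is the unit of compression and I/O. A hedged sketch with illustrative values (all keywords below are standard h5py dataset-creation keywords, which hickle passes straight through):

import numpy as np
import hickle as hkl

big_array = np.random.random((4000, 4000))
hkl.dump(big_array, 'big_gzip.hkl', mode='w',
         compression='gzip',   # DEFLATE filter
         compression_opts=4,   # gzip level, 0-9
         chunks=(500, 4000),   # compress and read in row blocks
         shuffle=True)         # byte-shuffle to improve compression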

Recent changes

  • June 2020: Major refactor to version 4, and removal of support for Python 2.
  • December 2018: Accepted to Journal of Open-Source Software (JOSS).
  • June 2018: Major refactor and support for Python 3.
  • August 2016: Added support for scipy sparse matrices bsr_matrix, csr_matrix and csc_matrix.

Performance comparison

Hickle runs a lot faster than pickle with its default settings, and a little faster than pickle with protocol=2 set. (Note that the session below dates from Python 2; under Python 3, pickle files must be opened in binary mode, e.g. open('foo.pkl', 'wb').)

In [1]: import numpy as np

In [2]: x = np.random.random((2000, 2000))

In [3]: import pickle

In [4]: f = open('foo.pkl', 'w')

In [5]: %time pickle.dump(x, f)  # slow by default
CPU times: user 2 s, sys: 274 ms, total: 2.27 s
Wall time: 2.74 s

In [6]: f = open('foo.pkl', 'w')

In [7]: %time pickle.dump(x, f, protocol=2)  # actually very fast
CPU times: user 18.8 ms, sys: 36 ms, total: 54.8 ms
Wall time: 55.6 ms

In [8]: import hickle

In [9]: f = open('foo.hkl', 'w')

In [10]: %time hickle.dump(x, f)  # a bit faster
dumping <type 'numpy.ndarray'> to file <HDF5 file "foo.hkl" (mode r+)>
CPU times: user 764 us, sys: 35.6 ms, total: 36.4 ms
Wall time: 36.2 ms

So if you do continue to use pickle, add the protocol=2 keyword (thanks @mrocklin for pointing this out).

For storing Python dictionaries of lists, hickle beats the Python json encoder, but is slower than ujson. For a dictionary with 64 entries, each containing a 4096-length list of random numbers, the times are:

json took 2633.263 ms
uJson took 138.482 ms
hickle took 232.181 ms
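
For reference, a sketch of roughly how such a benchmark can be reproduced (timings will vary with machine and library versions; the filenames are illustrative):

import json, time
import numpy as np
import hickle as hkl

data = {str(i): np.random.random(4096).tolist() for i in range(64)}

t0 = time.time()
with open('test.json', 'w') as f:
    json.dump(data, f)
print('json took %.3f ms' % ((time.time() - t0) * 1000))

t0 = time.time()
hkl.dump(data, 'test.hkl', mode='w')
print('hickle took %.3f ms' % ((time.time() - t0) * 1000))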

It should be noted that these comparisons are of course not fair: storing in HDF5 will not help you convert something into JSON, nor will it help you serialize a string. But for quick storage of the contents of a python variable, it's a pretty good option.

Installation guidelines

Easy method

Install with pip by running pip install hickle from the command line.

Manual install

  1. You should have Python 3.5 or above installed

  2. Install h5py (Official page: http://docs.h5py.org/en/latest/build.html)

  3. Install hdf5 (Official page: http://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/release_docs/INSTALL)

  4. Download hickle: via terminal (git clone https://github.com/telegraphic/hickle.git), or via manual download (go to https://github.com/telegraphic/hickle, where on the right-hand side you will find the Download ZIP option)

  5. cd to your downloaded hickle directory

  6. Then run the following command in the hickle directory: python setup.py install

Testing

Once installed from source, run python setup.py test to check it's all working.

Bugs & contributing

Contributions and bugfixes are very welcome. Please check out our contribution guidelines for more details on how to contribute to development.

Referencing hickle

If you use hickle in academic research, we would be grateful if you could reference our paper in the Journal of Open-Source Software (JOSS).

Price et al., (2018). Hickle: A HDF5-based python pickle replacement. Journal of Open Source Software, 3(32), 1115, https://doi.org/10.21105/joss.01115
Comments
  • Hickle 5 rc

    Ok, here we go: as agreed in the discussion of PR #138, here is the fully assembled hickle-5-RC branch. In case you prefer to review directly within my repo, feel free to close this PR again; I'm fine with whatever you prefer. I'm not sure whether all the commits reported below are really visible from within this branch, or whether the log below is an excerpt of the git reflog.

    I also expect AppVeyor and Travis to complain a bit due to the tox-related changes; I would be surprised if things worked on the first try. Anyway, fingers crossed.

    The reason for the Python 3.5 failure is known (I can't test it here any more, lacking a Python 3.5 installation). The fix is easy, just one change in one line, which I will make if Python 3.5 is to remain supported beyond hickle 4.

    With the astropy warnings I would need your help, as we do not use astropy here, so I have no clue how to fix them.

    opened by hernot 54
  • support for python copy protocol __setstate__ __getstate__ if present in object

    [Suggestion]

    I have several complex classes which, in order to be pickled by different pickle replacements like jsonpickle and others, implement __getstate__ and __setstate__ methods. Besides being copyable for free using copy.copy and copy.deepcopy, pickling is quite straightforward.

    import numpy as np

    class with_state():
        def __init__(self):
            self.a = 12
            self.b = {'love': np.ones([12, 7]), 'hatred': np.zeros([4, 9])}

        def __getstate__(self):
            # return a plain dict capturing the object's state
            return dict(a=self.a, b=self.b)

        def __setstate__(self, state):
            # restore attributes from the state dict
            self.a = state['a']
            self.b = state['b']

        def __getitem__(self, index):
            if index == 0:
                return self.a
            if index < 2:
                return self.b['hatred']
            if index > 2:
                raise ValueError("index unknown")
            return self.b['love']
    
    

    The above example is very simplified, removing anything unnecessary. Currently such classes are hickled with a warning that the object is not understood, because a test for whether __setstate__/__getstate__ are implemented is missing. Admittedly both are handled by the pickle fallback, but the class ends up as a pickle string instead of a dataset, making it quite tedious to extract from the HDF5 file on the non-Python end, in C# or other languages.

    Therefore I suggest adding a test for both methods being defined, and storing the class as its state dictionary instead of a pickled string. This would need some flag or other means of indicating that the dict represents the result of <class>.__getstate__ and not a plain Python dictionary. The test should run after the test for numpy data and before the test for Python iterables, as the above class appears to be iterable but isn't.
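
    [For illustration: a minimal sketch of what the copy protocol already provides, using the class above. state is a plain dict that a loader could map to an HDF5 group rather than a pickle string.]

    obj = with_state()
    state = obj.__getstate__()                 # plain dict: {'a': 12, 'b': {...}}
    restored = with_state.__new__(with_state)  # allocate without calling __init__
    restored.__setstate__(state)               # rebuild from the state dict
    assert restored.a == obj.a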

    ADDENDUM: If somebody guides me through it, I would attempt to add the appropriate test and conversion function. But I would need at least some guidance on which existing methods would be the best templates and inspiration, and which parts and sections of the h5py manual, the HDF5 spec, and the hickle contributor documentation to read carefully.

    bug 
    opened by hernot 37
  • Implementation of Container and mixed loaders (H4EP001)

    At first:

    @1313e, with this pull request I want to express how much I appreciate the really great work you did for hickle 4.0.0, implementing the first step towards dedicated loaders.

    Second: the reason why I'm pushing so hard for the implementation of H4EP001.

    The research conducted by the research group I'm establishing and leading is split into two tracks: a methodological one, dealing with improvement and development of new algorithms and methods for clinical procedures in diagnostics and treatment, and a clinical one, concerned with clinical research utilizing the tools based upon the methods and algorithms provided by the first track.

    In the first track, python, numpy, scipy etc. are the primary tools for working on the algorithms and investigating new procedures and algorithmic approaches. The work in the second track is primarily conducted by clinicians, so the tools provided for their research and studies have to be thoroughly tested and validated. This validation, at least the part that can be automated through unit tests, utilizes test data, including intermediate data and results obtained from the python programs and scripts developed alongside the underlying algorithms.

    As the clinical tools are implemented in compiled languages which support true multi-threading, the data passed on has to be stored in a file format readable outside python, ruling out pickle strings. Therefore jsonpickle was used to dump the data. Meanwhile the amount of data has grown so large that JSON files, even compressed with zip, gzip or other compression schemes, are not feasible any more. NPY and NPZ files, the next choice, mandate a dependency upon the numpy library. Just for conducting unit tests, a self-contained file format, for which only the corresponding library and nothing else has to be included, would be the better choice.

    And this is the point where the HDF5 libraries and hickle come into play. I consider them the best and most suitable option I have found so far. The current limitation, that objects without a dedicated loader are stored as pickle strings, can be solved by supporting the Python copy protocol, which I hereby offer to contribute to hickle.

    Third: the content of this pull request.

    Implementation of container-based and mixed loaders as proposed by hickle extension proposal H4EP001 (#135). For details see the commit message and the proposal.

    Finally, I recommend:

    Not putting this into an official release yet. Some extended tests, using a real dataset compiled for testing and validating software tools and components developed for use in the clinical track, showed that an important part is still missing to keep file sizes at a reasonable level. In particular, type strings and pickle strings for class and function objects currently take up most of the file space, letting dumped files quickly grow into the GB range even with HDF5 file compression activated, where the plain pickle stream requires just 400 MB of space. Therefore I recommend implementing memoization (H4EP002, #139) first, before considering the resulting code base ready for release.

    PS: loading hickle 4.0.0 files should still be possible out of the box. Due to the lack of an appropriate test file, no test is included to verify this.

    opened by hernot 34
  • Hickle subclasses

    This PR adds support for hickling objects that are instances of subclasses of classes that hickle supports. So, for example, anything that subclasses a dict can now be hickled properly as well. This PR also includes the changes made in #109. Finally, I have removed all instances where the type of a hickled dataset was saved as a list instead of just a normal string.

    As it is now required to save a bit more data in the HDF5 file, previously made hickle files are not supported.

    opened by 1313e 29
  • Several improvements concerning dicts, passing open HDF5-files and more

    This PR contains the following changes:

    • Improved and simplified the imports;
    • Made sure that all required future imports are done in hickle.py;
    • Merged many checks for Python2/Python3 into the same check, decreasing the need to duplicate certain statements;
    • Empty dicts can now be properly hickled and unhickled ( #91 );
    • Dicts using tuple keys can now be properly hickled and unhickled ( #91 );
    • Dicts using both integers/floats and integer/float strings as dict keys (e.g., 1 and '1') can now be properly hickled and unhickled;
    • Passing an open h5py.File object to hickle.dump and hickle.load will no longer automatically close the file ( #92 );
    • If an Exception is raised in hickle.dump or hickle.load, an opened HDF5-file is always closed, unless it was provided as open to these functions (that is the user's task obviously) ( #90 );
    • The working directory for doing tests is now automatically set to the system's temporary directory;
    • Removed all os.remove uses in the tests, as pytest or your own machine automatically performs clean-ups in the temporary directory;
    • Fixed a problem with the dtype of a dict not always being properly saved ( #91 ).

    I have tried to remain as true as possible to the original coding style (except that I use parentheses for return, since I hate using it as a statement). All changes should be backwards compatible, except for hickled dicts that had their dtype saved incorrectly.

    opened by 1313e 18
  • Huge dict() object loads failed

    Hi,

    I have a huge dict() to save, about 430 MB, which I saved with hickle.

    Loading it fails:

    File "D:\Users\Cidge\Anaconda3\envs\research_env_final\lib\site-packages\hickle\hickle.py", line 531, in load
        py_container = _load(py_container, h_root_group)
    File "D:\Users\Cidge\Anaconda3\envs\research_env_final\lib\site-packages\hickle\hickle.py", line 601, in _load
        py_subcontainer = _load(py_subcontainer, h_node)
    File "D:\Users\Cidge\Anaconda3\envs\research_env_final\lib\site-packages\hickle\hickle.py", line 608, in _load
        subdata = load_dataset(h_group)
    File "D:\Users\Cidge\Anaconda3\envs\research_env_final\lib\site-packages\hickle\hickle.py", line 561, in load_dataset
        return load_fn(h_node)
    File "D:\Users\Cidge\Anaconda3\envs\research_env_final\lib\site-packages\hickle\loaders\load_python3.py", line 157, in load_pickled_data
        return pickle.loads(data[0])
    File "D:\Users\Cidge\Anaconda3\envs\research_env_final\lib\site-packages\dill\_dill.py", line 275, in loads
        return load(file, ignore, **kwds)
    File "D:\Users\Cidge\Anaconda3\envs\research_env_final\lib\site-packages\dill\_dill.py", line 270, in load
        return Unpickler(file, ignore=ignore, **kwds).load()
    File "D:\Users\Cidge\Anaconda3\envs\research_env_final\lib\site-packages\dill\_dill.py", line 472, in load
        obj = StockUnpickler.load(self)
    _pickle.UnpicklingError: pickle data was truncated

    Does hickle have any memory limitations like pickle?

    opened by Larryliu912 15
  • HEP005: Add support for hdf5plugin compression filters (e.g. bitshuffle+lz4)

    Comparison of byte size and elapsed time for hickle operating on this object: np.random.rand(64, 2, 1048576)

    uncompressed ......... 1073.8 MiB, E.T. = 0.5 s
    gzip-compressed ...... 1013.1 MiB, E.T. = 32.7 s
    bitshuffle-compressed  937.4 MiB, E.T. = 1.3 s
    

    For smaller objects, it won't matter much. The value of this proposed feature depends on how large an object hickle operates on.

    Source code: try_deflation.py.txt

    (same bitshuffle+lz4 that's used in blimpy and rawspec)

    enhancement pull-request-welcome 
    opened by texadactyl 12
  • Hickle not working with h5py 3.0

    Hickle seems to have stopped working with h5py 3.0.0. It works fine with 2.10.

    Example code:

    import hickle as hkl
    
    hkl.dump([1,2,3], "/tmp/foo.hkl")
    hkl.load("/tmp/foo.hkl")
    

    Fails with

    ValueError: Provided argument 'file_obj' does not appear to be a valid hickle file! (Cannot load <HDF5 dataset "data": shape (3,), type "<i8"> data type)
    
    opened by cyfra 12
  • Release of v4.0.0?

    Also creating this PR in advance, so that we can keep track of which issues have already been dealt with. This includes changes made in any (non-merged) PR to the dev branch.

    Issues solved:

    • OrderedDict is now supported (fixes #65);
    • data_0 is no longer used if there is only a single data group/set (fixes #44);
    • HDF5 groups can now be dumped to and loaded from (fixes #54);
    • Integers using Python's arbitrary-precision (integers larger than 64-bit) can now be dumped and loaded properly (fixes #113);
    • Replaced broken link to pickle documentation with proper one (fixes #122);
    • Objects that appear to be iterable are no longer considered as such unless hickle knows for sure they are iterables (fixes #70 and fixes #125);
    • Dict keys with slashes are now supported (fixes #124);
    • Loaders are only loaded when they are required for dumping or loading specific objects (fixes #114);

    NOTE: Before this gets merged, a v3 branch must be made of master.

    opened by 1313e 12
  • [FEATURE] Providing an open HDF5-file to dump/load does not close the file

    Something that may be useful: when the user passes an open HDF5-file to the hickle.dump or hickle.load functions, they should not automatically close the HDF5-file afterwards. Given that the user provided the file as open, I think it is pretty safe to assume that they will close it themselves.

    The reason why I think this is useful is that I am hickling many objects to the exact same file, while I am controlling the paths within the file. Currently, this means that the file is being opened and closed for every dump or load that I perform, even though they are all executed one after another. This is a bit stupid, as it creates both overhead and a lot of strain on the file system (obviously depending on how many times the file is opened and closed).

    Doing this, however, would change the comment I made in #90 about using with-statements, as h5f.close should only be called if hickle opened the file itself (this can be done quite easily by using finally-clauses). If you want, I can make the changes myself (including addressing the closing issue of #90) and simply create a PR.
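
    [A hedged sketch of the intended usage, assuming support for open h5py.File objects and hickle's path keyword; the filename and paths are illustrative:]

    import h5py
    import hickle as hkl

    with h5py.File('many_dumps.hdf5', 'w') as h5f:
        hkl.dump([1, 2, 3], h5f, path='/run1')  # file stays open between dumps
        hkl.dump({'a': 1}, h5f, path='/run2')
    # the with-block, not hickle, closes the file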

    enhancement pull-request-welcome 
    opened by 1313e 12
  • Hickling an empty dict does not return one when unhickled

    Not sure if this is intended behavior, but when hickling an empty dict and unhickling it, an empty list is returned instead of an empty dict. Hickling and unhickling a non-empty dict works perfectly fine. Is this intended behavior or simply something that was overlooked?

    bug 
    opened by 1313e 12
  • Fix for failing tests with numpy 1.24.1

    numpy has removed numpy.float(); the solution is to use the regular built-in float() instead.

    Debian bug report here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1027194

    opened by EdwardBetts 1
  • hickleable integration

    @steven-murray has written a neat package called hickleable. The idea is to provide a simple decorator to apply to classes that will make them hickle-able without pickling.

    Is this better as a standalone package, or could we consider merging it? At the least, we should mention it in the hickle documentation / README (IMO).

    enhancement question 
    opened by telegraphic 1
  • Failing test with Python 3.11: AttributeError: property 'dtype' of 'Dataset' object has no setter

    $ python3.11 -mpytest --no-cov
    ============================= test session starts ==============================
    platform linux -- Python 3.11.0+, pytest-7.1.2, pluggy-1.0.0+repack
    benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
    rootdir: /home/edward/src/2022/vendor/hickle-5.0.2, configfile: tox.ini
    plugins: benchmark-3.2.2, astropy-header-0.2.2, forked-1.4.0, flaky-3.7.0, anyio-3.6.2, sugar-0.9.6, openfiles-0.5.0, hypothesis-6.36.0, arraydiff-0.5.0, doctestplus-0.12.1, kgb-7.1.1, repeat-0.9.1, django-4.5.2, timeout-2.1.0, astropy-0.10.0, pylama-7.4.3, cov-4.0.0, tornasync-0.6.0.post2, remotedata-0.3.3, filter-subpackage-0.1.1, mock-3.8.2, requests-mock-1.9.3, xdist-2.5.0, asyncio-0.19.0
    asyncio: mode=Mode.STRICT
    collected 102 items
    
    hickle/tests/test_01_hickle_helpers.py ..F...                            [  5%]
    hickle/tests/test_02_hickle_lookup.py .......................            [ 28%]
    hickle/tests/test_03_load_builtins.py ......                             [ 34%]
    hickle/tests/test_04_load_numpy.py ....                                  [ 38%]
    hickle/tests/test_05_load_scipy.py ..                                    [ 40%]
    hickle/tests/test_06_load_astropy.py .........                           [ 49%]
    hickle/tests/test_07_load_pandas.py .                                    [ 50%]
    hickle/tests/test_99_hickle_core.py ..........                           [ 59%]
    hickle/tests/test_hickle.py .......................................      [ 98%]
    hickle/tests/test_legacy_load.py ..                                      [100%]
    
    =================================== FAILURES ===================================
    ____________________________ test_H5NodeFilterProxy ____________________________
    
    h5_data = <HDF5 file "hickle_helpers_test_H5NodeFilterProxy.hdf5" (mode r)>
    
        def test_H5NodeFilterProxy(h5_data):
            """
            tests H5NodeFilterProxy class. This class allows to temporarily rewrite
            attributes of h5py.Group and h5py.Dataset nodes before being loaded by
            hickle._load method.
            """
        
            # load data and try to directly modify 'type' and 'base_type' Attributes
            # which will fail cause hdf5 file is opened for read only
            h5_node = h5_data['somedata']
            with pytest.raises(OSError):
                try:
                    h5_node.attrs['type'] = pickle.dumps(list)
                except RuntimeError as re:
                    raise OSError(re).with_traceback(re.__traceback__)
            with pytest.raises(OSError):
                try:
                    h5_node.attrs['base_type'] = b'list'
                except RuntimeError as re:
                    raise OSError(re).with_traceback(re.__traceback__)
        
            # verify that 'type' expands to tuple before running
            # the remaining tests
            object_type = pickle.loads(h5_node.attrs['type'])
            assert object_type is tuple
            assert object_type(h5_node[()].tolist()) == dummy_data
        
            # Wrap node by H5NodeFilterProxy and rerun the above tests
            # again. This time modifying Attributes shall be possible.
            h5_node = H5NodeFilterProxy(h5_node)
            h5_node.attrs['type'] = pickle.dumps(list)
            h5_node.attrs['base_type'] = b'list'
            object_type = pickle.loads(h5_node.attrs['type'])
            assert object_type is list
        
            # test proper pass through of item and attribute access
            # to wrapped h5py.Group or h5py.Dataset object respective
            assert object_type(h5_node[()].tolist()) == list(dummy_data)
            assert h5_node.shape == np.array(dummy_data).shape
            with pytest.raises(AttributeError,match = r"can't\s+set\s+attribute"):
    >           h5_node.dtype = np.float32
    
    /home/edward/src/2022/vendor/hickle-5.0.2/hickle/tests/test_01_hickle_helpers.py:154: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <hickle.helpers.H5NodeFilterProxy object at 0x7f4c808b2090>
    name = 'dtype', value = <class 'numpy.float32'>
    
        def __setattr__(self, name, value):
            # if wrapped _h5_node and attrs shall be set store value on local attributes
            # otherwise pass on to wrapped _h5_node
            if name in {'_h5_node'}:
                super().__setattr__(name, value)
                return
            if name in {'attrs'}: # pragma: no cover
                raise AttributeError('attribute is read-only')
            _h5_node = super().__getattribute__('_h5_node')
    >       setattr(_h5_node, name, value)
    E       AttributeError: property 'dtype' of 'Dataset' object has no setter
    
    /home/edward/src/2022/vendor/hickle-5.0.2/hickle/helpers.py:180: AttributeError
    
    During handling of the above exception, another exception occurred:
    
    h5_data = <HDF5 file "hickle_helpers_test_H5NodeFilterProxy.hdf5" (mode r)>
    
        def test_H5NodeFilterProxy(h5_data):
            """
            tests H5NodeFilterProxy class. This class allows to temporarily rewrite
            attributes of h5py.Group and h5py.Dataset nodes before being loaded by
            hickle._load method.
            """
        
            # load data and try to directly modify 'type' and 'base_type' Attributes
            # which will fail cause hdf5 file is opened for read only
            h5_node = h5_data['somedata']
            with pytest.raises(OSError):
                try:
                    h5_node.attrs['type'] = pickle.dumps(list)
                except RuntimeError as re:
                    raise OSError(re).with_traceback(re.__traceback__)
            with pytest.raises(OSError):
                try:
                    h5_node.attrs['base_type'] = b'list'
                except RuntimeError as re:
                    raise OSError(re).with_traceback(re.__traceback__)
        
            # verify that 'type' expands to tuple before running
            # the remaining tests
            object_type = pickle.loads(h5_node.attrs['type'])
            assert object_type is tuple
            assert object_type(h5_node[()].tolist()) == dummy_data
        
            # Wrap node by H5NodeFilterProxy and rerun the above tests
            # again. This time modifying Attributes shall be possible.
            h5_node = H5NodeFilterProxy(h5_node)
            h5_node.attrs['type'] = pickle.dumps(list)
            h5_node.attrs['base_type'] = b'list'
            object_type = pickle.loads(h5_node.attrs['type'])
            assert object_type is list
        
            # test proper pass through of item and attribute access
            # to wrapped h5py.Group or h5py.Dataset object respective
            assert object_type(h5_node[()].tolist()) == list(dummy_data)
            assert h5_node.shape == np.array(dummy_data).shape
    >       with pytest.raises(AttributeError,match = r"can't\s+set\s+attribute"):
    E       AssertionError: Regex pattern "can't\\s+set\\s+attribute" does not match "property 'dtype' of 'Dataset' object has no setter".
    
    /home/edward/src/2022/vendor/hickle-5.0.2/hickle/tests/test_01_hickle_helpers.py:153: AssertionError
    =============================== warnings summary ===============================
    hickle/tests/test_06_load_astropy.py::test_create_astropy_constant
      /usr/lib/python3/dist-packages/astropy/constants/constant.py:99: AstropyUserWarning: Constant 'Gravitational constant' already has a definition in the None system from 'CODATA 2018' reference
        warnings.warn('Constant {!r} already has a definition in the '
    
    hickle/tests/test_06_load_astropy.py::test_create_astropy_constant
      /usr/lib/python3/dist-packages/astropy/constants/constant.py:99: AstropyUserWarning: Constant 'Electron charge' already has a definition in the 'emu' system from 'CODATA 2018' reference
        warnings.warn('Constant {!r} already has a definition in the '
    
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
      /home/edward/src/2022/vendor/hickle-5.0.2/hickle/tests/test_06_load_astropy.py:169: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
        assert reloaded.value[index].tostring() == t1.value[index].tostring()
    
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array
      /home/edward/src/2022/vendor/hickle-5.0.2/hickle/tests/test_06_load_astropy.py:177: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
        assert reloaded.value[index].tostring() == t1.value[index].tostring()
    
    hickle/tests/test_hickle.py::test_scalar_compression
      /home/edward/src/2022/vendor/hickle-5.0.2/hickle/tests/test_hickle.py:745: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
      Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
        data = {'a': 0, 'b': np.float(2), 'c': True}
    
    hickle/tests/test_hickle.py::test_slash_dict_keys
      /usr/lib/python3/dist-packages/pytest_tornasync/plugin.py:45: PytestRemovedIn8Warning: Passing None has been deprecated.
      See https://docs.pytest.org/en/latest/how-to/capture-warnings.html#additional-use-cases-of-warnings-in-tests for alternatives in common use cases.
        pyfuncitem.obj(**testargs)
    
    hickle/tests/test_legacy_load.py::test_4_0_0_load
      /home/edward/src/2022/vendor/hickle-5.0.2/hickle/loaders/load_scipy.py:91: DeprecationWarning: Please use `csr_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.csr` namespace is deprecated.
        self.object_type = pickle.loads(item.attrs['type'])
    
    hickle/tests/test_legacy_load.py::test_4_0_0_load
      /home/edward/src/2022/vendor/hickle-5.0.2/hickle/loaders/load_scipy.py:91: DeprecationWarning: Please use `csc_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.csc` namespace is deprecated.
        self.object_type = pickle.loads(item.attrs['type'])
    
    hickle/tests/test_legacy_load.py::test_4_0_0_load
      /home/edward/src/2022/vendor/hickle-5.0.2/hickle/loaders/load_scipy.py:91: DeprecationWarning: Please use `bsr_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.bsr` namespace is deprecated.
        self.object_type = pickle.loads(item.attrs['type'])
    
    hickle/tests/test_legacy_load.py::test_4_0_0_load
      /home/edward/src/2022/vendor/hickle-5.0.2/hickle/lookup.py:1611: MockedLambdaWarning: presenting '<function _moc_numpy_array_object_lambda at 0x7f4c796220c0>' instead of stored lambda 'type'
        warnings.warn(
    
    -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
    =========================== short test summary info ============================
    FAILED hickle/tests/test_01_hickle_helpers.py::test_H5NodeFilterProxy - Asser...
    ================== 1 failed, 101 passed, 24 warnings in 3.88s ==================
    
    pull-request-welcome help-wanted 
    opened by EdwardBetts 1
  • Failing test on big-endian: TypeError: No conversion path for dtype: dtype('>U23')

    When I run the tests on a machine with the s390x architecture, the test_astropy_time_array test fails with this exception:

    TypeError: No conversion path for dtype: dtype('>U23')
    

    This looks like an error caused by the s390x architecture being big-endian.

    Here is the full output of the failing test.

    $ python3 -mpytest --verbose -k test_astropy_time_array --no-cov
    ================================================================================================ test session starts ================================================================================================
    platform linux -- Python 3.10.6, pytest-7.1.2, pluggy-1.0.0+repack -- /usr/bin/python3
    cachedir: .pytest_cache
    hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/edward/hickle/hickle/.hypothesis/examples')
    rootdir: /home/edward/hickle/hickle, configfile: tox.ini
    plugins: doctestplus-0.12.0, arraydiff-0.5.0, openfiles-0.5.0, cov-3.0.0, mock-3.8.2, hypothesis-6.36.0, filter-subpackage-0.1.1, remotedata-0.3.3, astropy-header-0.2.1, astropy-0.10.0
    collected 102 items / 101 deselected / 1 selected                                                                                                                                                                   
    
    hickle/tests/test_06_load_astropy.py::test_astropy_time_array FAILED                                                                                                                                          [100%]
    
    ===================================================================================================== FAILURES ======================================================================================================
    ______________________________________________________________________________________________ test_astropy_time_array ______________________________________________________________________________________________
    
    h5_data = <HDF5 group "/root_group" (2 members)>, compression_kwargs = {}
    
        def test_astropy_time_array(h5_data,compression_kwargs):
            """
            test proper storage and loading of astropy time representations
            """
        
            loop_counter = 0
        
        
            for times in ([58264, 58265, 58266], [[58264, 58265, 58266], [58264, 58265, 58266]]):
                t1 = Time(times, format='mjd', scale='utc')
        
                h_dataset, subitems = load_astropy.create_astropy_time(t1,h5_data, f'time_{loop_counter}',**compression_kwargs)
                assert isinstance(h_dataset,h5.Dataset) and not subitems and iter(subitems)
                assert h_dataset.attrs['format'] in( str(t1.format).encode('ascii'),str(t1.format))
                assert h_dataset.attrs['scale'] in ( str(t1.scale).encode('ascii'),str(t1.scale))
                assert h_dataset.attrs['np_dtype'] in( t1.value.dtype.str.encode('ascii'),t1.value.dtype.str)
                reloaded = load_astropy.load_astropy_time_dataset(h_dataset,b'astropy_time',t1.__class__)
                assert reloaded.value.shape == t1.value.shape
                assert reloaded.format == t1.format
                assert reloaded.scale == t1.scale
                for index in range(len(t1)):
                    assert np.allclose(reloaded.value[index], t1.value[index])
                loop_counter += 1
        
            t_strings = ['1999-01-01T00:00:00.123456789', '2010-01-01T00:00:00']
        
            # Check that 2D time arrays work as well (github issue #162)
            for times in (t_strings, [t_strings, t_strings]):
                t1 = Time(times, format='isot', scale='utc')
        
    >           h_dataset,subitems = load_astropy.create_astropy_time(t1,h5_data,f'time_{loop_counter}',**compression_kwargs)
    
    /home/edward/hickle/hickle/hickle/tests/test_06_load_astropy.py:159: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    /home/edward/hickle/hickle/hickle/loaders/load_astropy.py:134: in create_astropy_time
        d = h_group.create_dataset(
    /usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/group.py:161: in create_dataset
        dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    /usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/dataset.py:88: in make_new_dset
        tid = h5t.py_create(dtype, logical=1)
    h5py/_debian_h5py_serial/h5t.pyx:1663: in h5py._debian_h5py_serial.h5t.py_create
        ???
    h5py/_debian_h5py_serial/h5t.pyx:1687: in h5py._debian_h5py_serial.h5t.py_create
        ???
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    >   ???
    E   TypeError: No conversion path for dtype: dtype('>U23')
    
    h5py/_debian_h5py_serial/h5t.pyx:1753: TypeError
    ============================================================================================== short test summary info ==============================================================================================
    FAILED hickle/tests/test_06_load_astropy.py::test_astropy_time_array - TypeError: No conversion path for dtype: dtype('>U23')
    ========================================================================================= 1 failed, 101 deselected in 0.86s =========================================================================================
    $
    
    pull-request-welcome help-wanted 
    opened by EdwardBetts 2
  • Hickle now in Debian

    Thanks to @EdwardBetts, hickle is now included in Debian (unstable) 🙏.

    Mainly opening this issue to let @hernot, @1313e, and other devs know 🚀. I have just fixed a few outstanding issues and will bump to v5.0.2.

    opened by telegraphic 1
Releases (v5.0.2)
  • v5.0.2(Aug 31, 2022)

  • v5.0.1(Aug 30, 2022)

  • v5.0.0(Dec 17, 2021)

    • Support for newer versions of numpy >= 1.21 and h5py >= 3.0
    • Improved internal HDF5 structure for python dictionaries (no longer trailing /data)
    • Deprecated use of dill in favor of in-built pickle (given the updates to pickle functionality in Py3, and issues with numpy dtypes)
    • Switched to github actions for CI
    • Objects referred to multiple times are now only dumped once within the HDF5 file (HEP002)
  • v4.0.3(Dec 17, 2020)

  • v4.0.1(Jul 28, 2020)

  • v3.4.8(Jul 28, 2020)

  • v4.0.0(Jun 25, 2020)

    This is the major v4.0.0 release of the hickle package.

    Changes:

    • Dropped support for Python 2.7;
    • Dropped legacy support for hickle files made with v1 and v2;
    • OrderedDict is now supported (#65);
    • Subclasses of supported classes can now be properly dumped;
    • data_0 is no longer used if there is only a single data group/set (#44);
    • HDF5 groups can now be dumped to and loaded from (#54);
    • Integers using Python's arbitrary-precision (integers larger than 64-bit) can now be dumped and loaded properly (#113);
    • Replaced broken link to pickle documentation with proper one (#122);
    • Objects that appear to be iterable are no longer considered as such unless hickle knows for sure they are iterables (#70 and #125);
    • Dict keys with slashes are now supported (#124);
    • Loaders are only loaded when they are required for dumping or loading specific objects (#114);
    • hickle now has 100% test coverage;
    • NumPy arrays containing unicode strings can be properly dumped and loaded;
    • NumPy arrays containing non-NumPy objects can be dealt with as well (#90);
    • Removed the use of 'track_times' (#130);
    • If an object fails to be hickled using normal means, hickle will now fall back to pickling the object;
    • Massively simplified the way in which builtin Python scalars are stored, making it easier for the user to view.
  • 3.4.6(Mar 12, 2020)
