Extended pickling support for Python objects

Overview

cloudpickle


cloudpickle makes it possible to serialize Python constructs not supported by the default pickle module from the Python standard library.

cloudpickle is especially useful for cluster computing where Python code is shipped over the network to execute on remote hosts, possibly close to the data.

Among other things, cloudpickle supports pickling for lambda functions along with functions and classes defined interactively in the __main__ module (for instance in a script, a shell or a Jupyter notebook).
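For contrast, here is a minimal sketch of what the standard library alone does with a lambda. It uses only `pickle`; the exact exception type may differ between Python versions, so both plausible types are caught.

```python
import pickle

squared = lambda x: x ** 2

# The standard pickler stores functions by reference (module + qualified name).
# A lambda has no importable name, so that lookup fails.
try:
    pickle.dumps(squared)
    stdlib_pickle_failed = False
except (pickle.PicklingError, AttributeError):
    stdlib_pickle_failed = True

print("stdlib pickle failed:", stdlib_pickle_failed)
```

cloudpickle avoids the failure by serializing the function body itself rather than a reference to it.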

Cloudpickle can only be used to send objects between the exact same version of Python.

Using cloudpickle for long-term object storage is not supported and strongly discouraged.

Security notice: one should only load pickle data from trusted sources as otherwise pickle.load can lead to arbitrary code execution resulting in a critical security vulnerability.
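A minimal sketch of why loading untrusted data is dangerous. The `Payload` class is a hypothetical stand-in for attacker-controlled input, and the callable it names is deliberately harmless.

```python
import pickle


class Payload:
    """Stands in for attacker-controlled data; the callable here is harmless."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])" and replays it at load time.
        # An attacker could name any importable callable instead.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs sorted(...) during unpickling
print(result)
```

The loaded object is whatever the named callable returned, not a `Payload` instance, which shows that unpickling executes calls chosen by whoever produced the bytes.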

Installation

The latest release of cloudpickle is available from PyPI:

pip install cloudpickle

Examples

Pickling a lambda expression:

>>> import cloudpickle
>>> squared = lambda x: x ** 2
>>> pickled_lambda = cloudpickle.dumps(squared)

>>> import pickle
>>> new_squared = pickle.loads(pickled_lambda)
>>> new_squared(2)
4

Pickling a function interactively defined in a Python shell session (in the __main__ module):

>>> CONSTANT = 42
>>> def my_function(data: int) -> int:
...     return data + CONSTANT
...
>>> pickled_function = cloudpickle.dumps(my_function)
>>> depickled_function = pickle.loads(pickled_function)
>>> depickled_function
<function __main__.my_function(data:int) -> int>
>>> depickled_function(43)
85

Running the tests

  • With tox, to run the tests for all the supported versions of Python and PyPy:

    pip install tox
    tox
    

    or alternatively for a specific environment:

    tox -e py37
    
  • With py.test to only run the tests for your current version of Python:

    pip install -r dev-requirements.txt
    PYTHONPATH='.:tests' py.test
    

History

cloudpickle was initially developed by picloud.com and shipped as part of the client SDK.

A copy of cloudpickle.py was included as part of PySpark, the Python interface to Apache Spark. Davies Liu, Josh Rosen, Thom Neale and other Apache Spark developers improved it significantly, most notably to add support for PyPy and Python 3.

The aim of the cloudpickle project is to make that work available to a wider audience outside of the Spark ecosystem and to make it easier to improve it further notably with the help of a dedicated non-regression test suite.

Issues
  • Add ability to register modules to be deeply serialized

    This PR is based on the work done by @kinghuang in PR391, but takes on the feedback provided by @ogrisel and adds testing.

    Fixes #206

    Issue Summary

    To summarise the issue, in many cases cloudpickle is used to send code for remote execution. This is used in dask, prefect, mlflow and many libraries. For local functions, this works perfectly fine. But for any non-local function or class, cloudpickle assumes that external modules and packages are available at the location of deserialization. This may either not be the case, or the version of the package available at the end point may be different.
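The by-reference behaviour described above can be illustrated with the standard library alone. This sketch uses `json.dumps` as an arbitrary example of a function that lives in an importable module.

```python
import json
import pickle
import pickletools

# Functions importable from a module are pickled by reference:
# the payload holds only the module and attribute names.
payload = pickle.dumps(json.dumps)

print(len(payload))
pickletools.dis(payload)  # shows the lookup of json.dumps by name

# Unpickling therefore re-imports json and looks the name up again.
assert pickle.loads(payload) is json.dumps
```

If the module is missing or has a different version where the payload is loaded, the lookup fails or resolves to different code.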

    This PR adds the option to register modules for deep serialization by providing a register_deep_serialization function which takes either a name or a module. This is the original register_dynamic_module by @kinghuang.

    import cloudpickle
    from tests import external
    
    cloudpickle.register_deep_serialization("tests.external")  # string name works
    cloudpickle.register_deep_serialization(external)          # You can pass the module itself
    cloudpickle.register_deep_serialization("tests")           # or the parent string/module
    
    output = cloudpickle.dumps(external.an_external_function)
    

    Original dumps:

    b'\x80\x05\x95+\x00\x00\x00\x00\x00\x00\x00\x8c\x0etests.external\x94\x8c\x14an_external_function\x94\x93\x94.'
    

    dumps after registering tests.external for deep serialization:

    b'\x80\x05\x95<\x02\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(h\x02\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x01KCC\x04d\x01S\x00\x94N\x8c\x11this is something\x94\x86\x94))\x8c5/Users/samreay/Projects/cloudpickle/tests/external.py\x94\x8c\x14an_external_function\x94K\x04C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94\x8c\x05tests\x94\x8c\x08__name__\x94\x8c\x0etests.external\x94\x8c\x08__file__\x94\x8c5/Users/samreay/Projects/cloudpickle/tests/external.py\x94uNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x19}\x94}\x94(h\x14h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x15\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94u\x86\x94\x86R0.'
    

    Modules can be unregistered via unregister_deep_serialization

    Tests

    One of the example tests with an explicit use case is shown above. On top of this, tests have been added to _lookup_module_and_qualname using the _cloudpickle_testpkg package, and also to the new _is_explicitly_serialized_module function.

    opened by Samreay 58
  • ENH: derive from C-pickler for fast serialization

    Summary:

    This PR proposes a new CloudPickler class that inherits from the C _pickle.Pickler instead of the pure-Python pickle._Pickler, allowing speedups of 10x or more when serializing large built-in objects such as dicts and lists.
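A rough way to compare the two standard-library picklers is sketched below. `dump_with` is a hypothetical helper, the data set is arbitrary, and the timings depend on the machine, so no particular ratio is claimed.

```python
import io
import pickle
import time


def dump_with(pickler_cls, obj):
    """Serialize obj with the given pickler class and return (bytes, seconds)."""
    buffer = io.BytesIO()
    start = time.perf_counter()
    pickler_cls(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buffer.getvalue(), time.perf_counter() - start


data = {i: [i, str(i)] for i in range(50_000)}

c_bytes, c_seconds = dump_with(pickle.Pickler, data)     # C implementation when available
py_bytes, py_seconds = dump_with(pickle._Pickler, data)  # pure-Python implementation

print(f"C pickler:      {c_seconds:.3f}s")
print(f"Python pickler: {py_seconds:.3f}s")

# Both implementations must round-trip to the same object.
assert pickle.loads(c_bytes) == pickle.loads(py_bytes) == data
```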

    Disclaimer: a new start

    Moving from the python to the c Pickler requires a fair amount of changes. For this reason, instead of simply adapting the current code to respect the new constraints, I started back from scratch. This allows a new, clean API and structure, that will be hopefully easier to understand for everyone.

    I made a lot of comments (sometimes overly verbose) to ease the review process of this PR. Eventually, I hope the information they contain can be transferred to proper project documentation.

    Implementation:

    Changes to python

    As opposed to the Python pickler, the C pickler does not expose the save_* family of functions, nor low-level instructions such as write. These methods can be neither patched nor called, and the only customization option we had initially was the dispatch table, which is consulted for all types except a few special cases, including classes and functions, the two principal use cases of cloudpickle.

    As this makes it simply impossible to modify pickling behavior for such types, we patched the C pickler for it to allow a user defined reduction callback for functions and classes. This idea was suggested by @pitrou.

    The direct consequence is that functions and classes now have to follow the save_reduce/load_build pickling/unpickling process. Unfortunately, this API is not well suited to saving custom built-in types: in particular, the state-setting part of load_build (the function that reconstructs an object from a reduce value) assumes all attributes of an object are writable, which is not the case for C types (especially function.__globals__ and function.__closure__).

    For this reason, we also changed the API of save_reduce, allowing to add a custom state_setter, that will be called at unpickling time.

    You can view the total changes in this diff

    Individual PRs to CPython:

    • https://github.com/python/cpython/pull/12499 (reducer_override)
    • https://github.com/python/cpython/pull/12588 (state_setter in save_reduce)
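The `reducer_override` hook from the first PR shipped in Python 3.8. The sketch below shows how it lets a `pickle.Pickler` subclass intercept a class; the `Settings` class and `ByValuePickler` name are illustrative and not taken from cloudpickle.

```python
import io
import pickle


class Settings:
    retries = 3


class ByValuePickler(pickle.Pickler):
    def reducer_override(self, obj):
        # Called for classes and functions too, which dispatch_table cannot intercept.
        if isinstance(obj, type) and obj.__name__ == "Settings":
            # Rebuild the class at load time with type(name, bases, namespace).
            return type, (obj.__name__, obj.__bases__, {"retries": obj.retries})
        # Anything else falls back to the regular reduction.
        return NotImplemented


buffer = io.BytesIO()
ByValuePickler(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(Settings)

restored = pickle.loads(buffer.getvalue())
print(restored.__name__, restored.retries)
```

The restored class is a new object rebuilt from the pickled namespace, so it does not depend on `Settings` being importable where it is loaded.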

    Changes to cloudpickle

    Functions and classes are the two main types affected by this PR. The main challenge was to make the saving process fit into the save_reduce API.

    Outside of these types, the actual reduction process remains intact.

    However, now that any customization must return a tuple, I decided to adopt a new, hopefully clearer naming style for functions. You will see for yourselves.

    How to build this version locally

    Until the final release of Python 3.8, you need to build python from upstream's master branch

    git clone git@github.com:python/cpython.git
    cd cpython
    ./configure
    make
    

    To be able to use external modules you need a virtual environment, using for example the venv module:

    ./python -m venv /path/to/local/virtualenv
    

    Clone and install cloudpickle and its dependencies

    git clone git@github.com:cloudpipe/cloudpickle.git /path/to/cloudpickle
    cd /path/to/cloudpickle
    git fetch origin pull/ID/head:fast-cloudpickle
    git checkout fast-cloudpickle
    /path/to/local/virtualenv/bin/python -mpip install -rdev-requirements.txt
    /path/to/local/virtualenv/bin/python -mpip install .
    

    Finally, run the tests:

    /path/to/local/virtualenv/bin/python -mpytest tests/
    

    Benchmarks:

    • Benchmarks of a "concrete", end-to-end use-case using loky can be found here. To run the benchmarks, you also need the master version of loky.
    opened by pierreglaser 41
  • Pickling of generic annotations/types in 3.5+

    This PR adds support for pickling annotations on 3.5+, and fixes some problems with generic annotations on 3.7+.

    TODO

    • [x] Backport for 3.5
    • [x] Test that fails with TypeError: type() doesn't support MRO entry resolution; use types.new_class() if not using types.new_class for reconstructing classes
    • [x] Remove typing_extensions dependency
    • [x] Prefix privates with _
    • [x] Add test for pickle_depickle'ing annotated functions/classes

    Details

    The types.new_class change (in _make_skeleton_class) is because of a TypeError: type() doesn't support MRO entry resolution; use types.new_class() error on 3.7+, similar to this issue. Also see https://github.com/python/cpython/pull/6319.
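The error and the `types.new_class` workaround can be reproduced with the standard library. This sketch uses `typing.List[int]` as an arbitrary generic base; it is not the code from the PR.

```python
import types
from typing import List

# On Python 3.7+, List[int] is not a class; it provides __mro_entries__ instead.
try:
    type("IntList", (List[int],), {})
    type_call_failed = False
except TypeError as exc:
    type_call_failed = True
    print(exc)

# types.new_class resolves __mro_entries__ before calling the metaclass.
IntList = types.new_class("IntList", (List[int],))
print(IntList.__mro__)
```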

    I'm not sure if there are any downsides to TypeVars being __reduce__'d now. Previously, they were only supported as globals (so always imported, I think).

    The functions try_decompose_generic and get_bases are brittle the way they are written, because they check for attributes. There might be a better way.

    Tests

    Passing, and added some new ones.

    opened by valtron 39
  • deduplicate cloudpickle reducers.

    closes #284 related to #364

    About backward compatibility:

    • this PR removes make_skel_func and fill_function, i.e. the functions cloudpickle previously used to reconstruct functions, as they were equivalent to the new function_setstate/function_new (modulo some Python 3.8 compatibility). These functions are needed to load pickles created by previous cloudpickle versions. A simple fix is to keep them inside cloudpickle.py for a few releases and add a FutureWarning inside them saying that an old pickle is being read, and that reading such pickles will break in 2 releases.
    • By removing the previous Python < 3.8 CloudPickler class this PR also removes semi-public functions (all the CloudPickler.save_*). These functions are not necessary to read old pickles, but they could be used inside third-party code. To address this, we could keep exposing the previous CloudPickler for the next few releases, but make cloudpickle.dump(s) use the new CloudPickler. This way, we can add a FutureWarning into the previous CloudPickler.__init__, while cloudpickle.dump(s) remains silent.
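The deprecation approach proposed in the first bullet could look roughly like this. Both function names and the dictionary state are hypothetical placeholders, not cloudpickle's actual helpers.

```python
import warnings


def _new_function_setstate(func_state):
    """Hypothetical stand-in for the new reconstruction helper."""
    return dict(func_state)


def _fill_function_compat(func_state):
    """Hypothetical shim kept only so that old pickles keep loading."""
    warnings.warn(
        "Loading a pickle created by an older cloudpickle version; "
        "support for this format will be removed in a future release.",
        FutureWarning,
        stacklevel=2,
    )
    return _new_function_setstate(func_state)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    state = _fill_function_compat({"name": "f"})

print([str(w.message) for w in caught])
```

The shim delegates to the new code path, so only one implementation has to be maintained while old pickles remain readable.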

    Also, the module names don't make much sense now. In the future we should rename cloudpickle_fast.py to cloudpickle.py, and merge it with the previous cloudpickle.py.

    @jakirkham if you want to give #364 another shot, but rebasing on this PR first, I suspect its implementation should be much easier :)

    opened by pierreglaser 25
  • Add ability to pickle dynamically created modules

    The old logic treated all modules the same, which would fail when unpickling. In save_module detect whether the module has been dynamically created by following the chain of imports. Noteworthy is that imp.find_module doesn't work with submodules (example sckit.tree), so we actually have to split the module name and iterate over each piece.

    Dynamic modules are saved as dictionaries and reconstituted by dynamic_subimport function. While working on the test cases I discovered NotImplemented and Ellipsis also don't work properly (they are introduced into the test dynamic module by exec). I've also addressed that.
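A simplified sketch of the idea follows. The `is_dynamic_module` heuristic and the signature of `dynamic_subimport` are assumptions made for illustration; the PR's own detection follows the import chain instead.

```python
import pickle
import sys
import types


def is_dynamic_module(module):
    """Simplified heuristic: not registered in sys.modules and no __file__."""
    return (
        module.__name__ not in sys.modules
        and getattr(module, "__file__", None) is None
    )


def dynamic_subimport(name, module_vars):
    """Rebuild a module from its name and a plain dict of its contents."""
    module = types.ModuleType(name)
    module.__dict__.update(module_vars)
    return module


dynamic = types.ModuleType("scratch_module")
exec("ANSWER = 42\nGREETING = 'hi'", dynamic.__dict__)

# Keep only the user-defined names; dunder entries such as __builtins__ are skipped.
contents = {k: v for k, v in vars(dynamic).items() if not k.startswith("__")}
payload = pickle.dumps((dynamic.__name__, contents))

rebuilt = dynamic_subimport(*pickle.loads(payload))
print(is_dynamic_module(dynamic), rebuilt.ANSWER)
```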

    opened by rodrigofarnhamsc 21
  • Optionally use pickle5 (Redux)

    Fixes https://github.com/cloudpipe/cloudpickle/issues/179

    Thanks to @pierreglaser's work in PR ( https://github.com/cloudpipe/cloudpickle/pull/368 ), this is a rebased/simplified version of PR ( https://github.com/cloudpipe/cloudpickle/pull/364 ). Otherwise is the same in that it tries to use pickle5 on older Python versions to support out-of-band buffers.
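For reference, this is what out-of-band buffers look like with the standard `pickle` module on Python 3.8+, where protocol 5 is built in. The sketch does not use the pickle5 backport or cloudpickle.

```python
import pickle

data = bytearray(b"x" * 1_000_000)

buffers = []
# With protocol 5, a PickleBuffer can be handed to buffer_callback
# instead of being copied into the pickle stream.
payload = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                       buffer_callback=buffers.append)

print(len(payload), len(buffers))

# The consumer must receive the same buffers, in the same order.
restored = pickle.loads(payload, buffers=buffers)
assert bytes(memoryview(restored)) == bytes(data)
```

The pickle stream stays tiny because the megabyte of data travels separately in `buffers`.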

    opened by jakirkham 20
  • Implement dynamic class provenance tracking to fix isinstance semantics and add support for dynamically defined enums

    This is a fix for #244 (and #101) to add support for dynamically defined Enum subclasses.

    Properly adding support for dynamic Enums required fixing the isinstance semantics more broadly, as initially requested in #195.

    The proposed solution involves tracking the provenance of pickled dynamic class definitions with a pair of weakref.WeakKeyDictionary / weakref.WeakValueDictionary protected by a threading.Lock.
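The tracking scheme can be sketched as below. The helper names and the use of `uuid4` for ids are assumptions chosen for illustration; only the two weak dictionaries and the lock come from the description above.

```python
import threading
import uuid
import weakref

_class_to_tracker_id = weakref.WeakKeyDictionary()
_tracker_id_to_class = weakref.WeakValueDictionary()
_tracker_lock = threading.Lock()


def get_or_create_tracker_id(cls):
    """Return a stable id for cls, creating one on first use (pickling side)."""
    with _tracker_lock:
        tracker_id = _class_to_tracker_id.get(cls)
        if tracker_id is None:
            tracker_id = uuid.uuid4().hex
            _class_to_tracker_id[cls] = tracker_id
            _tracker_id_to_class[tracker_id] = cls
        return tracker_id


def lookup_class_or_track(tracker_id, candidate):
    """Return the class already known under tracker_id, else register candidate."""
    with _tracker_lock:
        known = _tracker_id_to_class.setdefault(tracker_id, candidate)
        _class_to_tracker_id[known] = tracker_id
        return known


Color = type("Color", (), {})
tracker_id = get_or_create_tracker_id(Color)

# Simulate unpickling the same definition twice in one process.
first = lookup_class_or_track(tracker_id, type("Color", (), {}))
second = lookup_class_or_track(tracker_id, type("Color", (), {}))
print(first is Color, second is Color)
```

Because repeated loads resolve to one class object, `isinstance` checks keep working, and the weak references let unused dynamic classes be garbage collected.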

    opened by ogrisel 19
  • Making cloudpickle produce "consistent/deterministic" results.

    This question arose in the following context. I have multiple Python processes, and some classes are defined in each process. Sometimes class definitions are shipped from one process to another (using cloudpickle). Sometimes classes may be shipped multiple times or in multiple ways to a given process, and I'd like to deduplicate them based on the output of cloudpickle: if cloudpickle.dumps(class1) == cloudpickle.dumps(class2), then the classes are the "same" and I can throw one of them away. This works, but there are far too many false negatives, that is, two classes really are the same (in some sense) but cloudpickle.dumps gives different results for them.

    Here's one example that sort of illustrates the issue (although there are a number of ways this kind of thing can arise).

    Suppose I do the following.

    import cloudpickle
    
    class Foo1(object):
        def __init__(self):
            pass
    
    serialized1 = cloudpickle.dumps(Foo1)
    Foo2 = cloudpickle.loads(serialized1)
    serialized2 = cloudpickle.dumps(Foo2)
    
    assert serialized1 == serialized2  # This assertion fails.
    

    I'd love for this kind of assertion to succeed. Does anyone know if this is achievable or what the main obstacles are?

    Interestingly, if I iterate this a third time,

    Foo3 = cloudpickle.loads(serialized2)
    serialized3 = cloudpickle.dumps(Foo3)
    
    assert serialized2 == serialized3  # This succeeds.
    

    then the assert succeeds, so maybe it suffices to use cloudpickle.dumps(cloudpickle.loads(cloudpickle.dumps(cls))) to deduplicate classes (though this seems kind of insane, and I haven't tested this extensively). Would you expect this to work?

    One thing that may be related/revealing is the outputs I get if I do something similar in an IPython interpreter (instead of a regular Python interpreter).

    First copy and paste this block into IPython.

    import cloudpickle
    
    class Foo(object):
        def __init__(self):
            pass
    
    serialized1 = cloudpickle.dumps(Foo)
    

    Then copy and paste this block into IPython.

    class Foo(object):
        def __init__(self):
            pass
    
    serialized2 = cloudpickle.dumps(Foo)
    

    Comparing serialized1 and serialized2 next to each other, they are

    serialized1  # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
    serialized2  # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
    

    They seem to be the same everywhere except that the first includes the string <ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h and the second includes the string <ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h. Any idea where these strings come from or if it is possible to remove them?
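These strings look like the `co_filename` that IPython assigns to each cell, and the differing `K\x04` / `K\x02` bytes are consistent with `co_firstlineno` (the first cell has two extra lines before the class). That reading is an interpretation, not something confirmed in the thread. The sketch below reproduces the effect by compiling the same source under the two file names:

```python
import marshal

SOURCE = (
    "class Foo(object):\n"
    "    def __init__(self):\n"
    "        pass\n"
)


def init_code(filename):
    """Compile SOURCE as if it came from `filename` and return Foo.__init__'s code."""
    namespace = {"__name__": "demo"}
    exec(compile(SOURCE, filename, "exec"), namespace)
    return namespace["Foo"].__init__.__code__


code1 = init_code("<ipython-input-1-d9b5c81388ae>")
code2 = init_code("<ipython-input-2-a08e1f07615d>")

print(code1.co_filename, code1.co_firstlineno)
print(code2.co_filename, code2.co_firstlineno)

# Same bytecode, different metadata, hence different serialized bytes.
assert code1.co_code == code2.co_code
assert marshal.dumps(code1) != marshal.dumps(code2)
```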

    cc @Wapaul1 @mehrdadn

    opened by robertnishihara 19
  • NumPy arrays serialize more slowly with cloudpickle than pickle

    I would expect pickle and cloudpickle to behave pretty much identically here. Sadly cloudpickle serializes much more slowly.

    In [1]: import numpy as np
    
    In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)
    
    In [3]: import cloudpickle, pickle
    
    In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
    CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
    Wall time: 185 ms
    Out[4]: 100000161
    
    In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
    CPU times: user 125 ms, sys: 280 ms, total: 404 ms
    Wall time: 405 ms
    Out[5]: 100000161
    
    opened by mrocklin 19
  • Remove non-standard __transient__ support

    The __transient__ dunder attribute is not the standard way to prevent attributes from being pickled. Instead, the standard approach is to use the __getstate__ and __setstate__ magic methods.
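A minimal sketch of the standard approach, using a hypothetical `Connection` class whose `socket` attribute stands in for state that should not be pickled:

```python
import pickle


class Connection:
    def __init__(self, host):
        self.host = host
        self.socket = object()  # placeholder for a resource that must not be pickled

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["socket"]  # drop the transient attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.socket = None  # to be re-created lazily after unpickling


restored = pickle.loads(pickle.dumps(Connection("example.org")))
print(restored.host, restored.socket)
```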

    Considering that:

    • This is an old fix that was implemented maybe for unsupported Python versions (i.e.: Python 2.6).
    • Nobody knows what it is doing there exactly or why it was added.
    • Having this special non-standard case implemented makes the code more complex and may result in unexpected behavior (see #108).
    • This code was not even covered by tests, so removing it should increase code coverage and make the module more robust.

    I propose to remove any support for the non-standard approach.

    As mentioned in #108, some other projects might be using this attribute. But when looking at those projects:

    • Most simply have copies of the cloudpickle.py file (hence the match when searching for __transient__).
    • Others simply seem to be using __transient__ but without depending on cloudpickle as an external module dependency.
    • Most seem to be not very relevant (i.e.: fewer than 2 stars).

    I think that, even though this change may break others' code, that is an unlikely scenario. Anyway, if that were the case, I think they should fix their code rather than have cloudpickle carry ugly fixes. Also, they can always choose to use an older cloudpickle version from PyPI.

    Fixes #108.

    opened by Peque 18
  • Fix cloudpickle incompatibilities on early Python 3.5 versions

    Closes #360 . cloudpickle 1.4.0 is not compatible with early Python 3.5 versions. This should fix it.

    Note that I did not set up any CI for Python 3.5.[0-2], I simply tested it on my machine using fresh conda envs.

    @vedran If you have some time, could you tell me if this branch fixes the problems that made you create #360?

    I would be tempted to release a bugfix version by tonight since this bug completely breaks cloudpickle on Python 3.5.

    opened by pierreglaser 17
  • Incorrect deserialization of subclasses, module changed to `types`

    This issue is similar to #440 but I have verified it still happens after the fix (on latest master).

    Somehow the deserialized subclass has a __module__ of types instead of __main__. This also happens when the classes are moved to separate files.

    See the following repro script:

    import cloudpickle
    import multiprocessing as mp
    
    print(cloudpickle.__version__)
    
    class Parent:
        pass
    
    class Child(Parent):
        pass
    
    def get_mro(klass):
        return [f"{base.__module__}.{base.__qualname__}" for base in klass.mro()]
    
    def task(b: bytes):
        cls = cloudpickle.loads(b)
        return str(cls), get_mro(cls)
    
    for klass in [Parent, Child]:
        with mp.Pool() as pool:
            cls_name, mros = pool.apply(task, (cloudpickle.dumps(klass),))
        print()
        print("local class name", str(klass))
        print("deserialized class name", cls_name)
        print()
        print("local mro", get_mro(klass))
        print("deserialized mro", mros)
    

    My output on Python 3.7 f758eb34d1b3285dc582b73bfd8df4c47ed4fc68

    2.1.0.dev0
    
    local class name <class '__main__.Parent'>
    deserialized class name <class '__main__.Parent'>
    
    local mro ['__main__.Parent', 'builtins.object']
    deserialized mro ['__main__.Parent', 'builtins.object']
    
    local class name <class '__main__.Child'>
    deserialized class name <class 'types.Child'>
    
    local mro ['__main__.Child', '__main__.Parent', 'builtins.object']
    deserialized mro ['types.Child', '__main__.Parent', 'builtins.object']
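
One mechanism that produces the same symptom is shown below: a class created through `types.new_class` without an explicit `__module__` reports `types` as its module. Whether this is the actual cause here is not established in the issue; the sketch only demonstrates the standard-library behaviour.

```python
import types


class Parent:
    pass


# Without an explicit __module__, the new class takes the name of the
# module whose code calls the metaclass, which is `types` here.
Child = types.new_class("Child", (Parent,))
print(Child.__module__)

# Supplying __module__ in the namespace keeps the intended value.
Fixed = types.new_class(
    "Child", (Parent,), exec_body=lambda ns: ns.update(__module__="__main__")
)
print(Fixed.__module__)
```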
    
    opened by simon-mo 1
  • `__init__` and `__call__` methods are not cloudpickled

    Overwriting the logic inside __call__ method of the wrapper:

    from typing import Any
    
    import cloudpickle
    import wrapt
    import pandas as pd
    
    def _num_of_rows_wrapper() -> Any:
        class FunctionWrapper(wrapt.FunctionWrapper):
            def __call__(self, *args: Any, **kwargs: Any) -> Any:
                dataframe = super(FunctionWrapper, self).__call__(*args, **kwargs)
                num_of_rows = len(dataframe.index)
                print(f"Num of rows: {num_of_rows}")
    
                return dataframe
    
        return FunctionWrapper
    
    def num_of_rows():
        @wrapt.decorator(proxy=_num_of_rows_wrapper())
        def wrapper(wrapped, instance, args, kwargs) -> None:
            return wrapped(*args, **kwargs)
    
        return wrapper    
    
    @num_of_rows()
    def prepare_data():
        data = [["id1", 10], ["id2", 15], ["id3", 14]]
        pandas_df = pd.DataFrame(data, columns=["id", "value"])
        return pandas_df
    

    Running the decorated function works as expected (num of rows printed):

    prepare_data()
    

    Once it is cloudpickled and loaded, __call__ gets lost (num of rows is not printed)

    dump = cloudpickle.dumps(prepare_data)
    loaded = cloudpickle.loads(dump)
    loaded()
    

    The same happens for __init__. Any suggestions for keeping those methods with cloudpickle?

    opened by kocabiyikalper 1
  • Pickling error for `typing.NamedTuple` on Python 3.9

    The current cloudpickle main branch fails to roundtrip a typing.NamedTuple object when run on Python 3.9. Here's an example snippet

    import cloudpickle
    from typing import NamedTuple
    
    class Record(NamedTuple):
        value: int
    
    result = cloudpickle.dumps(Record(10))
    cloudpickle.loads(result)
    

    which fails with the following traceback

    Traceback (most recent call last):
      File "/Users/james/projects/cloudpipe/cloudpickle/test.py", line 8, in <module>
        cloudpickle.loads(result)
      File "/Users/james/projects/cloudpipe/cloudpickle/cloudpickle/cloudpickle.py", line 823, in _make_skeleton_class
        skeleton_class = types.new_class(
      File "/Users/james/mambaforge/envs/cloudpickle/lib/python3.9/types.py", line 77, in new_class
        return meta(name, resolved_bases, ns, **kwds)
      File "/Users/james/mambaforge/envs/cloudpickle/lib/python3.9/typing.py", line 1852, in __new__
        module=ns['__module__'])
    KeyError: '__module__'
    

    Interestingly things seem to work fine on Python 3.7 and 3.8. Additionally, I should note that the above snippet also works on Python 3.9 when using the built-in pickle module instead of cloudpickle.

    This was originally surfaced over in https://github.com/dask/distributed/issues/5602

    opened by jrbourbeau 2
  • Making "squash merging" default

    After merging a PR recently, I noticed that other PRs here have been merged with "squash merging". Should we make this the default and/or only merge method?

    No strong feelings personally. Just wondering how we can make it easier to do the expected merge behavior.

    opened by jakirkham 5
  • Cannot pickle Mock or MagicMock

    Scenario

    It happens that in tests somewhere deep in the object hierarchies and structures something has to be mocked. This especially happens with external services and their clients.

    Problem

    If cloudpickle happens to stumble upon a mock, it fails and it's very difficult to track down which object and which attribute contains the mocked object (especially when there are a bunch of mocks).
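Until mocks can be pickled, one way to locate them is to walk the object graph before serializing. `find_mocks` below is a hypothetical debugging helper, not part of cloudpickle, and it only follows dicts, sequences, sets and instance attributes.

```python
import types
from unittest.mock import MagicMock, NonCallableMock


def find_mocks(obj, path="obj", _seen=None):
    """Return attribute paths under obj that hold a unittest.mock object."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return []
    seen.add(id(obj))

    if isinstance(obj, NonCallableMock):  # base class of Mock and MagicMock
        return [path]

    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            found += find_mocks(value, f"{path}[{key!r}]", seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for index, value in enumerate(obj):
            found += find_mocks(value, f"{path}[{index}]", seen)
    elif hasattr(obj, "__dict__") and not isinstance(obj, (type, types.ModuleType)):
        for name, value in vars(obj).items():
            found += find_mocks(value, f"{path}.{name}", seen)
    return found


class Mockable:
    def __init__(self, to_mock):
        self.not_mocked = "this is not mocked"
        self.to_be_mocked = to_mock
        self.nested = {"client": [MagicMock()]}


print(find_mocks(Mockable(MagicMock())))
```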

    Example

    requirements.txt cloudpickle==2.0.0

    test_pickle.py

    import unittest
    from unittest.mock import MagicMock
    
    import cloudpickle
    
    
    class ToMock:
        a: str
    
        def __init__(self):
            self.something = "something"
    
    
    class Mockable:
    
        def __init__(self, to_mock=None):
            self.not_mocked = "this is not mocked"
            self.to_be_mocked = to_mock or ToMock()
    
    
    class TestPickling(unittest.TestCase):
    
        def test_no_mocking(self):
            self.assertIsInstance(cloudpickle.dumps(Mockable()), bytes)
    
        def test_simple_mock(self):
            self.assertIsInstance(cloudpickle.dumps(Mockable(MagicMock())), bytes)
    
        def test_specced_mock(self):
            self.assertIsInstance(cloudpickle.dumps(Mockable(MagicMock(spec=ToMock))), bytes)
    
        def test_autospecced_mock(self):
            self.assertIsInstance(cloudpickle.dumps(Mockable(unittest.mock.create_autospec(ToMock))), bytes)
    
    
    if __name__ == '__main__':
        unittest.main()
    

    output

    python test_pickle.py
    Testing started at 19:05 ...
    Launching unittests with arguments python -m unittest test_pickle.TestPickling in /home/bumbum/PycharmProjects/pythonProject
    
    
    Error
    Traceback (most recent call last):
      File "/home/bumbum/PycharmProjects/pythonProject/test_pickle.py", line 33, in test_autospecced_mock
        self.assertIsInstance(cloudpickle.dumps(Mockable(unittest.mock.create_autospec(ToMock))), bytes)
      File "/home/python/cloudpickle-magic-mock/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
        cp.dump(obj)
      File "/home/python/cloudpickle-magic-mock/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
        return Pickler.dump(self, obj)
    _pickle.PicklingError: args[0] from __newobj__ args has the wrong class
    
    
    Error
    Traceback (most recent call last):
      File "/home/python/cloudpickle-magic-mock/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
        return Pickler.dump(self, obj)
      File "/usr/lib64/python3.9/unittest/mock.py", line 2500, in __call__
        return _Call((self._mock_name, args, kwargs), name=name, parent=self)
      File "/usr/lib64/python3.9/unittest/mock.py", line 2404, in __new__
        _len = len(value)
    RecursionError: maximum recursion depth exceeded while calling a Python object
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/home/bumbum/PycharmProjects/pythonProject/test_pickle.py", line 27, in test_simple_mock
        self.assertIsInstance(cloudpickle.dumps(Mockable(MagicMock())), bytes)
      File "/home/python/cloudpickle-magic-mock/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
        cp.dump(obj)
      File "/home/python/cloudpickle-magic-mock/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 609, in dump
        raise pickle.PicklingError(msg) from e
    _pickle.PicklingError: Could not pickle object as excessively deep recursion required.
    
    
    Error
    Traceback (most recent call last):
      File "/home/bumbum/PycharmProjects/pythonProject/test_pickle.py", line 30, in test_specced_mock
        self.assertIsInstance(cloudpickle.dumps(Mockable(MagicMock(spec=ToMock))), bytes)
      File "/home/python/cloudpickle-magic-mock/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
        cp.dump(obj)
      File "/home/python/cloudpickle-magic-mock/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
        return Pickler.dump(self, obj)
    _pickle.PicklingError: args[0] from __newobj__ args has the wrong class
    
    
    
    
    
    
    
    Ran 4 tests in 0.043s
    
    FAILED (errors=3)
    
    opened by LoveIsGrief 0
Releases(v2.0.0)
  • v0.5.3(May 14, 2018)

    Installation

    pip install cloudpickle
    

    Changes Since v0.5.2

    • Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types (issue #144).

    • itertools objects can also be pickled (PR #156).

    • logging.RootLogger can also be pickled (PR #160).

  • v0.4.4(May 14, 2018)

  • v0.4.3(Feb 13, 2018)

    Installation

    pip install cloudpickle
    

    Changes Since v0.4.2

    • Fixed a regression: AttributeError when loading pickles that hold a reference to a dynamically defined class from the __main__ module. (issue #131).
    • Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types. (issue #144)
  • v0.4.1(Oct 26, 2017)

    Installation

    pip install cloudpickle
    

    Changes Since v0.4.0

    • Fixed a crash when pickling dynamic classes whose __dict__ attribute was defined as a property. Most notably, this affected dynamic namedtuples in Python 2. (https://github.com/cloudpipe/cloudpickle/pull/113)
    • Cloudpickle now preserves the __module__ attribute of functions (https://github.com/cloudpipe/cloudpickle/pull/118/).
    • Fixed a crash when pickling modules that don't have a __package__ attribute (https://github.com/cloudpipe/cloudpickle/pull/116).
  • v0.4.0(Aug 9, 2017)

    Get it while it's briny with

    pip install cloudpickle
    

    Ch-ch-ch-changes

    • Fix functions with empty cells (https://github.com/cloudpipe/cloudpickle/pull/91)
    • Allow pickling Logger objects (https://github.com/cloudpipe/cloudpickle/pull/96)
    • Fix crash when pickling dynamic class cycles (https://github.com/cloudpipe/cloudpickle/pull/102)
    • Support WeakSets and ABCMeta instances (https://github.com/cloudpipe/cloudpickle/pull/104)
    • Ignore "None" modules added to sys.modules (https://github.com/cloudpipe/cloudpickle/pull/107)
    • Remove non-standard __transient__ support (https://github.com/cloudpipe/cloudpickle/pull/110)
    • Catch exception from pickle.whichmodule() (https://github.com/cloudpipe/cloudpickle/pull/112)
  • v0.3.1(May 31, 2017)

    Get it while it's hot with

    pip install cloudpickle
    

    Changes since v0.2.2

    • Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)
    • Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)
    • Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)
    • Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
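As an illustration of the "recursive functions inside closures" item above, the sketch below pickles a nested function that calls itself through its own closure cell. It assumes a recent cloudpickle release on Python 3; the helper name `make_factorial` is hypothetical and not part of cloudpickle.

```python
import pickle

import cloudpickle


def make_factorial():
    # `factorial` refers to itself through a closure cell, which the
    # standard pickle module cannot serialize because the function is
    # not importable by name.
    def factorial(n):
        return 1 if n <= 1 else n * factorial(n - 1)

    return factorial


fact = make_factorial()

# cloudpickle serializes the nested function by value; pickle can load it.
restored = pickle.loads(cloudpickle.dumps(fact))
result = restored(5)
print(result)  # 120
```

The restored function carries its own copy of the closure, so it no longer depends on `make_factorial` being available in the loading process.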
  • v0.3.0(May 30, 2017)

    Get it while it's hot with

    pip install cloudpickle
    

    Changes

    • Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)
    • Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)
    • Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)
    • Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
  • v0.1.1(Sep 5, 2015)

    cloudpickle bug fix release v0.1.1

    • fixed save_classmethod (#41)
    • now allows users to import cloudpickle to dump and load pickled data (#37)
    • no more pickling of closed files, was broken on Python 3 (#32)
    • more tests!
  • 0.1.0(Apr 16, 2015)
