cloudpickle makes it possible to serialize Python constructs not supported by the default
pickle module from the Python standard library.
cloudpickle is especially useful for cluster computing where Python code is shipped over the network to execute on remote hosts, possibly close to the data.
Among other things,
cloudpickle supports pickling for lambda functions along with functions and classes defined interactively in the
__main__ module (for instance in a script, a shell or a Jupyter notebook).
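To make the limitation concrete, here is a minimal sketch using only the standard library. The standard pickle module serializes functions by reference (module plus qualified name), so a lambda cannot be looked up again and is rejected. The helper name `can_pickle_with_stdlib` is hypothetical and exists only for this illustration; the exact exception type may differ between Python versions, so several are caught.

```python
import pickle


def can_pickle_with_stdlib(obj) -> bool:
    """Return True if the standard pickle module can serialize obj.

    Hypothetical helper for illustration only; not part of cloudpickle.
    """
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exception type raised for unpicklable objects can vary
        # by Python version, so we treat all of these as "not picklable".
        return False
    return True


squared = lambda x: x ** 2

# A lambda has no importable name, so pickling by reference fails.
lambda_ok = can_pickle_with_stdlib(squared)

# A builtin such as len can be found by name and pickles fine.
builtin_ok = can_pickle_with_stdlib(len)
```

Replacing `pickle.dumps` with `cloudpickle.dumps` is what lets the lambda case succeed, as the examples further down show.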
Cloudpickle can only be used to send objects between the exact same version of Python.
Using cloudpickle for long-term object storage is not supported and strongly discouraged.
Security notice: one should only load pickle data from trusted sources as otherwise
pickle.load can lead to arbitrary code execution resulting in a critical security vulnerability.
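The following sketch illustrates why loading untrusted pickle data is dangerous, using only the standard library. The `Payload` class is hypothetical and deliberately harmless: its `__reduce__` method tells pickle to call `int("42")` at load time. A malicious author could name any importable callable in the same position.

```python
import pickle


class Payload:
    """Hypothetical stand-in for attacker-controlled pickle data."""

    def __reduce__(self):
        # On load, pickle calls the returned callable with these arguments.
        # int is harmless here; an attacker could choose something else.
        return (int, ("42",))


untrusted_bytes = pickle.dumps(Payload())

# The loader never reconstructs a Payload; it runs the callable instead.
result = pickle.loads(untrusted_bytes)
```

Here `result` is the integer returned by the call, not a `Payload` instance, which shows that the pickle data, rather than the loading program, decides what gets executed.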
The latest release of
cloudpickle is available from PyPI:
pip install cloudpickle
Pickling a lambda expression:
>>> import cloudpickle
>>> squared = lambda x: x ** 2
>>> pickled_lambda = cloudpickle.dumps(squared)
>>> import pickle
>>> new_squared = pickle.loads(pickled_lambda)
>>> new_squared(2)
4
Pickling a function interactively defined in a Python shell session (in the __main__ module):
>>> CONSTANT = 42
>>> def my_function(data: int) -> int:
...     return data + CONSTANT
...
>>> pickled_function = cloudpickle.dumps(my_function)
>>> depickled_function = pickle.loads(pickled_function)
>>> depickled_function
<function __main__.my_function(data:int) -> int>
>>> depickled_function(43)
85
Running the tests
With tox, to run the tests for all the supported versions of Python and PyPy:
pip install tox
tox
or alternatively for a specific environment:
tox -e py37
With py.test, to only run the tests for your current version of Python:
pip install -r dev-requirements.txt
PYTHONPATH='.:tests' py.test
cloudpickle was initially developed by picloud.com and shipped as part of the client SDK.
A copy of
cloudpickle.py was included as part of PySpark, the Python interface to Apache Spark. Davies Liu, Josh Rosen, Thom Neale and other Apache Spark developers improved it significantly, most notably to add support for PyPy and Python 3.
The aim of the
cloudpickle project is to make that work available to a wider audience outside of the Spark ecosystem and to make it easier to improve further, notably with the help of a dedicated non-regression test suite.