Python client for using Prefect Cloud with Saturn Cloud

Overview

prefect-saturn

GitHub Actions PyPI Version

prefect-saturn is a Python package that makes it easy to run Prefect Cloud flows on a Dask cluster with Saturn Cloud. For a detailed tutorial, see "Fault-Tolerant Data Pipelines with Prefect Cloud ".

Installation

prefect-saturn is available on PyPi.

pip install prefect-saturn

prefect-saturn can be installed directly from GitHub

pip install git+https://github.com/saturncloud/[email protected]

Getting Started

prefect-saturn is intended for use inside a Saturn Cloud environment, such as a Jupyter notebook.

import prefect
from prefect import Flow, task
from prefect_saturn import PrefectCloudIntegration


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("hello prefect-saturn")


flow = Flow("sample-flow", tasks=[hello_task])

project_name = "sample-project"
integration = PrefectCloudIntegration(
    prefect_cloud_project_name=project_name
)
flow = integration.register_flow_with_saturn(flow)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Customize Dask

You can customize the size and behavior of the Dask cluster used to run prefect flows. prefect_saturn.PrefectCloudIntegration.register_flow_with_saturn() accepts to arguments to accomplish this:

For example, the code below tells Saturn that this flow should run on a Dask cluster with 3 xlarge workers, and that prefect should shut down the cluster once the flow run has finished.

flow = integration.register_flow_with_saturn(
    flow=flow,
    dask_cluster_kwargs={
        "n_workers": 3,
        "worker_size": "xlarge",
        "autoclose": True
    }
)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Contributing

See CONTRIBUTING.md for documentation on how to test and contribute to prefect-saturn.

Comments
  • [CU-feu7x7] saturn labels

    [CU-feu7x7] saturn labels

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Automatically add saturn-specific labels to the flow environment.

    How does this PR improve prefect-saturn?

    Along with the related change in saturn, this allows flows to be assigned to the correct cluster agent even if there are agents in multiple clusters that are all using the same prefect-cloud tenant.

    opened by bhperry 7
  • Bump development version

    Bump development version

    • [X] passes make lint
    • [X] adds tests to tests/ (if appropriate)

    What does this PR change?

    • Bumps version to 0.5.1.9000 for development

    How does this PR improve prefect-saturn?

    With this change, users can rely on the behavior that installations from source control will have a newer version than the newest release available on PyPI, but guaranteed to be older than the next release on PyPI.

    opened by dotNomad 6
  • [CU-frwyvw] Set flow instance size

    [CU-frwyvw] Set flow instance size

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Adds instance size argument for flow node

    How does this PR improve prefect-saturn?

    Enables users to select a larger node size for their flow to run on. This means we are not forcing users to farm all of the work out to dask-clusters, they can instead run flows with a local executor and not be constrained by a medium tier node.

    opened by bhperry 4
  • pickle is a bad choice for hashing.

    pickle is a bad choice for hashing.

    Without this PR, modifying the python version could lead to different hashes for flow metadata. This PR hashes identifying information for flows using json, rather than pickle. This PR does change the hashing function. As a result re-registering a flow will create a new flow in Saturn Cloud.

    opened by hhuuggoo 3
  • add expectation that BASE_URL does not end in a slash

    add expectation that BASE_URL does not end in a slash

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Changes PrefectCloudIntegration to expect the environment BASE_URL to NOT end in a trailing slash. In all Saturn production environments, BASE_URL does not end in a trailing slash.

    This PR also bumps the version of prefect-saturn to 0.2.0, since this change in behavior is a breaking change.

    How does this PR improve prefect-saturn?

    Without this fix, prefect-saturn is currently broken in Saturn production environments.

    Why can't we just keep the code that adds or removes slashes as needed?

    This project uses prefect's WebhookStorage (https://docs.prefect.io/orchestration/execution/storage_options.html#webhook). That type of storage uses template strings to reference environment variables, like this:

    flow = Flow(
        "some-flow",
        storage=Webhook(
            build_request_kwargs={
                "url": "${BASE_URL}/api/whatever",
            },
            build_request_http_method="POST",
           ....
        )
    )
    

    There is not place in there where we can introduce Saturn-written code that checks the environment variable BASE_URL and deals with missing or extra trailing slashes. So an assumption about whether or not the environment variable BASE_URL ends in a slash has to be hard-coded into the codebase here.

    opened by jameslamb 2
  • Fix tenant_id attr for Prefect 15

    Fix tenant_id attr for Prefect 15

    • [x] passes make lint
    • [ ] adds tests to tests/ (if appropriate)

    I don't believe any tests need to be added.

    What does this PR change?

    This PR changes the Client()._active_tenant_id > Client().tenant_id (when using Prefect >= 0.15) since _active_tenant_id is no longer a property of Prefect's Client in 0.15.

    How does this PR improve prefect-saturn?

    This allows prefect-saturn to work with prefect in version 0.15 which removed the private attribute.

    opened by dotNomad 1
  • [DEV-1227] Replace pyyaml

    [DEV-1227] Replace pyyaml

    Unlike PyYAML, ruamel.yaml supports:

    • YAML <= 1.2. PyYAML only supports YAML <= 1.1 This is vital, as YAML 1.2 intentionally breaks backward compatibility with YAML 1.1 in several edge cases. This would usually be a bad thing. In this case, this renders YAML 1.2 a strict superset of JSON. Since YAML 1.1 is not a strict superset of JSON, this is a good thing.
    • Roundtrip preservation When calling yaml.dump() to dump a dictionary loaded by a prior call to yaml.load():

    See more details at https://yaml.readthedocs.io/en/latest/pyyaml.html

    opened by wreis 1
  • Switch from Docker storage to Webhook storage

    Switch from Docker storage to Webhook storage

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR replaces Docker storage with Webhook storage.

    This PR also adds a small working example to the README, to show users how it works. A longer-form tutorial will be up on https://www.saturncloud.io/docs/ some time in the next week.

    How does this PR improve prefect-saturn?

    The use of Webhook storage will make the integration between Prefect Cloud and Saturn Cloud much faster and less error-prone. It eliminates several hacks and special cases, should be much quicker, and removes an awkward race condition. With Docker storage, after calling .register_flow_with_saturn() you had to wait for a k8s job that built and pushed the image to complete. That could take up to 10 minutes, and nothing in prefect-saturn or the Saturn UI allowed you to see logs or other progress of that job.

    That job was also very brittle...it required docker-in-docker trickery, patching user-chosen images with several build-time-only dependencies, and running a sequence of multiple gnarly shell scripts.

    With Webhook storage, all of that complexity is eliminated. When you call .register_flow_with_saturn(), the flow is serialized with cloudpickle, sent to Saturn, and stored as bytes in an object store. At run time, when flow.storage.get_flow() is called, Saturn retrieves the binary content of the flow from an object store and sends it back over the wire. That's it! Everything is synchronous, so no weird race conditions, and it's just passing bytes around, so the storage process goes from 10+ minutes to under a second.

    opened by jameslamb 1
  • Add Saturn project details and remove unnecessary stuff

    Add Saturn project details and remove unnecessary stuff

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    • removes fields from saturn_details that are unnecessary as of #8
    • renames saturn_details to storage_details since that now only contains information for building storage
    • bump version floor to Python 3.7 (all the Saturn images are Python 3.7)
    • set ignore_healthchecks=True on the storage object.
      • This avoids a weird error where the image being built with flow.storage.build() doesn't have stuff like the Saturn start script in it, which could cause errors.
      • We don't need to care about that because every job in the flow's lifecycle will have the start script added to it's command / args

    How does this PR improve prefect-saturn?

    This PR removes unnecessary fields that could have become dependencies in users' code, reducing the surface for breaking changes.

    The ignore_healthchecks thing makes it possible for users to rely on the start script to install libraries.

    opened by jameslamb 1
  • change strategy for identifying flows

    change strategy for identifying flows

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR changes the strategy for uniquely identifying flows. Now the flow_hash sent to Saturn Cloud is the sha256 hash of:

    • project name
    • flow name
    • Prefect Cloud tenant id

    This means that flow_hash is equivalent to the Prefect Cloud concept flow_group_id. This PR proposes creating this hash ourselves because Prefect Cloud's flow_group_id can't be known until you've registered a flow with Prefect Cloud, and prefect-saturn needs to register with Saturn first.

    How does this PR improve prefect-saturn?

    This PR provides a reliable way to uniquely identify all versions of the same flow. It improves on the previous model, where tenant id was not considered. The previous model could have caused conflicts in the case where two flows with identical names and code, in Prefect Cloud projects with the same name but in different tenants, would get the same hash and conflict.

    Because this PR no longer considers the task graph in the hash, it also means that the hash will not change as a flow's task graph changes. That means pushing Docker storage to a container registry should be a lot faster, since it'll be more likely to hit the registry's cache.

    Notes for Reviewers

    I had to introduce prefect.client.Client in this PR, and then mock it with patch(). A lot of the diff in the test files is just whitespace, the result of adding in a with patch(....). I recommend reviewing with whitespace changes hidden.

    opened by jameslamb 1
  • Add more metadata to flows

    Add more metadata to flows

    This PR includes a few updates that improve execution and testing for the end-to-end experience with Prefect Cloud.

    What does this PR change?

    This PR adds more details from Saturn to the flow, so the agent running it knows how to configure the first job that loads the flow.

    How does this PR improve prefect-saturn?

    This PR allows flows executed from Prefect Cloud to take advantage of Saturn-y customization features like env secrets, filesystem secrets, and a custom start script.

    opened by jameslamb 1
Releases(v0.6.0)
  • v0.6.0(Nov 4, 2021)

    What's Changed

    • No default dask cluster by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/50
    • Add encoding kwarg to open by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/52

    New Contributors

    • @jsignell made their first contribution in https://github.com/saturncloud/prefect-saturn/pull/50

    Full Changelog: https://github.com/saturncloud/prefect-saturn/compare/v0.5.1...v0.6.0

    Source code(tar.gz)
    Source code(zip)
  • v0.5.1(Jul 15, 2021)

  • v0.5.0(Apr 23, 2021)

    Breaking

    None

    Features

    • replace pyyaml with ruamel-yaml (#28)
    • add support for KubernetesRun "RunConfig", if using this library with prefect >= 0.13.10 (#34, #36, #37)
      • this does not break compatibility with prefect >0.13.0,<=0.13.9

    Bug Fixes

    • fix deprecation warnings from prefect 0.14.x (#32)
      • some modules were reorganized from prefect 0.13.x to 0.14.x, and using the 0.13.x-style paths raises deprecation warnings

    Docs

    • support Python 3.8 (#33)
      • this library was already compatible with Python 3.8, but that is now tested on every build and documented in the package classifiers
    • fix keywords in package metadata (#39)
      • this improves the discoverability of this project on PyPi
    Source code(tar.gz)
    Source code(zip)
  • v0.4.4(Dec 9, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • set an explicit default of autoclose = False for dask_cluster_kwargs (#25)
      • this ensures that, by default, flows registered with prefect-saturn leave their Dask cluster up at the end of execution
      • this avoids the risk of one flow run closing down a Dask cluster that is in use by another flow run
      • this was already prefect-saturn's behavior, but only indirectly because autoclose defaults to False in dask-saturn. Not that is directly the default in prefect-saturn.
    • add tests on describe_sizes() (#24)

    Docs

    • added more docs in the README on how to customize the Dask cluster used by DaskExecutor (#25)
    Source code(tar.gz)
    Source code(zip)
  • v0.4.3(Dec 9, 2020)

    Breaking

    None

    Features

    • You can now set the instance size for the node that runs flow.run() (#23)
      • PrefectCloudIntegration.register_flow_with_saturn() gets a new keyword argument `instance_size
      • use new function describe_sizes() to list the valid options

    Bug Fixes

    None

    Docs

    • added documentation on changing the size of the instance that a flow runs on
    Source code(tar.gz)
    Source code(zip)
  • v0.4.2(Nov 19, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • fix broken installations from source distribution (prefect-saturn-*.tar.gz) (#22)

    Docs

    • package LICENSE file with package artifacts (#22)
    Source code(tar.gz)
    Source code(zip)
  • v0.4.1(Nov 6, 2020)

  • v0.4.0(Oct 21, 2020)

  • v0.3.0(Oct 16, 2020)

  • v0.2.0(Sep 18, 2020)

    Breaking

    • prefect-saturn now expects that the environment variable BASE_URL does not end in a slash (#17)

    Features

    None

    Bug Fixes

    None

    Docs

    None

    Source code(tar.gz)
    Source code(zip)
  • v0.1.1(Aug 31, 2020)

  • v0.1.0(Aug 13, 2020)

    Breaking

    • Moved .add_storage() and .add_environment() internal, and made .register_flow_with_saturn() do more. (#14) Now the interface is just like this:

      integration = PrefectCloudIntegration("some-project")
      flow.register_flow_with_saturn()
      flow.register(project_name="some-project", labels=["saturn-cloud"])
      
    • Replaced Docker storage with Webhook storage (#14)

    • Bumped prefect version floor to 0.13.0 (the first release that had Webhook) (#14)

    Features

    None

    Bug Fixes

    None

    Docs

    • Added a minimal working example in README (#14)
    Source code(tar.gz)
    Source code(zip)
  • v0.0.2(Jul 23, 2020)

    • moved some details of building KubernetesJobEnvironment into Saturn's back-end and out of this library
    • removed unnecessary elements in saturn_details
    • renamed saturn_details to storage_details since it now only contains things needed for building storage
    Source code(tar.gz)
    Source code(zip)
  • v0.0.1(Jul 21, 2020)

Owner
Saturn Cloud
End-to-End Data Science in Python Featuring Dask and Jupyter
Saturn Cloud
An API that allows you to get full information about TikTok videos

TikTok-API An API that allows you to get full information about TikTok videos without using any third party sources and only the TikTok API. ##API onl

FC 13 Dec 20, 2021
Shedding a new skin on Dis-Snek's commands.

Molter - WIP Shedding a new skin on Dis-Snek's commands. Currently, its goals are to make message commands more similar to discord.py's message comman

Astrea 7 May 01, 2022
Super Fast Telegram UserBot Made With Python.

Description Super Fast Telegram UserBot Made With Python. LOGO Made With Support of All Userbots Dev's Dark-Venom is a Light-Weight Userbot. It's unde

2 Sep 14, 2021
A python script that can send notifications to your phone via SMS text

Discord SMS Notification A python script that help you send text message to your phone one of your desire discord channel have a new message. The proj

2 Apr 25, 2022
A Python Script to automate searching of available vaccination centers in the city and hence booking

Cowin Vaccine Availability Notifier Cowin Vaccine Availability Notifier takes your City or PIN code as an input and automatically notifies you via ema

Jayesh Padhiar 7 Sep 05, 2021
Python client for Arista eAPI

Arista eAPI Python Library The Python library for Arista's eAPI command API implementation provides a client API work using eAPI and communicating wit

Arista Networks EOS+ 124 Nov 23, 2022
A command line interface for accessing google drive

Drive Cli Get the ability to access Google Drive without leaving your terminal. Inspiration Google Drive has become a vital part of our day to day lif

Chirag Shetty 538 Dec 12, 2022
This software's intent is to automate all activities related to manage Axie Infinity Scholars. It is specially aimed to mangers with large scholar roasters.

Axie Scholars Utilities This software's intent is to automate all activities related to manage Scholars. It is specially aimed to mangers with large s

Ferran Marin 153 Nov 16, 2022
This Wrapper is a Discum Copy With Addons, original one is made by Merubokkusu

Remaded Discum Its not Official Discum Wrapper ! This Wrapper is a Discum Copy With Addons, original one is made by Merubokkusu Authors @merubokkusu (

discum-remaded 8 Aug 09, 2022
Py hec token mgr - Create HEC tokens in Cribl Stream through the API

Add HEC tokens via API calls This script is intended as an example of how to aut

Jon Rust 3 Mar 04, 2022
A Discord bot that may save your day by predicting it.

Sage A Discord bot that may save your day by predicting it.

1 Nov 17, 2022
📷 An Instagram bot written in Python using Selenium on Google Chrome

📷 An Instagram bot written in Python using Selenium on Google Chrome. It will go through posts in hashtag(s) and like and comment on them.

anniedotexe 47 Dec 19, 2022
This script will detect changes in your session using Discords built in Gateway.

Detect Session Gateway This script will detect changes in your session using Discords built in Gateway. What does this log? Discord build version Oper

Omega 5 Dec 18, 2021
A high level library for building Discord bots.

Qord A high level library for building Discord bots. 🚧 This library is currently in development. Questions that you are having What is this? This is

Izhar Ahmad 16 May 14, 2022
A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.

24 July 2020 Actively soliciting contributers! Ping @ronncc if you would like to help out! pytube pytube is a very serious, lightweight, dependency-fr

pytube 7.9k Jan 02, 2023
汪汪Bot是一个Telegram Bot,用于帮助你管理一台服务器上的Docker

WangWangBot 汪汪Bot是一个Telegram Bot,用于帮助你管理一台服务器上的Docker运行的Bot。这是使用视频: 部署说明 安装运行 准备 local.env 普通版本 BOT_TOKEN=你的BOT_TOKEN ADMINS=使用,分隔的管理员ID列表 如果你使用doppl

老房东的代码练习册 56 Aug 23, 2022
Infrastructure template and Jupyter notebooks for running RoseTTAFold on AWS Batch.

AWS RoseTTAFold Infrastructure template and Jupyter notebooks for running RoseTTAFold on AWS Batch. Overview Proteins are large biomolecules that play

AWS Samples 20 May 10, 2022
A Python wrapper for the DeepL API

deepl.py A Python wrapper for the DeepL API installing Install and update using pip: pip install deepl.py A simple example. # Sync Sample import deep

grarich 18 Dec 12, 2022
troposphere - Python library to create AWS CloudFormation descriptions

troposphere - Python library to create AWS CloudFormation descriptions

4.8k Jan 06, 2023
A python based Telegram Bot for Compressing Videos with negligible Quality change

𝕍𝕚𝕕𝕖𝕠 ℂ𝕆𝕄ℙℝ𝔼𝕊𝕊𝕆ℝ 𝔹𝕆𝕋 ᴍᴜʟᴛɪғᴜɴᴄᴛɪᴏɴ ǫᴜᴀʟɪᴛʏ ᴄᴏᴍᴘʀᴇssᴏʀ A Telegram Video CompressorBot it compress videos with negligible Quality change.

Danish 154 Dec 04, 2022