A simple guide to MLOps through ZenML and its various integrations.

Last update: Dec 27, 2022

Overview

ZenBytes

Join our Slack Community and become part of the ZenML family.

Give the main ZenML repo a GitHub star to show your love.

ZenBytes is a series of practical lessons about MLOps through ZenML and its various integrations. It is intended for people who want to learn about MLOps in general, as well as for practitioners who want to learn more about ZenML specifically.

🙏 About ZenML

ZenML is an extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces and abstractions that are tailored to ML workflows. The ZenML repository and Docs have more details.

ZenML is a good tool for learning MLOps for two reasons:

🔹 ZenML focuses on being un-opinionated about underlying tooling and infrastructure across the MLOps stack.
🔹 ZenML presents itself as a pipeline tool, making all development in ZenML data-centric rather than model-centric.

🧱 Structure of Lessons

The lessons are structured in Chapters. Each chapter is a notebook that walks through and explains various concepts:

Chapter 0: Basics
Chapter 1: Building a ML(Ops) pipeline
Chapter 2: Transitioning across stacks
Coming soon: More chapters

💻 System Requirements

To run these lessons, you need some packages installed on your machine. Currently, the lessons only run on UNIX systems. You only need these packages for some parts, and you might get away with just Python and pip install -r requirements.txt for some parts of the codebase, but we recommend installing all of them:

package	MacOS installation	Linux installation
docker	Docker Desktop for Mac	Docker Engine for Linux
kubectl	kubectl for mac	kubectl for linux
k3d	Brew Installation of k3d	k3d installation linux

You might also need to install Anaconda to get the MLflow deployment to work.
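
Before starting, it can help to confirm that the tools from the table are actually reachable from your shell. The following is a minimal sketch, not part of the ZenBytes repository: the helper name `find_missing_tools` is hypothetical, and it only checks that an executable with the given name is on your PATH, not that the tool is configured or running.

```python
import shutil

# Tools listed in the system requirements table; adjust if your setup differs.
REQUIRED_TOOLS = ("docker", "kubectl", "k3d")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result only means the executables exist; Docker, for example, still has to be started separately.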

🐍 Python Requirements

Once you've got the system requirements figured out, let's jump into the Python packages you need. Within the Python environment of your choice, run:

git clone https://github.com/zenml-io/zenbytes
cd zenbytes
pip install -r requirements.txt

If you are running the run.py script, you will also need to install some integrations using zenml:

zenml integration install sklearn -f
zenml integration install dash -f
zenml integration install evidently -f
zenml integration install mlflow -f
zenml integration install kubeflow -f
zenml integration install seldon -f
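
If you prefer to script the installation, the commands above can be generated from a list. This is a rough sketch under the assumption that the `zenml` CLI is already on your PATH; the helper names `build_install_commands` and `install_all` are hypothetical and not part of ZenML. By default it only prints the commands (a dry run) and executes nothing.

```python
import subprocess

# Integrations named in the commands above.
INTEGRATIONS = ["sklearn", "dash", "evidently", "mlflow", "kubeflow", "seldon"]


def build_install_commands(integrations=INTEGRATIONS):
    """Build one `zenml integration install <name> -f` command per integration."""
    return [["zenml", "integration", "install", name, "-f"] for name in integrations]


def install_all(integrations=INTEGRATIONS, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for command in build_install_commands(integrations):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if an install fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    install_all(dry_run=True)
```

Calling `install_all(dry_run=False)` would run the installs one after another and stop at the first failure.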

📓 Diving into the code

We're ready to go now. You can work through the notebooks step by step:

jupyter notebook

🏁 Cleaning up when you're done

Once you are done running all notebooks, you might want to stop all running processes. To do so, run the following commands. (This will tear down your k3d cluster and the local Docker registry.)

zenml stack set aws_kubeflow_stack
zenml stack down -f
zenml stack set local_kubeflow_stack
zenml stack down -f

❓ FAQ

macOS: When starting the container registry for Kubeflow, I get an error about port 5000 not being available: OSError: [Errno 48] Address already in use

Solution: For Kubeflow to run, the Docker container registry currently needs to be on port 5000. macOS, however, uses port 5000 for the AirPlay Receiver. The guide "Freeing up port 5000" explains how to fix this.
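
To see whether something is already listening on port 5000 before you start the registry, a quick check like the one below may help. It is a minimal sketch using only the Python standard library; the function name `port_is_free` is hypothetical, and it only tests for a TCP listener on localhost, so treat the result as a hint rather than a guarantee.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if no TCP listener accepts a connection on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means something is already listening on the port.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    if port_is_free(5000):
        print("Port 5000 looks free.")
    else:
        print("Port 5000 is already in use.")
```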
