A project-based example of data pipelines, ML workflow management, API endpoints, and monitoring.

Last update: Dec 03, 2022

Overview

MLOps

Tools used:

Data Pipeline: Dagster
ML workflow: MLflow
API Deployment: FastAPI
Monitoring: ElasticAPM

Blog posts

Requirements

Poetry (dependency management)

$ curl -sSL https://install.python-poetry.org | python3 -
$ poetry --version
# Poetry version 1.1.10

pre-commit (static code analysis)

$ pip install pre-commit
$ pre-commit --version
# pre-commit 2.15.0

MinIO (S3-compatible object storage)

Follow the installation instructions at https://min.io/download

Setup

Environment setup

$ poetry install

MLflow

$ poetry shell
$ export MLFLOW_S3_ENDPOINT_URL=http://127.0.0.1:9000
$ export AWS_ACCESS_KEY_ID=minioadmin
$ export AWS_SECRET_ACCESS_KEY=minioadmin

# make sure the backend store and artifact locations in the .env file match the values below
$ mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root s3://mlflow \
    --host 0.0.0.0
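
The note about keeping the `.env` file in sync with the server flags can be checked mechanically. The following is a minimal sketch, assuming the `.env` file uses plain `KEY=value` lines. The key names `BACKEND_STORE_URI` and `ARTIFACT_ROOT` are hypothetical placeholders (the project's actual `.env` keys are not shown here), and `parse_dotenv` and `find_mismatches` are illustrative helpers, not part of the project.

```python
# Values copied from the `mlflow server` command above.
# ASSUMPTION: the key names are placeholders; replace them with the
# names actually used in the project's .env file.
EXPECTED = {
    "BACKEND_STORE_URI": "sqlite:///mlflow.db",
    "ARTIFACT_ROOT": "s3://mlflow",
}


def parse_dotenv(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_mismatches(dotenv_values, expected=EXPECTED):
    """Return {name: (expected, found)} for each missing or differing key."""
    return {
        name: (want, dotenv_values.get(name))
        for name, want in expected.items()
        if dotenv_values.get(name) != want
    }
```

Reading the file with `open(".env").read()` and passing the result through `parse_dotenv` and then `find_mismatches` yields an empty dict when the two configurations agree.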

Minio

$ export MINIO_ROOT_USER=minioadmin
$ export MINIO_ROOT_PASSWORD=minioadmin

$ mkdir minio_data
$ minio server minio_data --console-address ":9001"

# API: http://192.168.29.103:9000  http://10.119.80.13:9000  http://127.0.0.1:9000
# RootUser: minioadmin
# RootPass: minioadmin

# Console: http://192.168.29.103:9001 http://10.119.80.13:9001 http://127.0.0.1:9001
# RootUser: minioadmin
# RootPass: minioadmin

# Command-line: https://docs.min.io/docs/minio-client-quickstart-guide
#    $ mc alias set myminio http://192.168.29.103:9000 minioadmin minioadmin

# Documentation: https://docs.min.io

Go to http://127.0.0.1:9001/buckets/ and create a bucket called mlflow.
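
The bucket can also be created from Python instead of the console. This is a rough sketch, assuming `boto3` is installed (it is not listed in the requirements above) and MinIO is running locally with the default credentials shown earlier. `ensure_bucket` and `make_minio_client` are hypothetical helper names.

```python
def ensure_bucket(client, name):
    """Create the bucket if it is missing; return True when it was created."""
    response = client.list_buckets()
    existing = {bucket["Name"] for bucket in response.get("Buckets", [])}
    if name in existing:
        return False
    client.create_bucket(Bucket=name)
    return True


def make_minio_client(endpoint_url="http://127.0.0.1:9000",
                      access_key="minioadmin",
                      secret_key="minioadmin"):
    """Build an S3 client pointed at the local MinIO server (assumes boto3)."""
    import boto3  # imported lazily so ensure_bucket works with any S3-like client

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
```

With the server running, `ensure_bucket(make_minio_client(), "mlflow")` creates the `mlflow` bucket on the first call and leaves an existing bucket untouched on later calls.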

Dagster

$ poetry shell
$ dagit -f mlops/pipeline.py

ElasticAPM

$ docker-compose -f docker-compose-monitoring.yaml up

FastAPI

$ poetry shell
$ export PYTHONPATH=.
$ python mlops/app/application.py

TODO

Setup with docker-compose.
Load testing.
Test cases.
CI/CD pipeline.
Drift detection.


Owner

Utsav
