This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities

Overview

MLOps with Vertex AI

This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities. The example use Keras to implement the ML model, TFX to implement the training pipeline, and Model Builder SDK to interact with Vertex AI.

MLOps lifecycle

Getting started

  1. Setup your MLOps environment on Google Cloud.

  2. Start your AI Notebook instance.

  3. Open the JupyterLab then open a new Terminal

  4. Clone the repository to your AI Notebook instance:

    git clone https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai.git
    cd mlops-with-vertex-ai
    
  5. Install the required Python packages:

    pip install tfx==1.2.0 --user
    pip install -r requirements.txt
    

    NOTE: You can ignore the pip dependencies issues. These will be fixed when upgrading to subsequent TFX version.


  6. Upgrade the gcloud components:

    sudo apt-get install google-cloud-sdk
    gcloud components update
    

Dataset Management

The Chicago Taxi Trips dataset is one of public datasets hosted with BigQuery, which includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency. The task is to predict whether a given trip will result in a tip > 20%.

The 01-dataset-management notebook covers:

  1. Performing exploratory data analysis on the data in BigQuery.
  2. Creating Vertex AI Dataset resource using the Python SDK.
  3. Generating the schema for the raw data using TensorFlow Data Validation.

ML Development

We experiment with creating a Custom Model using 02-experimentation notebook, which covers:

  1. Preparing the data using Dataflow.
  2. Implementing a Keras classification model.
  3. Training the Keras model with Vertex AI using a pre-built container.
  4. Upload the exported model from Cloud Storage to Vertex AI.
  5. Extract and visualize experiment parameters from Vertex AI Metadata.
  6. Use Vertex AI for hyperparameter tuning.

We use Vertex TensorBoard and Vertex ML Metadata to track, visualize, and compare ML experiments.

In addition, the training steps are formalized by implementing a TFX pipeline. The 03-training-formalization notebook covers implementing and testing the pipeline components interactively.

Training Operationalization

The 04-pipeline-deployment notebook covers executing the CI/CD steps for the training pipeline deployment using Cloud Build. The CI/CD routine is defined in the pipeline-deployment.yaml file, and consists of the following steps:

  1. Clone the repository to the build environment.
  2. Run unit tests.
  3. Run a local e2e test of the TFX pipeline.
  4. Build the ML container image for pipeline steps.
  5. Compile the pipeline.
  6. Upload the pipeline to Cloud Storage.

Continuous Training

After testing, compiling, and uploading the pipeline definition to Cloud Storage, the pipeline is executed with respect to a trigger. We use Cloud Functions and Cloud Pub/Sub as a triggering mechanism. The Cloud Function listens to the Pub/Sub topic, and runs the training pipeline given a message sent to the Pub/Sub topic. The Cloud Function is implemented in src/pipeline_triggering.

The 05-continuous-training notebook covers:

  1. Creating a Cloud Pub/Sub topic.
  2. Deploying a Cloud Function.
  3. Triggering the pipeline.

The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:

  1. Receive hyper-parameters using hyperparam_gen custom python component.
  2. Extract data from BigQuery using BigQueryExampleGen component.
  3. Validate the raw data using StatisticsGen and ExampleValidator component.
  4. Process the data using on Dataflow Transform component.
  5. Train a custom model with Vertex AI using Trainer component.
  6. Evaluate and validate the custom model using ModelEvaluator component.
  7. Save the blessed to model registry location in Cloud Storage using Pusher component.
  8. Upload the model to Vertex AI using vertex_model_pusher custom python component.

Model Deployment

The 06-model-deployment notebook covers executing the CI/CD steps for the model deployment using Cloud Build. The CI/CD routine is defined in build/model-deployment.yaml file, and consists of the following steps:

  1. Test model interface.
  2. Create an endpoint in Vertex AI.
  3. Deploy the model to the endpoint.
  4. Test the Vertex AI endpoint.

Prediction Serving

We serve the deployed model for prediction. The 07-prediction-serving notebook covers:

  1. Use the Vertex AI endpoint for online prediction.
  2. Use the Vertex AI uploaded model for batch prediction.
  3. Run the batch prediction using Vertex Pipelines.

Model Monitoring

After a model is deployed in for prediction serving, continuous monitoring is set up to ensure that the model continue to perform as expected. The 08-model-monitoring notebook covers configuring Vertex AI Model Monitoring for skew and drift detection:

  1. Set skew and drift threshold.
  2. Create a monitoring job for all the models under and endpoint.
  3. List the monitoring jobs.
  4. List artifacts produced by monitoring job.
  5. Pause and delete the monitoring job.

Metadata Tracking

You can view the parameters and metrics logged by your experiments, as well as the artifacts and metadata stored by your Vertex Pipelines in Cloud Console.

Disclaimer

This is not an official Google product but sample code provided for an educational purpose.


Copyright 2021 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Owner
Google Cloud Platform
Google Cloud Platform
Replication Code for "Self-Supervised Bug Detection and Repair" NeurIPS 2021

Self-Supervised Bug Detection and Repair This is the reference code to replicate the research in Self-Supervised Bug Detection and Repair in NeurIPS 2

Microsoft 85 Dec 24, 2022
A python3 tool to take a 360 degree survey of the RF spectrum (hamlib + rotctld + RTL-SDR/HackRF)

RF Light House (rflh) A python script to use a rotor and a SDR device (RTL-SDR or HackRF One) to measure the RF level around and get a data set and be

Pavel Milanes (CO7WT) 11 Dec 13, 2022
Hysterese plugin with two temperature offset areas

craftbeerpi4 plugin OffsetHysterese Temperatur-Steuerungs-Plugin mit zwei tempereaturbereich abhängigen Offsets. Installation sudo pip3 install https:

HappyHibo 1 Dec 21, 2021
Neural Cellular Automata + CLIP

🧠 Text-2-Cellular Automata Using Neural Cellular Automata + OpenAI CLIP (Work in progress) Examples Text Prompt: Cthulu is watching cthulu_is_watchin

Mainak Deb 21 Dec 19, 2022
Model Zoo of BDD100K Dataset

Model Zoo of BDD100K Dataset

ETH VIS Group 200 Dec 27, 2022
robomimic: A Modular Framework for Robot Learning from Demonstration

robomimic [Homepage]   [Documentation]   [Study Paper]   [Study Website]   [ARISE Initiative] Latest Updates [08/09/2021] v0.1.0: Initial code and pap

ARISE Initiative 178 Jan 05, 2023
The implementation of DeBERTa

DeBERTa: Decoding-enhanced BERT with Disentangled Attention This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Dis

Microsoft 1.2k Jan 06, 2023
Labels4Free: Unsupervised Segmentation using StyleGAN

Labels4Free: Unsupervised Segmentation using StyleGAN ICCV 2021 Figure: Some segmentation masks predicted by Labels4Free Framework on real and synthet

70 Dec 23, 2022
[SIGGRAPH Asia 2021] Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN

Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN [Paper] [Project Website] [Output resutls] Official Pytorch i

Badour AlBahar 215 Dec 17, 2022
Make a Turtlebot3 follow a figure 8 trajectory and create a robot arm and make it follow a trajectory

HW2 - ME 495 Overview Part 1: Makes the robot move in a figure 8 shape. The robot starts moving when launched on a real turtlebot3 and can be paused a

Devesh Bhura 0 Oct 21, 2022
LBK 26 Dec 28, 2022
Prometheus Exporter for data scraped from datenplattform.darmstadt.de

darmstadt-opendata-exporter Scrapes data from https://datenplattform.darmstadt.de and presents it in the Prometheus Exposition format. Pull requests w

Martin Weinelt 2 Apr 12, 2022
AAI supports interdisciplinary research to help better understand human, animal, and artificial cognition.

AnimalAI 3 AAI supports interdisciplinary research to help better understand human, animal, and artificial cognition. It aims to support AI research t

Matthew Crosby 58 Dec 12, 2022
Face Mask Detector by live camera using tensorflow-keras, openCV and Python

Face Mask Detector 😷 by Live Camera Detecting masked or unmasked faces by live camera with percentange of mask occupation About Project: This an Arti

Karan Shingde 2 Apr 04, 2022
The code uses SegFormer for Semantic Segmentation on Drone Dataset.

SegFormer_Segmentation The code uses SegFormer for Semantic Segmentation on Drone Dataset. The details for the SegFormer can be obtained from the foll

Dr. Sander Ali Khowaja 1 May 08, 2022
Generative Adversarial Text to Image Synthesis

Text To Image Synthesis This is a tensorflow implementation of synthesizing images. The images are synthesized using the GAN-CLS Algorithm from the pa

Hao 575 Jan 08, 2023
General neural ODE and DAE modules for power system dynamic modeling.

Py_PSNODE General neural ODE and DAE modules for power system dynamic modeling. The PyTorch-based ODE solver is developed based on torchdiffeq. Sample

14 Dec 31, 2022
Official PyTorch implementation of StyleGAN3

Modified StyleGAN3 Repo Changes Made tied to python 3.7 syntax .jpgs instead of .pngs for training sample seeds to recreate the 1024 training grid wit

Derrick Schultz (he/him) 83 Dec 15, 2022
Repository for the paper "Online Domain Adaptation for Occupancy Mapping", RSS 2020

RSS 2020 - Online Domain Adaptation for Occupancy Mapping Repository for the paper "Online Domain Adaptation for Occupancy Mapping", Robotics: Science

Anthony 26 Sep 22, 2022
"3D Human Texture Estimation from a Single Image with Transformers", ICCV 2021

Texformer: 3D Human Texture Estimation from a Single Image with Transformers This is the official implementation of "3D Human Texture Estimation from

XiangyuXu 193 Dec 05, 2022