AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Overview

Data Science on AWS - O'Reilly Book


Get the book on Amazon.com


Book Outline

Quick Start Workshop (4 hours)

Workshop Paths

In this quick start hands-on workshop, you will build an end-to-end AI/ML pipeline for natural language processing with Amazon SageMaker. You will train and tune a text classifier to predict the star rating (1 is bad, 5 is good) for product reviews using the state-of-the-art BERT model for language representation. To build the BERT-based NLP text classifier, you will use a product reviews dataset in which each record contains review text and a star rating (1-5).
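
As a rough preview of the training step in code, the sketch below launches a SageMaker Training Job with the TensorFlow estimator from the SageMaker Python SDK. The script name, instance type, hyper-parameters, and S3 paths are illustrative assumptions rather than the workshop's exact values, and running it requires AWS credentials and a SageMaker execution role.

import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions (assumed to be configured)

estimator = TensorFlow(
    entry_point="train.py",         # assumed training script that fine-tunes BERT with TensorFlow/Keras
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",  # assumed instance type
    framework_version="2.3.1",
    py_version="py37",
    hyperparameters={"epochs": 1, "learning_rate": 3e-5},  # illustrative values
)

# Train/validation channels point at prepared features in S3 (paths are placeholders).
estimator.fit({
    "train": "s3://<bucket>/bert/train",
    "validation": "s3://<bucket>/bert/validation",
})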

Quick Start Workshop Learning Objectives

Attendees will learn how to do the following:

  • Ingest data into S3 using Amazon Athena and the Parquet data format
  • Visualize data with pandas and matplotlib in SageMaker notebooks
  • Detect statistical data bias with SageMaker Clarify
  • Perform feature engineering on a raw dataset using Scikit-Learn and SageMaker Processing Jobs
  • Store and share features using SageMaker Feature Store
  • Train and evaluate a custom BERT model using TensorFlow, Keras, and SageMaker Training Jobs
  • Evaluate the model using SageMaker Processing Jobs
  • Track model artifacts using Amazon SageMaker ML Lineage Tracking
  • Run model bias and explainability analysis with SageMaker Clarify
  • Register and version models using SageMaker Model Registry
  • Deploy a model to a REST endpoint using SageMaker Hosting and SageMaker Endpoints
  • Automate ML workflow steps by building end-to-end model pipelines using SageMaker Pipelines (see the sketch after this list)
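
To make the last objective more concrete, here is a minimal sketch of wiring a processing step and a training step into a SageMaker Pipeline with the SageMaker Python SDK. Step names, scripts, instance types, and framework versions are assumptions for illustration, not the workshop's exact code.

import sagemaker
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.tensorflow import TensorFlow
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

role = sagemaker.get_execution_role()

# Feature-engineering step (script name is an assumption).
processor = SKLearnProcessor(framework_version="0.23-1", role=role,
                             instance_type="ml.c5.xlarge", instance_count=1)
process_step = ProcessingStep(
    name="FeatureEngineering",
    processor=processor,
    code="preprocess.py",
    outputs=[ProcessingOutput(output_name="train", source="/opt/ml/processing/output/train")],
)

# Training step that consumes the processing step's output.
estimator = TensorFlow(entry_point="train.py", role=role, instance_count=1,
                       instance_type="ml.c5.2xlarge",
                       framework_version="2.3.1", py_version="py37")
train_step = TrainingStep(
    name="TrainBERT",
    estimator=estimator,
    inputs={"train": TrainingInput(
        s3_data=process_step.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri)},
)

# Create (or update) the pipeline definition and start an execution.
pipeline = Pipeline(name="bert-reviews-pipeline", steps=[process_step, train_step])
pipeline.upsert(role_arn=role)
execution = pipeline.start()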

Extended Workshop (8 hours)

Workshop Paths

In the extended hands-on workshop, you will dive deeper into advanced model training and deployment techniques such as hyper-parameter tuning, A/B testing, and auto-scaling. You will also set up a real-time streaming analytics and data science pipeline to perform window-based aggregations and anomaly detection.
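
As a hedged sketch of what hyper-parameter tuning looks like with the SageMaker Python SDK, the snippet below wraps a base estimator in a Hyper-parameter Tuning Job. The metric name and regex, the parameter ranges, and the S3 paths are assumptions and must match your own training script.

import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

role = sagemaker.get_execution_role()

# Base estimator to tune (entry point and instance type are assumptions).
estimator = TensorFlow(entry_point="train.py", role=role, instance_count=1,
                       instance_type="ml.c5.2xlarge",
                       framework_version="2.3.1", py_version="py37")

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "val_accuracy: ([0-9\\.]+)"}],  # must match the script's log output
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-5, 1e-4),
        "train_batch_size": IntegerParameter(16, 128),
    },
    objective_type="Maximize",
    max_jobs=8,
    max_parallel_jobs=2,
)

# S3 paths are placeholders.
tuner.fit({"train": "s3://<bucket>/bert/train",
           "validation": "s3://<bucket>/bert/validation"})
print(tuner.best_training_job())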

Extended Workshop Learning Objectives

Attendees will learn how to do the following:

  • Perform automated machine learning (AutoML) to find the best model from just your dataset, with minimal code
  • Find the best hyper-parameters for your custom model using SageMaker Hyper-parameter Tuning Jobs
  • Deploy multiple model variants into a live, production A/B test to compare online performance, live-shift prediction traffic, and autoscale the winning variant using SageMaker Hosting and SageMaker Endpoints (see the traffic-shifting sketch after this list)
  • Set up a streaming analytics and continuous machine learning application using Amazon Kinesis and SageMaker
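
To illustrate the A/B testing and traffic-shifting objective above, here is a minimal boto3 sketch that creates one endpoint with two production variants and later shifts most of the live traffic to the better-performing variant. The endpoint, model, and variant names are assumptions, and both models are presumed to already exist in SageMaker.

import boto3

sm = boto3.client("sagemaker")

# One endpoint, two model variants, traffic split 50/50 (names are placeholders).
sm.create_endpoint_config(
    EndpointConfigName="reviews-ab-config",
    ProductionVariants=[
        {"VariantName": "VariantA", "ModelName": "bert-model-a",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1, "InitialVariantWeight": 0.5},
        {"VariantName": "VariantB", "ModelName": "bert-model-b",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1, "InitialVariantWeight": 0.5},
    ],
)
sm.create_endpoint(EndpointName="reviews-ab-endpoint",
                   EndpointConfigName="reviews-ab-config")

# Later, shift most of the live traffic to the winning variant without redeploying.
sm.update_endpoint_weights_and_capacities(
    EndpointName="reviews-ab-endpoint",
    DesiredWeightsAndCapacities=[
        {"VariantName": "VariantA", "DesiredWeight": 0.1},
        {"VariantName": "VariantB", "DesiredWeight": 0.9},
    ],
)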

Workshop Instructions

Open In SageMaker Studio Lab

Amazon SageMaker Studio Lab is a free service that enables anyone to learn and experiment with ML without needing an AWS account, credit card, or cloud configuration knowledge.

1. Request Amazon SageMaker Studio Lab Account

Go to Amazon SageMaker Studio Lab, and request a free account by providing a valid email address.

Amazon SageMaker Studio Lab - Request Account

Note that Amazon SageMaker Studio Lab is currently in public preview. The number of new account registrations will be limited to ensure a high quality of experience for all customers.

2. Create Studio Lab Account

When your account request is approved, you will receive an email with a link to the Studio Lab account registration page.

You can now create your account with your approved email address and set a password and your username. This account is separate from an AWS account and doesn't require you to provide any billing information.

Amazon SageMaker Studio Lab - Create Account

3. Sign in to your Studio Lab Account

You are now ready to sign in to your account.

Amazon SageMaker Studio Lab - Sign In

4. Select your Compute instance, Start runtime, and Open project

CPU Option

Select CPU as the compute type and click Start runtime.

Amazon SageMaker Studio Lab - CPU

Once the Status shows Running, click Open project.

Amazon SageMaker Studio Lab - Running

5. Launch a New Terminal within Studio Lab

Amazon SageMaker Studio Lab - New Terminal

6. Clone this GitHub Repo in the Terminal

Within the Terminal, run the following:

cd ~ && git clone https://github.com/data-science-on-aws/oreilly_book

Amazon SageMaker Studio Lab - Clone Repo

7. Create data_science_on_aws Conda kernel

Within the Terminal, run the following:

cd ~/oreilly_book/ && conda env create -f environment.yml || conda env update -f environment.yml && conda activate data_science_on_aws

Amazon SageMaker Studio Lab - Create Kernel

If you see an error like the following, you can safely ignore it. It appears when a Conda environment with this name already exists; in that case, the command updates the existing environment instead.

CondaValueError: prefix already exists: /home/studio-lab-user/.conda/envs/data_science_on_aws

8. Start the Workshop!

Navigate to oreilly_book/00_quickstart/ in SageMaker Studio Lab and start the workshop!

You may need to refresh your browser if you don't see the new oreilly_book/ directory.

Amazon SageMaker Studio Lab - Start Workshop

When you open the notebooks, make sure to select the data_science_on_aws kernel.

Amazon SageMaker Studio Lab - Select Kernel
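
As an optional sanity check, you can run the following in a notebook cell; this assumes the data_science_on_aws environment includes the sagemaker and tensorflow packages.

import sagemaker
import tensorflow as tf

print("sagemaker  :", sagemaker.__version__)
print("tensorflow :", tf.__version__)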
