AI and Machine Learning workflows on Anthos Bare Metal.

Overview

Hybrid and Sovereign AI on Anthos Bare Metal

This repository covers AI and machine learning workflows using TensorFlow on Anthos Bare Metal (ABM). TensorFlow is one of the most popular ML frameworks in use today (10M+ downloads per month), but it presents many challenges in setup (GPUs, CUDA drivers, TF Serving, etc.), performance tuning, cluster provisioning, maintenance, and model serving. This work showcases easy-to-use guides for ML model serving, training, infrastructure, ML notebooks, and more on Anthos Bare Metal.

Terraform as IaC Substrate

Terraform is an open-source infrastructure-as-code tool and one of the ways in which enterprise IT teams create, manage, and update infrastructure resources such as physical machines, VMs, switches, and containers. Provisioning the hardware or resources is always the first step, and these guides use Terraform as a common substrate to create the infrastructure for AI/ML apps. Check out our upstream contribution to the Google Terraform Provider for GPU support in the instance_template module.

Serving TensorFlow ResNet Model on ABM

In this installation you'll see how to create an end-to-end TensorFlow ResNet serving installation on ABM using Google Compute Engine. Once the setup is complete, you'll be able to send image classification requests from a gRPC client to the ABM ML serving cluster.

Requirements

  • Google Cloud Platform access with the gcloud SDK installed
  • Service Account JSON
  • Terraform, Git, Container Image

ResNet SavedModel Image on GCR

Let's create a local directory and download the Deep residual network (ResNet) model.

rm -rf /tmp/resnet
mkdir /tmp/resnet
curl -s http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz

Verify the SavedModel

ls /tmp/resnet/*
saved_model.pb variables
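
The manual check above can also be scripted. The following is a minimal sketch, assuming the archive unpacks into a version subdirectory that holds saved_model.pb and a variables directory, as the listing suggests. The function name check_saved_model is hypothetical and not part of this repository.

```shell
# Hypothetical helper: succeeds if at least one version subdirectory of the
# given base directory contains saved_model.pb and a variables/ directory.
check_saved_model() {
  base_dir="$1"
  found=1
  for version_dir in "$base_dir"/*/; do
    if [ -f "${version_dir}saved_model.pb" ] && [ -d "${version_dir}variables" ]; then
      echo "SavedModel found in ${version_dir}"
      found=0
    fi
  done
  if [ "$found" -ne 0 ]; then
    echo "no SavedModel found under ${base_dir}" >&2
  fi
  return "$found"
}

# Usage (commented out; run it after the download step):
# check_saved_model /tmp/resnet || exit 1
```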

Now we will commit the ResNet serving docker image:

docker run -d --name serving_base tensorflow/serving
docker cp /tmp/resnet serving_base:/models/resnet
docker commit --change "ENV MODEL_NAME resnet" serving_base $USER/resnet_serving
docker kill serving_base
docker rm serving_base

Tag the local Docker image and push it to gcr.io:

export GCR_IMAGE_PATH="gcr.io/$GCP_PROJECT/abm_serving/resnet"
docker tag $USER/resnet_serving $GCR_IMAGE_PATH
docker push $GCR_IMAGE_PATH
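
If GCP_PROJECT is unset, the image path silently becomes gcr.io//abm_serving/resnet and the push fails late. A small guard can catch that earlier. This is only a sketch; the helper name gcr_image_path is hypothetical, and the path layout is copied from the export above.

```shell
# Hypothetical helper: print the GCR image path, or fail if the project ID is empty.
gcr_image_path() {
  project="$1"
  if [ -z "$project" ]; then
    echo "GCP project ID is empty; set GCP_PROJECT first" >&2
    return 1
  fi
  echo "gcr.io/${project}/abm_serving/resnet"
}

# Usage (commented out because it needs Docker and registry credentials):
# GCR_IMAGE_PATH="$(gcr_image_path "$GCP_PROJECT")" || exit 1
# docker tag "$USER/resnet_serving" "$GCR_IMAGE_PATH"
# docker push "$GCR_IMAGE_PATH"
```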

ABM GCE Cluster using Terraform

Create a GCE demo host and perform a few steps to set it up:

export SERVICE_ACCOUNT_FILE=<FILE_LOCATION>

export DEMO_HOST="abm-demo-host-live"
gcloud compute instances create $DEMO_HOST --zone=us-central1-a
gcloud compute scp $SERVICE_ACCOUNT_FILE $USER@$DEMO_HOST:

SSH into the demo machine and follow the steps below:

gcloud compute ssh $DEMO_HOST --zone=us-central1-a

# Activate Service Account
gcloud auth activate-service-account --key-file=$SERVICE_ACCOUNT_FILE

# Install Git
sudo apt-get install git

# Pin the Terraform version to install (v0.14.10)
export TERRAFORM_VERSION="0.14.10"

List the current Anthos/GKE clusters using hub memberships. You can compare this list of existing clusters with the list after the new cluster is created.

# List Anthos BM clusters
gcloud container hub memberships list

Install Terraform and make a few minor changes to the configuration files:

# Remove any previous versions. You can skip if this is a new instance
sudo apt remove terraform

sudo apt-get install software-properties-common

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform=$TERRAFORM_VERSION

terraform -version
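
To confirm that the pinned version is the one actually installed, the first line of the terraform -version output can be compared with TERRAFORM_VERSION. The sketch below assumes that line has the form "Terraform v0.14.10"; the function name terraform_version_matches is hypothetical.

```shell
# Hypothetical helper: compare an expected version with `terraform -version` output.
# $1 = expected version (for example 0.14.10), $2 = full output of `terraform -version`.
terraform_version_matches() {
  expected="$1"
  version_output="$2"
  # Keep only the first line and strip the assumed "Terraform v" prefix.
  actual="$(printf '%s\n' "$version_output" | head -n 1 | sed 's/^Terraform v//')"
  [ "$actual" = "$expected" ]
}

# Usage (commented out because it needs Terraform on the PATH):
# terraform_version_matches "$TERRAFORM_VERSION" "$(terraform -version)" \
#   || echo "installed Terraform does not match $TERRAFORM_VERSION" >&2
```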

Let's set up some ABM infrastructure on GCE using Terraform:

# Git clone ABM Terraform setup
git clone https://github.com/GoogleCloudPlatform/anthos-samples.git
cd anthos-samples
git checkout abm-gcp-tf-demo
cd anthos-bm-gcp-terraform

# Copy the sample variables file; cluster names and other values are edited next
cp terraform.tfvars.sample terraform.tfvars

Edit variables.tf and terraform.tfvars, and make sure abm_cluster_id is set to a unique name:

# Change abm_cluster_id and service account name in variables.tf
export CLUSTER_ID="abm-tensorflow-$(date +"%m%d%H%M")"
echo $CLUSTER_ID
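
The edit to terraform.tfvars can be scripted instead of done by hand. The sketch below assumes the file already contains a line of the form abm_cluster_id = "..."; check the sample file first, since its exact contents are not shown here. The helper name set_tfvar is hypothetical.

```shell
# Hypothetical helper: rewrite an existing `key = "..."` line in a tfvars file.
# $1 = file, $2 = variable name, $3 = new string value. Other lines are untouched.
set_tfvar() {
  file="$1"
  key="$2"
  value="$3"
  tmp_file="$(mktemp)" || return 1
  # Write to a temporary file first so this works with both GNU and BSD sed.
  sed "s|^\([[:space:]]*${key}[[:space:]]*=[[:space:]]*\).*|\1\"${value}\"|" "$file" > "$tmp_file" \
    && cat "$tmp_file" > "$file"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}

# Usage (commented out; run it inside anthos-bm-gcp-terraform):
# set_tfvar terraform.tfvars abm_cluster_id "$CLUSTER_ID"
```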

Create GCE resources using Terraform and verify

# Terraform init and apply
terraform init && terraform plan
terraform apply

# Verify resources using gcloud
gcloud compute instances list

# After bmctl creates the cluster and its pre-flight checks pass, point kubectl at the generated kubeconfig
export KUBECONFIG=$HOME/bmctl-workspace/$CLUSTER_ID/$CLUSTER_ID-kubeconfig

# List ABM clusters
gcloud container hub memberships list

# Listing the details of live-cluster
gcloud container hub memberships describe $LIVE_CLUSTER_NAME

Verify the Kubernetes cluster details and check a few outputs:

kubectl get nodes
kubectl get deployments
kubectl get pods
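
Rather than re-running kubectl get nodes until every node is ready, kubectl wait can block until the Ready condition is met. This is a minimal sketch; the wrapper name wait_for_nodes and the KUBECTL override variable are hypothetical, and the 300s default timeout is an arbitrary choice.

```shell
# KUBECTL can be overridden (for example with a stub in tests); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical wrapper: block until all nodes report Ready or the timeout expires.
wait_for_nodes() {
  timeout="${1:-300s}"
  "$KUBECTL" wait --for=condition=Ready nodes --all --timeout="$timeout"
}

# Usage (commented out because it needs a reachable cluster):
# wait_for_nodes 600s && kubectl get nodes
```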

TensorFlow ResNet model service on ABM Cluster

git clone https://github.com/GoogleCloudPlatform/anthos-ai
cd anthos-ai

kubectl create -f serving/resnet_k8s.yaml

# Let's view deployments and pods
kubectl get deployments
kubectl get pods

kubectl get services
kubectl describe service resnet-abm-service

# Let's send prediction request to ResNet service on ABM
git clone https://github.com/puneith/serving.git
cd serving
sudo tools/run_in_docker.sh python tensorflow_serving/example/resnet_client_grpc.py $IMAGE_URL --server=10.200.0.51:8500
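
The --server address above is hardcoded, and the IP may differ in your cluster. The sketch below looks it up from the service instead, assuming resnet-abm-service is exposed as a LoadBalancer service and serves gRPC on port 8500. The function name resnet_server_address and the KUBECTL override variable are hypothetical.

```shell
# KUBECTL can be overridden (for example with a stub in tests); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: print "<load-balancer-ip>:<port>" for the serving service.
resnet_server_address() {
  service="${1:-resnet-abm-service}"
  port="${2:-8500}"
  ip="$("$KUBECTL" get service "$service" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" || return 1
  if [ -z "$ip" ]; then
    echo "service ${service} has no load balancer IP yet" >&2
    return 1
  fi
  echo "${ip}:${port}"
}

# Usage (commented out because it needs a reachable cluster):
# SERVER="$(resnet_server_address)" || exit 1
# sudo tools/run_in_docker.sh python tensorflow_serving/example/resnet_client_grpc.py \
#   "$IMAGE_URL" --server="$SERVER"
```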

Return to the demo host, destroy the Terraform-managed resources, and then delete the demo host:

# Destroy resources and demo host
terraform destroy

gcloud compute instances delete $DEMO_HOST
Owner
Google Cloud Platform