AI and Machine Learning workflows on Anthos Bare Metal.

Overview

Hybrid and Sovereign AI on Anthos Bare Metal

Table of Contents

Overview

AI and Machine Learning workflows using TensorFlow on Anthos Bare Metal. TensorFlow is one of the most popular ML frameworks (10M+ downloads per month) in use today, but at the same time presents a lot of challenges when it comes to setup (GPUs, CUDA Drivers, TF Serving etc), performance tuning, cluster provisioning, maintenance, and model serving. This work will showcase the easy to use guides for ML model serving, training, infrastructure, ML Notebooks, and more on Anthos Bare Metal.

Terraform as IaC Substrate

Terraform is an open-source infrastructure as code software tool, and one of the ways in which Enterprise IT teams create, manage, and update infrastructure resources such as physical machines, VMs, switches, containers, and more. Provisioning the hardware or resources is always the first step in the process and these guides will be using Terraform as a common substrate to create the infrastructure for AI/ML apps. Checkout our upstream contribution to the Google Terraform Provider for GPU support in the instance_template module.

Serving TensorFlow ResNet Model on ABM

In this installation you'll see how to create an end-to-end TensorFlow ML serving ResNet installation on ABM using Google Compute Engine. Once the setup is completed, you'll be able to send image classification requests using GRPC client to ABM ML Serving cluster.

Requirements

  • Google Cloud Platform access and install gcloud SDK
  • Service Account JSON
  • Terraform, Git, Container Image

ResNet SavedModel Image on GCR

Let's create a local directory and download the Deep residual network (ResNet) model.

rm -rf /tmp/resnet
mkdir /tmp/resnet
curl -s http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz

Verify the SavedModel

ls /tmp/resnet/*
saved_model.pb variables

Now we will commit the ResNet serving docker image:

docker run -d --name serving_base tensorflow/serving
docker cp /tmp/resnet serving_base:/models/resnet
docker commit --change "ResNet model" serving_base $USER/resnet_serving
docker kill serving_base
docker rm serving_base

Copy the local docker image to gcr.io

export GCR_IMAGE_PATH="gcr.io/$GCP_PROJECT/abm_serving/resnet"
docker tar $USER/resnet_serving $GCR_IMAGE_PATH
docker push $GCR_IMAGEPATH

ABM GCE Cluster using Terraform

Create GCE demo host and perform few steps to setup the host:

export SERVICE_ACCOUNT_FILE=<FILE_LOCATION>

export DEMO_HOST="abm-demo-host-live"
gcloud compute instances create $DEMO_HOST --zone=us-central1-a
gcloud compute scp $SERVICE_ACCOUNT_FILE $USER@$DEMO_HOST:

Perform ssh login into the demo machine and follow steps below:

gcloud compute ssh $DEMO_HOST --zone=us-central1-a

# Activate Service Account
gcloud auth activate-service-account --key-file=$SERVICE_ACCOUNT_FILE

# Install Git
sudo apt-get install git

# Install Terraform
# v0.14.10
export TERRAFORM_VERSION="0.14.10"

List current Anthos/GKE clusters using hub membership. You can list existing clusters and compare it with newly created clusters.

# List Anthos BM clusters
gcloud container hub memberships list

Install Terraform, and make few minor changes to configuration files:

# Remove any previous versions. You can skip if this is a new instance
sudo apt remove terraform

sudo apt-get install software-properties-common

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform=$TERRAFORM_VERSION

terraform -version

Let's setup some ABM infrastructure on GCE using Terraform

# Git clone ABM Terraform setup
git clone https://github.com/GoogleCloudPlatform/anthos-samples.git
cd anthos-samples
git checkout abm-gcp-tf-demo
cd anthos-bm-gcp-terraform

# Make changes to cluster names and few edits
cp terraform.tfvars.sample terraform.tfvars

Make edits to the variables.tf and terraform.tfvars and also make sure the abm_cluster_id is modified to a unique name

# Change abm_cluster_id and service account name in variables.tf
export CLUSTER_ID=`echo "abm-tensorflow-"$(date +"%m%d%H%M")`
echo $CLUSTER_ID

Create GCE resources using Terraform and verify

# Terraform init and apply
terraform init && terraform plan
terraform apply

# Verify resources using gcloud
gcloud compute instancs list

# Let's create cluster using bmctl and perform pre-flight checks and verify
export KUBECONFIG=$HOME/bmctl-workspace/$CLUSTER_ID/$CLUSTER_ID-kubeconfig

# List ABM clusters
gcloud container hub memberships list

# Listing the details of live-cluster
gcloud container hub memberships describe $LIVE_CLUSTER_NAME

Verify k8s cluster details and check few outputs

kubectl get nodes
kubectl get deployments
kubectl get pods

TensorFlow ResNet model service on ABM Cluster

git clone https://github.com/GoogleCloudPlatform/anthos-ai
cd anthos-ai

kubectl create -f serving/resnet_k8s.yaml

# Let's view deployments and pods
kubectl get deployments
kubectl get pods

kubectl get services
kubectl describe service resnet-abm-service

# Let's send prediction request to ResNet service on ABM
git clone https://github.com/puneith/serving.git
sudo tools/run_in_docker.sh python tensorflow_serving/example/resnet_client_grpc.py $IMAGE_URL --server=10.200.0.51:8500

Return to the demo host and then destroy the demo host

# Destroy resources and demo host
terraform destroy

gcloud compute instances delete $DEMO_HOST
Owner
Google Cloud Platform
Google Cloud Platform
Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

186 Dec 29, 2022
A spaCy wrapper of OpenTapioca for named entity linking on Wikidata

spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu

Universitätsbibliothek Mannheim 80 Jan 03, 2023
Repositório do trabalho de introdução a NLP

Trabalho da disciplina de BI NLP Repositório do trabalho da disciplina Introdução a Processamento de Linguagem Natural da pós BI-Master da PUC-RIO. Eq

Leonardo Lins 1 Jan 18, 2022
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

Ryuichi Yamamoto 1.8k Dec 30, 2022
This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer Models by Reordering their Sublayers.

Improving Transformer Models by Reordering their Sublayers This repository contains the code for running the character-level Sandwich Transformers fro

Ofir Press 53 Sep 26, 2022
小布助手对话短文本语义匹配的一个baseline

oppo-text-match 小布助手对话短文本语义匹配的一个baseline 模型 参考:https://kexue.fm/archives/8213 base版本线下大概0.952,线上0.866(单模型,没做K-flod融合)。 训练 测试环境:tensorflow 1.15 + keras

苏剑林(Jianlin Su) 132 Dec 14, 2022
EasyTransfer is designed to make the development of transfer learning in NLP applications easier.

EasyTransfer is designed to make the development of transfer learning in NLP applications easier. The literature has witnessed the success of applying

Alibaba 819 Jan 03, 2023
Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

CodeFill This repository contains the code for our paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Namin

Software Analytics Lab 11 Oct 31, 2022
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning libra

Yiming Wang 919 Jan 03, 2023
A number of methods in order to perform Natural Language Processing on live data derived from Twitter

A number of methods in order to perform Natural Language Processing on live data derived from Twitter

1 Nov 24, 2021
基于“Seq2Seq+前缀树”的知识图谱问答

KgCLUE-bert4keras 基于“Seq2Seq+前缀树”的知识图谱问答 简介 博客:https://kexue.fm/archives/8802 环境 软件:bert4keras=0.10.8 硬件:目前的结果是用一张Titan RTX(24G)跑出来的。 运行 第一次运行的时候,会给知

苏剑林(Jianlin Su) 65 Dec 12, 2022
PyTorch Implementation of "Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging" (Findings of ACL 2022)

Feature_CRF_AE Feature_CRF_AE provides a implementation of Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging

Jacob Zhou 6 Apr 29, 2022
This repository contains (not all) code from my project on Named Entity Recognition in philosophical text

NERphilosophy 👋 Welcome to the github repository of my BsC thesis. This repository contains (not all) code from my project on Named Entity Recognitio

Ruben 1 Jan 27, 2022
ByT5: Towards a token-free future with pre-trained byte-to-byte models

ByT5: Towards a token-free future with pre-trained byte-to-byte models ByT5 is a tokenizer-free extension of the mT5 model. Instead of using a subword

Google Research 409 Jan 06, 2023
Edge-Augmented Graph Transformer

Edge-augmented Graph Transformer Introduction This is the official implementation of the Edge-augmented Graph Transformer (EGT) as described in https:

Md Shamim Hussain 21 Dec 14, 2022
Findings of ACL 2021

Assessing Dialogue Systems with Distribution Distances [arXiv][code] We propose to measure the performance of a dialogue system by computing the distr

Yahui Liu 16 Feb 24, 2022
Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow. Documentation Proper documentation is available at

HUSEIN ZOLKEPLI 151 Jan 05, 2023
Fidibo.com comments Sentiment Analyser

Fidibo.com comments Sentiment Analyser Introduction This project first asynchronously grab Fidibo.com books comment data using grabber.py and then sav

Iman Kermani 3 Apr 15, 2022
Transformers implementation for Fall 2021 Clinic

Installation Download miniconda3 if not already installed You can check by running typing conda in command prompt. Use conda to create an environment

Aakash Tripathi 1 Oct 28, 2021
Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

UlionTse 907 Dec 27, 2022