This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.

Overview

Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.

👋 Note: Even though this project uses an image classification its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

  • We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.

  • Then we locally test the API. You can find the instructions within the api directory.

  • To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:

    • Looks for any changes in the specified directory. If there are any changes:
    • Builds and pushes the latest Docker image to Google Container Register (GCR).
    • Deploys the Docker container on the k8s cluster running on GKE.

Configurations needed beforehand

  • Create a k8s cluster on GKE. Here's a relevant resource.

  • Create a service account key (JSON) file. It's a good practice to only grant it the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.

  • Crete a secret named GCP_CREDENTIALS on your GitHub repository and copy paste the contents of the service account key file into the secret.

  • Configure bucket storage related permissions for the service account:

    $ export PROJECT_ID=<PROJECT_ID>
    $ export ACCOUNT=<ACCOUNT>
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.admin
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectAdmin
    
    gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectCreator
  • If you're on the main branch already then upon a new push, the worflow defined in .github/workflows/deployment.yaml should automatically run. Here's how the final outputs should look like so (run link):

Notes

  • Since we use CPU-based pods within the k8s cluster, we use ONNX optimizations since they are known to provide performance speed-ups for CPU-based environments. If you are using GPU-based pods then look into TensorRT.
  • We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

From workflow outputs, you should see something like so:

NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx        80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx     <none>          443/TCP        160m

Note the EXTERNAL-IP corresponding to fastapi-server (iff you have named your service like so). Then cURL it:

curl -X POST -F [email protected] -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that you have a file called cat.jpg present in your working directory.

TODO (s)

  • Set up logging for the k8s pods.
  • Find a better way to report the latest API endpoint.

Acknowledgements

ML-GDE program for providing GCP credit support.

Comments
  • Feat/locust grpc

    Feat/locust grpc

    @deep-diver currently, the load test runs into:

    Screenshot 2022-04-02 at 10 54 26 AM

    I have ensured https://github.com/sayakpaul/ml-deployment-k8s-fastapi/blob/feat/locust-grpc/locust/grpc/locustfile.py#L49 returns the correct output. But after a few requests, I run into the above problem.

    Also, I should mention that the gRPC client currently does not take care of image resizing which makes it a bit less comparable to the REST client which handles preprocessing as well postprocessing.

    opened by sayakpaul 18
  • Setup TF Serving based deployment

    Setup TF Serving based deployment

    In this new feature, the following works are expected

    • Update the notebook Create a new notebook with the TF Serving prototype based on both gRPC(Ref) and RestAPI(Ref).

    • Update the notebook Update the newly created notebook to check the %%timeit on the TF Serving server locally.

    • Build/Commit docker image based on TF Serving base image using this method.

    • Deploy the built docker image on GKE cluster

    • Check the deployed model's performance with a various scenarios (maybe the same ones applied to ONNX+FastAPI scenarios)

    new feature 
    opened by deep-diver 11
  • Perform load testing with Locust

    Perform load testing with Locust

    Resources:

    • https://towardsdatascience.com/performance-testing-an-ml-serving-api-with-locust-ecd98ab9b7f7
    • https://microsoft.github.io/PartsUnlimitedMRP/pandp/200.1x-PandP-LocustTest.html
    • https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose
    opened by sayakpaul 10
  • 4 dockerize

    4 dockerize

    fix

    • move api/utils/requirements.txt to /api
    • add missing dependency python-multipart to the requirements.txt

    add

    • Dockerfile

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/4

    opened by deep-diver 4
  • Deployment on GKE with GitHub Actions

    Deployment on GKE with GitHub Actions

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/5, https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/7, and https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/6.

    opened by sayakpaul 2
  • chore: refactored the colab notebook.

    chore: refactored the colab notebook.

    Just added a text cell explaining why it's better to include the preprocessing function in the final exported model. Also, added a cell to show if the TF and ONNX outputs match with np.testing.assert_allclose().

    opened by sayakpaul 2
Owner
Sayak Paul
ML Engineer at @carted | One PR at a time
Sayak Paul
An extension library for FastAPI framework

FastLab An extension library for FastAPI framework Features Logging Models Utils Routers Installation use pip to install the package: pip install fast

Tezign Lab 10 Jul 11, 2022
The template for building scalable web APIs based on FastAPI, Tortoise ORM and other.

FastAPI and Tortoise ORM. Powerful but simple template for web APIs w/ FastAPI (as web framework) and Tortoise-ORM (for working via database without h

prostomarkeloff 95 Jan 08, 2023
A dynamic FastAPI router that automatically creates CRUD routes for your models

⚡ Create CRUD routes with lighting speed ⚡ A dynamic FastAPI router that automatically creates CRUD routes for your models

Adam Watkins 950 Jan 08, 2023
Easily integrate socket.io with your FastAPI app 🚀

fastapi-socketio Easly integrate socket.io with your FastAPI app. Installation Install this plugin using pip: $ pip install fastapi-socketio Usage To

Srdjan Stankovic 210 Dec 23, 2022
A set of demo of deploying a Machine Learning Model in production using various methods

Machine Learning Model in Production This git is for those who have concern about serving your machine learning model to production. Overview The tuto

Vo Van Tu 53 Sep 14, 2022
✨️🐍 SPARQL endpoint built with RDFLib to serve machine learning models, or any other logic implemented in Python

✨ SPARQL endpoint for RDFLib rdflib-endpoint is a SPARQL endpoint based on a RDFLib Graph to easily serve machine learning models, or any other logic

Vincent Emonet 27 Dec 19, 2022
Generate Class & Decorators for your FastAPI project ✨🚀

Classes and Decorators to use FastAPI with class based routing. In particular this allows you to construct an instance of a class and have methods of that instance be route handlers for FastAPI & Pyt

Yasser Tahiri 34 Oct 27, 2022
Beyonic API Python official client library simplified examples using Flask, Django and Fast API.

Beyonic API Python Examples. The beyonic APIs Doc Reference: https://apidocs.beyonic.com/ To start using the Beyonic API Python API, you need to start

Harun Mbaabu Mwenda 46 Sep 01, 2022
A basic JSON-RPC implementation for your Flask-powered sites

Flask JSON-RPC A basic JSON-RPC implementation for your Flask-powered sites. Some reasons you might want to use: Simple, powerful, flexible and python

Cenobit Technologies 273 Dec 01, 2022
Dead-simple mailer micro-service for static websites

Mailer Dead-simple mailer micro-service for static websites A free and open-source software alternative to contact form services such as FormSpree, to

Romain Clement 42 Dec 21, 2022
Ready-to-use and customizable users management for FastAPI

FastAPI Users Ready-to-use and customizable users management for FastAPI Documentation: https://fastapi-users.github.io/fastapi-users/ Source Code: ht

FastAPI Users 2.3k Dec 30, 2022
Pagination support for flask

flask-paginate Pagination support for flask framework (study from will_paginate). It supports several css frameworks. It requires Python2.6+ as string

Lix Xu 264 Nov 07, 2022
Cube-CRUD is a simple example of a REST API CRUD in a context of rubik's cube review service.

Cube-CRUD is a simple example of a REST API CRUD in a context of rubik's cube review service. It uses Sqlalchemy ORM to manage the connection and database operations.

Sebastian Andrade 1 Dec 11, 2021
Full stack, modern web application generator. Using FastAPI, PostgreSQL as database, Docker, automatic HTTPS and more.

Full Stack FastAPI and PostgreSQL - Base Project Generator Generate a backend and frontend stack using Python, including interactive API documentation

Sebastián Ramírez 10.8k Jan 08, 2023
Restful Api developed with Flask using Prometheus and Grafana for monitoring and containerization with Docker :rocket:

Hephaestus 🚀 In Greek mythology, Hephaestus was either the son of Zeus and Hera or he was Hera's parthenogenous child. ... As a smithing god, Hephaes

Yasser Tahiri 16 Oct 07, 2022
A FastAPI Middleware of joerick/pyinstrument to check your service performance.

fastapi_profiler A FastAPI Middleware of joerick/pyinstrument to check your service performance. 📣 Info A FastAPI Middleware of pyinstrument to check

LeoSun 107 Jan 05, 2023
Prometheus integration for Starlette.

Starlette Prometheus Introduction Prometheus integration for Starlette. Requirements Python 3.6+ Starlette 0.9+ Installation $ pip install starlette-p

José Antonio Perdiguero 229 Dec 21, 2022
FastAPI Project Template

The base to start an openapi project featuring: SQLModel, Typer, FastAPI, JWT Token Auth, Interactive Shell, Management Commands.

A.Freud 4 Dec 05, 2022
Minecraft biome tile server writing on Python using FastAPI

Blocktile Minecraft biome tile server writing on Python using FastAPI Usage https://blocktile.herokuapp.com/overworld/{seed}/{zoom}/{col}/{row}.png s

Vladimir 2 Aug 31, 2022
FastAPI native extension, easy and simple JWT auth

fastapi-jwt FastAPI native extension, easy and simple JWT auth

Konstantin Chernyshev 19 Dec 12, 2022