This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.

Overview

Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.

👋 Note: Even though this project uses an image classification its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

  • We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.

  • Then we locally test the API. You can find the instructions within the api directory.

  • To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:

    • Looks for any changes in the specified directory. If there are any changes:
    • Builds and pushes the latest Docker image to Google Container Register (GCR).
    • Deploys the Docker container on the k8s cluster running on GKE.

Configurations needed beforehand

  • Create a k8s cluster on GKE. Here's a relevant resource.

  • Create a service account key (JSON) file. It's a good practice to only grant it the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.

  • Crete a secret named GCP_CREDENTIALS on your GitHub repository and copy paste the contents of the service account key file into the secret.

  • Configure bucket storage related permissions for the service account:

    $ export PROJECT_ID=<PROJECT_ID>
    $ export ACCOUNT=<ACCOUNT>
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.admin
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectAdmin
    
    gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectCreator
  • If you're on the main branch already then upon a new push, the worflow defined in .github/workflows/deployment.yaml should automatically run. Here's how the final outputs should look like so (run link):

Notes

  • Since we use CPU-based pods within the k8s cluster, we use ONNX optimizations since they are known to provide performance speed-ups for CPU-based environments. If you are using GPU-based pods then look into TensorRT.
  • We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

From workflow outputs, you should see something like so:

NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx        80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx     <none>          443/TCP        160m

Note the EXTERNAL-IP corresponding to fastapi-server (iff you have named your service like so). Then cURL it:

curl -X POST -F [email protected] -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that you have a file called cat.jpg present in your working directory.

TODO (s)

  • Set up logging for the k8s pods.
  • Find a better way to report the latest API endpoint.

Acknowledgements

ML-GDE program for providing GCP credit support.

Comments
  • Feat/locust grpc

    Feat/locust grpc

    @deep-diver currently, the load test runs into:

    Screenshot 2022-04-02 at 10 54 26 AM

    I have ensured https://github.com/sayakpaul/ml-deployment-k8s-fastapi/blob/feat/locust-grpc/locust/grpc/locustfile.py#L49 returns the correct output. But after a few requests, I run into the above problem.

    Also, I should mention that the gRPC client currently does not take care of image resizing which makes it a bit less comparable to the REST client which handles preprocessing as well postprocessing.

    opened by sayakpaul 18
  • Setup TF Serving based deployment

    Setup TF Serving based deployment

    In this new feature, the following works are expected

    • Update the notebook Create a new notebook with the TF Serving prototype based on both gRPC(Ref) and RestAPI(Ref).

    • Update the notebook Update the newly created notebook to check the %%timeit on the TF Serving server locally.

    • Build/Commit docker image based on TF Serving base image using this method.

    • Deploy the built docker image on GKE cluster

    • Check the deployed model's performance with a various scenarios (maybe the same ones applied to ONNX+FastAPI scenarios)

    new feature 
    opened by deep-diver 11
  • Perform load testing with Locust

    Perform load testing with Locust

    Resources:

    • https://towardsdatascience.com/performance-testing-an-ml-serving-api-with-locust-ecd98ab9b7f7
    • https://microsoft.github.io/PartsUnlimitedMRP/pandp/200.1x-PandP-LocustTest.html
    • https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose
    opened by sayakpaul 10
  • 4 dockerize

    4 dockerize

    fix

    • move api/utils/requirements.txt to /api
    • add missing dependency python-multipart to the requirements.txt

    add

    • Dockerfile

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/4

    opened by deep-diver 4
  • Deployment on GKE with GitHub Actions

    Deployment on GKE with GitHub Actions

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/5, https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/7, and https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/6.

    opened by sayakpaul 2
  • chore: refactored the colab notebook.

    chore: refactored the colab notebook.

    Just added a text cell explaining why it's better to include the preprocessing function in the final exported model. Also, added a cell to show if the TF and ONNX outputs match with np.testing.assert_allclose().

    opened by sayakpaul 2
Owner
Sayak Paul
ML Engineer at @carted | One PR at a time
Sayak Paul
Opinionated set of utilities on top of FastAPI

FastAPI Contrib Opinionated set of utilities on top of FastAPI Free software: MIT license Documentation: https://fastapi-contrib.readthedocs.io. Featu

identix.one 543 Jan 05, 2023
Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as a REST API Endpoint.

Jupter Notebook REST API Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as

Invictify 54 Nov 04, 2022
Adds simple SQLAlchemy support to FastAPI

FastAPI-SQLAlchemy FastAPI-SQLAlchemy provides a simple integration between FastAPI and SQLAlchemy in your application. It gives access to useful help

Michael Freeborn 465 Jan 07, 2023
FastAPI pagination

FastAPI Pagination Installation # Basic version pip install fastapi-pagination # All available integrations pip install fastapi-pagination[all] Avail

Yurii Karabas 561 Jan 07, 2023
fastapi-admin2 is an upgraded fastapi-admin, that supports ORM dialects, true Dependency Injection and extendability

FastAPI2 Admin Introduction fastapi-admin2 is an upgraded fastapi-admin, that supports ORM dialects, true Dependency Injection and extendability. Now

Glib 14 Dec 05, 2022
Get MODBUS data from Sofar (K-TLX) inverter through LSW-3 or LSE module

SOFAR Inverter + LSW-3/LSE Small utility to read data from SOFAR K-TLX inverters through the Solarman (LSW-3/LSE) datalogger. Two scripts to get inver

58 Dec 29, 2022
A FastAPI Framework for things like Database, Redis, Logging, JWT Authentication and Rate Limits

A FastAPI Framework for things like Database, Redis, Logging, JWT Authentication and Rate Limits Install You can install this Library with: pip instal

Tert0 33 Nov 28, 2022
Fastapi performans monitoring

Fastapi-performans-monitoring This project is a simple performance monitoring for FastAPI. License This project is licensed under the terms of the MIT

bilal alpaslan 11 Dec 31, 2022
A dynamic FastAPI router that automatically creates CRUD routes for your models

⚡ Create CRUD routes with lighting speed ⚡ A dynamic FastAPI router that automatically creates CRUD routes for your models

Adam Watkins 950 Jan 08, 2023
REST API with FastAPI and JSON file.

FastAPI RESTAPI with a JSON py 3.10 First, to install all dependencies, in ./src/: python -m pip install -r requirements.txt Second, into the ./src/

Luis Quiñones Requelme 1 Dec 15, 2021
:rocket: CLI tool for FastAPI. Generating new FastAPI projects & boilerplates made easy.

Project generator and manager for FastAPI. Source Code: View it on Github Features 🚀 Creates customizable project boilerplate. Creates customizable a

Yagiz Degirmenci 1k Jan 02, 2023
🚀 Cookiecutter Template for FastAPI + React Projects. Using PostgreSQL, SQLAlchemy, and Docker

FastAPI + React · A cookiecutter template for bootstrapping a FastAPI and React project using a modern stack. Features FastAPI (Python 3.8) JWT authen

Gabriel Abud 1.4k Jan 02, 2023
A kedro-plugin to serve Kedro Pipelines as API

General informations Software repository Latest release Total downloads Pypi Code health Branch Tests Coverage Links Documentation Deployment Activity

Yolan Honoré-Rougé 12 Jul 15, 2022
A utility that allows you to use DI in fastapi without Depends()

fastapi-better-di What is this ? fastapi-better-di is a utility that allows you to use DI in fastapi without Depends() Installation pip install fastap

Maxim 9 May 24, 2022
Drop-in MessagePack support for ASGI applications and frameworks

msgpack-asgi msgpack-asgi allows you to add automatic MessagePack content negotiation to ASGI applications (Starlette, FastAPI, Quart, etc.), with a s

Florimond Manca 128 Jan 02, 2023
Fetching Cryptocurrency Prices from Coingecko and Displaying them on Grafana

cryptocurrency-prices-grafana Fetching Cryptocurrency Prices from Coingecko and Displaying them on Grafana About This stack consists of: Prometheus (t

Ruan Bekker 7 Aug 01, 2022
Deploy/View images to database sqlite with fastapi

Deploy/View images to database sqlite with fastapi cd realistic Dependencies dat

Fredh Macau 1 Jan 04, 2022
Keepalive - Discord Bot to keep threads from expiring

keepalive Discord Bot to keep threads from expiring Installation Create a new Di

Francesco Pierfederici 5 Mar 14, 2022
LuSyringe is a documentation injection tool for your classes when using Fast API

LuSyringe LuSyringe is a documentation injection tool for your classes when using Fast API Benefits The main benefit is being able to separate your bu

Enzo Ferrari 2 Sep 06, 2021
✨️🐍 SPARQL endpoint built with RDFLib to serve machine learning models, or any other logic implemented in Python

✨ SPARQL endpoint for RDFLib rdflib-endpoint is a SPARQL endpoint based on a RDFLib Graph to easily serve machine learning models, or any other logic

Vincent Emonet 27 Dec 19, 2022