Autoscaling volumes for Kubernetes (with the help of Prometheus)

Overview

Kubernetes Volume Autoscaler (with Prometheus)

This repository contains a service that automatically increases the size of a Persistent Volume Claim in Kubernetes when its nearing full. Initially engineered based on AWS EKS, this should support any Kubernetes cluster or cloud provider which supports dynamically resizing storage volumes in Kubernetes.

Keeping your volumes at a minimal size can help reduce cost, but having to manually scale them up can be painful and a waste of time for an DevOps / Systems Administrator.

Requirements

Prerequisites

As mentioned above, you must have a storageclass which supports volume expansion, and the provisioner you're using must also support volume expansion. Ideally, "hot"-volume expansion so your services never have to restart. AWS EKS built-in provisioner kubernetes.io/aws-ebs supports this, and so does the efs.csi.aws.com CSI driver. To check/enable this...

# First, check if your storage class supports volume expansion...
$ kubectl get storageclasses
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard               kubernetes.io/aws-ebs   Delete          Immediate              false                  10d

# If ALLOWVOLUMEEXPANSION is not set to true, patch it to enable this
kubectl patch storageclass standard -p '{"allowVolumeExpansion": true}'

NOTE: The above storageclass comes with EKS, however, it only supports gp2, which is largely a deprecated and much slower storage driver than gp3. I HIHGLY recommend before using EKS you install the AWS EBS CSI driver to gain gp3 support and more future-proof support of Amazon's various storage volumes and their lifecycles.

If you do this, you can/should completely remove GP2 support, and after installing the above CSI driver, create a storageclass with the new driver with best-practices in it by default including...

  • Retain-ing the volume if it was deleted (to prevent accidental data loss)
  • Having all disks encrypted-at-rest by default, for compliance/security
  • Using gp3 by default for faster disk bandwidth and IO
# For this, simply delete your old default StorageClass
kubectl delete storageclass standard
# Then apply/create a new default gp3 using the AWS EBS CSI driver you installed
kubectl apply -f https://raw.githubusercontent.com/DevOps-Nirvana/Kubernetes-Volume-Autoscaler/master/examples/gp3-default-encrypt-retain-allowExpansion-storageclass.yaml

Installation with Helm

Now that your cluster has a StorageClass which supports expansion, you can install the Volume Autoscaler

# First, setup this repo for your helm
helm repo add devops-nirvana https://devops-nirvana.s3.amazonaws.com/helm-charts/

# Example Install 1 - Using autodiscovery, must be in the same namespace as Prometheus
helm upgrade --install volume-autoscaler devops-nirvana/volume-autoscaler \
  --namespace REPLACEME_WITH_PROMETHEUS_NAMESPACE

# Example 2 - Manually setting where Prometheus is
helm upgrade --install volume-autoscaler devops-nirvana/volume-autoscaler \
  --namespace ANYWHERE_DOESNT_MATTER \
  --set "prometheus_url=http://prometheus-server.namespace.svc.cluster.local"

# Example 3 - Recommended usage, automatically detect Prometheus and use slack notifications
helm upgrade --install volume-autoscaler devops-nirvana/volume-autoscaler \
  --namespace REPLACEME_WITH_PROMETHEUS_NAMESPACE \
  --set "slack_webhook_url=https://hooks.slack.com/services/123123123/4564564564/789789789789789789" \
  --set "slack_channel=my-slack-channel-name"

Advanced helm usage...

# To update your local knowledge of remote repos, you may need to do this before upgrading...
helm repo update

# To view what changes it will make, if you change things, this requires the helm diff plugin - https://github.com/databus23/helm-diff
helm diff upgrade volume-autoscaler --allow-unreleased devops-nirvana/volume-autoscaler \
  --namespace infrastructure \
  --set "slack_webhook_url=https://hooks.slack.com/services/123123123/4564564564/789789789789789789" \
  --set "slack_channel=my-slack-channel-name" \
  --set "prometheus_url=http://prometheus-server.infrastructure.svc.cluster.local"

# To remove the service, simply run...
helm uninstall volume-autoscaler

(Alternate) Installation with kubectl

./to_be_applied.yaml # #3: If you wish to have slack notifications, edit this to_be_applied.yaml and embed your webhook on the value: line for SLACK_WEBHOOK and set the SLACK_CHANNEL as well accordingly # #4: Finally, apply it... kubectl --namespace REPLACEME_WITH_PROMETHEUS_NAMESPACE apply ./to_be_applied.yaml">
# This simple installation will work as long as you put this in the same namespace as Prometheus
# The default namespace of this yaml is hardcoded to is `infrastructure`.  If you'd like to change
# the namespace you can run the first few commands below...

# IF YOU USE `infrastructure` AS THE NAMESPACE FOR PROMETHEUS SIMPLY...
kubectl --namespace infrastructure apply https://devops-nirvana.s3.amazonaws.com/volume-autoscaler/volume-autoscaler-1.0.1.yaml

# OR, IF YOU NEED TO CHANGE THE NAMESPACE...
# #1: Download the yaml...
wget https://devops-nirvana.s3.amazonaws.com/volume-autoscaler/volume-autoscaler-1.0.1.yaml
# #1: Or download with curl
curl https://devops-nirvana.s3.amazonaws.com/volume-autoscaler/volume-autoscaler-1.0.1.yaml -o volume-autoscaler-1.0.1.yaml
# #2: Then replace the namespace in this, replacing
cat volume-autoscaler-1.0.1.yaml | sed 's/"infrastructure"/"PROMETHEUS_NAMESPACE_HERE"/g' > ./to_be_applied.yaml
# #3: If you wish to have slack notifications, edit this to_be_applied.yaml and embed your webhook on the value: line for SLACK_WEBHOOK and set the SLACK_CHANNEL as well accordingly
# #4: Finally, apply it...
kubectl --namespace REPLACEME_WITH_PROMETHEUS_NAMESPACE apply ./to_be_applied.yaml

Validation

To confirm the volume autoscaler is working properly this repo has an example which you can apply to your Kubernetes cluster which is an PVC and a pod which uses that PVC and fills the disk up constantly. To do this...

# Simply run this on your terminal
kubectl apply -f https://raw.githubusercontent.com/DevOps-Nirvana/Kubernetes-Volume-Autoscaler/master/examples/simple-pod-with-pvc.yaml

Then if you'd like to follow-along, "follow" the logs of your volume autoscaler to watch it detect full disk and scale up.

Per-Volume Configuration / Annotations

This controller also supports tweaking your volume-autoscaler configuration per-PVC with annotations. The annotations supported are...

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sample-volume-claim
  annotations:
    # This is when we want to scale up after the disk is this percentage (out of 100) full
    volume.autoscaler.kubernetes.io/scale-above-percent: "80"   # 80 is the default value
    # This is how many intervals must go by above the scale-above-percent before triggering an autoscale action
    volume.autoscaler.kubernetes.io/scale-after-intervals: "5"  # 5 is this default value
    # This is how much to scale a disk up by, in percentage of the current size.
    #   Eg: If this is set to "10" and the disk is 100GB, it will scale to 110GB
    #   At larger disk sizes you may want to set this on your PVCs to like "5" or "10"
    volume.autoscaler.kubernetes.io/scale-up-percent: "50"      # 50 (percent) is the default value
    # This is the smallest increment to scale up by.  This helps when the disks are very small, and helps hit the minimum increment value per-provider (this is 1GB on AWS)
    volume.autoscaler.kubernetes.io/scale-up-min-increment: "1000000000"  # 1GB by default (in bytes)
    # This is the largest disk size ever allowed for this tool to scale up to.  This is set to 16TB by default, because that's the limit of AWS EBS
    volume.autoscaler.kubernetes.io/scale-up-max-size: "16000000000000"  # 16TB by default (in bytes)
    # How long (in seconds) we must wait before scaling this volume again.  For AWS EBS, this is 6 hours which is 21600 seconds but for good measure we add an extra 10 minutes to this, so 22200
    volume.autoscaler.kubernetes.io/scale-cooldown-time: "22200"  
    # If you want the autoscaler to completely ignore/skip this PVC, set this to "true"
    volume.autoscaler.kubernetes.io/ignore: "false"  
    # Finally, Do not set this, and if you see this ignore this, this is how Volume Autoscaler keeps its "state"
    volume.autoscaler.kubernetes.io/last-resized-at: "123123123"  # This will be an Unix epoch timestamp
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard

TODO

This todo list is mostly for the Author(s), but any contributions are also welcome. Please submit an Issue for issues or requests, or an Pull Request if you added some code.

  • Make helm chart able to customize the prometheus label selector
  • Add scale up max increment
  • Make log have more full (simplified) data about disks (max size, usage, etc, for debugging purposes)
  • Add dry-run as top-level arg to easily adjust, add to examples on this README
  • Push to helm repo in a Github Action and push the static yaml as well
  • Add tests coverage to ensure the software works as intended moving forward
  • Do some load testing to see how well this software deals with scale (100+ PVs, 500+ PVs, etc)
  • Figure out what type of Memory/CPU is necessary for 500+ PVs, see above
  • Add verbosity levels for print statements, to be able to quiet things down in the logs
  • Generate kubernetes EVENTS (add to rbac) so everyone knows we are doing things, to be a good controller
  • Add badges to the README
  • Listen/watch to events of the PV/PVC to monitor and ensure the resizing happens, log and/or slack it accordingly
  • Test it and add working examples of using this on other cloud providers (Azure / Google Cloud)
  • Make per-PVC annotations to (re)direct Slack to different webhooks and/or different channel(s)
  • Discuss what the ideal "default" amount of time before scaling. Currently is 5 minutes (5, 60 minute intervals)
You might also like...
A Simple script to hunt unused Kubernetes resources.

K8SPurger A Simple script to hunt unused Kubernetes resources. Release History Release 0.3 Added Ingress Added Services Account Adding RoleBindding Re

Run Oracle on Kubernetes with El Carro

El Carro is a new project that offers a way to run Oracle databases in Kubernetes as a portable, open source, community driven, no vendor lock-in container orchestration system. El Carro provides a powerful declarative API for comprehensive and consistent configuration and deployment as well as for real-time operations and monitoring.

Chartreuse: Automated Alembic migrations within kubernetes
Chartreuse: Automated Alembic migrations within kubernetes

Chartreuse: Automated Alembic SQL schema migrations within kubernetes "How to automate management of Alembic database schema migration at scale using

sysctl/sysfs settings on a fly for Kubernetes Cluster. No restarts are required for clusters and nodes.

SysBindings Daemon Little toolkit for control the sysctl/sysfs bindings on Kubernetes Cluster on the fly and without unnecessary restarts of cluster o

Caboto, the Kubernetes semantic analysis tool
Caboto, the Kubernetes semantic analysis tool

Caboto Caboto, the Kubernetes semantic analysis toolkit. It contains a lightweight Python library for semantic analysis of plain Kubernetes manifests

Hubble - Network, Service & Security Observability for Kubernetes using eBPF
Hubble - Network, Service & Security Observability for Kubernetes using eBPF

Network, Service & Security Observability for Kubernetes What is Hubble? Getting Started Features Service Dependency Graph Metrics & Monitoring Flow V

Rancher Kubernetes API compatible with RKE, RKE2 and maybe others?

kctl Rancher Kubernetes API compatible with RKE, RKE2 and maybe others? Documentation is WIP. Quickstart pip install --upgrade kctl Usage from lazycls

A charmed operator for running PGbouncer on kubernetes.

operator-template Description TODO: Describe your charm in a few paragraphs of Markdown Usage TODO: Provide high-level usage, such as required config

Quick & dirty controller to schedule Kubernetes Jobs later (once)

K8s Jobber Operator Quickly implemented Kubernetes controller to enable scheduling of Jobs at a later time. Usage: To schedule a Job later, Set .spec.

Comments
  • Autoscaling size below current size and PVC size not human readable.

    Autoscaling size below current size and PVC size not human readable.

    Sometimes, the autoscaler tries to resize a PVC with a size below current size, raising an error.

    Volume infra.data-nfs-server-provisioner-1637948923-0 is 85% in-use of the 80Gi available
      BECAUSE it is above 80% used
      ALERT has been for 1306 period(s) which needs to at least 5 period(s) to scale
      AND we need to scale it immediately, it has never been scaled previously
      RESIZING disk from 86G to 20G
      Exception raised while trying to scale up PVC infra.data-nfs-server-provisioner-1637948923-0 to 20000000000 ...
    (422)
    Reason: Unprocessable Entity
    HTTP response headers: HTTPHeaderDict({'Audit-Id': 'e69b53c3-d332-4925-b9ea-afa7570297a9', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'b64e47c9-2a4e-48ae-83bc-355685b6c007', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e5841496-62d0-426a-a987-4b26ec143a20', 'Date': 'Sat, 22 Oct 2022 16:58:07 GMT', 'Content-Length': '520'})
    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"PersistentVolumeClaim \"data-nfs-server-provisioner-1637948923-0\" is invalid: spec.resources.requests.storage: Forbidden: field can not be less than previous value","reason":"Invalid","details":{"name":"data-nfs-server-provisioner-1637948923-0","kind":"PersistentVolumeClaim","causes":[{"reason":"FieldValueForbidden","message":"Forbidden: field can not be less than previous value","field":"spec.resources.requests.storage"}]},"code":422}
    
    
    FAILED requesting to scale up `infra.data-nfs-server-provisioner-1637948923-0` by `10%` from `86G` to `20G`, it was using more than `80%` disk space over the last `78360 seconds`
    

    I'm using the helm chart version 1.0.3 (same image tag)

    Another issue, the autoscaler was able to resize another PVC from 13Gi to 14173392076, this is not human readable as before. It's not a serious issue but it's still disturbing. The autoscaler also sent the alert to slack twice for this PVC with several hours interval.

    opened by GuillaumeOuint 9
  • Customer-reported issue: Is not detecting updated/resized max size

    Customer-reported issue: Is not detecting updated/resized max size

    There appears to be a bug in Prometheus Server which causes the kubelet_volume_stats_capacity_bytes to not be updated properly in Prometheus after a resize. Note: May need to go file a bug against the metrics-server or Prometheus.

    After further investigation, it appears the prometheus metrics of kube_persistentvolume_capacity_bytes which is tied to the "PV" and not the "PVC" is fully updated, and we could (in theory) instead look there for the updated value but I believe this to be a bug which should be fixed in Prometheus.

    Screen Shot 2022-03-07 at 10 00 37 AM
    opened by AndrewFarley 3
  • Handling low max edge-case better, human-readable debug output

    Handling low max edge-case better, human-readable debug output

    Features

    • Updating various debug output to be human-readable, since bytes is just really, really long with a lot of zeroes, not ideal or reasonably human parsable
    • Catching an edge case where a user puts a max disk size of too small, and a disk can't scale up any more

    Closes #3 ( thanks for finding & reporting this @GuillaumeOuint )

    opened by AndrewFarley 0
Releases(1.0.5)
Owner
DevOps Nirvana
What happens when you set everything up perfectly? Nirvana happens
DevOps Nirvana
A Habitica Integration with Github Workflows.

Habitica-Workflow A Habitica Integration with Github Workflows. How To Use? Fork (and Star) this repository. Set environment variable in Settings - S

Priate 2 Dec 20, 2021
Quick & dirty controller to schedule Kubernetes Jobs later (once)

K8s Jobber Operator Quickly implemented Kubernetes controller to enable scheduling of Jobs at a later time. Usage: To schedule a Job later, Set .spec.

Jukka Väisänen 2 Feb 11, 2022
Cobbler is a versatile Linux deployment server

Cobbler Cobbler is a Linux installation server that allows for rapid setup of network installation environments. It glues together and automates many

Cobbler 2.4k Dec 24, 2022
Emissary - open source Kubernetes-native API gateway for microservices built on the Envoy Proxy

Emissary-ingress Emissary-Ingress is an open-source Kubernetes-native API Gateway + Layer 7 load balancer + Kubernetes Ingress built on Envoy Proxy. E

Emissary Ingress 4k Dec 31, 2022
Asynchronous parallel SSH client library.

parallel-ssh Asynchronous parallel SSH client library. Run SSH commands over many - hundreds/hundreds of thousands - number of servers asynchronously

1.1k Dec 31, 2022
Tencent Yun tools with python

Tencent_Yun_tools 使用 python3.9 + 腾讯云 AccessKey 利用工具 使用之前请先填写config.ini配置文件 Usage python3 Tencent_rce.py -h Scanner python3 Tencent_rce.py -s 生成CSV

<img src="> 13 Dec 20, 2022
This projects provides the documentation and the automation(code) for the Oracle EMEA WLA COA Demo UseCase.

COA DevOps Training UseCase This projects provides the documentation and the automation(code) for the Oracle EMEA WLA COA Demo UseCase. Demo environme

Cosmin Tudor 1 Jan 28, 2022
Let's learn how to build, release and operate your containerized applications to Amazon ECS and AWS Fargate using AWS Copilot.

🚀 Welcome to AWS Copilot Workshop In this workshop, you'll learn how to build, release and operate your containerised applications to Amazon ECS and

Donnie Prakoso 15 Jul 14, 2022
Pulumi - Developer-First Infrastructure as Code. Your Cloud, Your Language, Your Way 🚀

Pulumi's Infrastructure as Code SDK is the easiest way to create and deploy cloud software that use containers, serverless functions, hosted services,

Pulumi 14.7k Jan 08, 2023
Chartreuse: Automated Alembic migrations within kubernetes

Chartreuse: Automated Alembic SQL schema migrations within kubernetes "How to automate management of Alembic database schema migration at scale using

Wiremind 8 Oct 25, 2022
Copy a Kubernetes pod and run commands in its environment

copypod Utility for copying a running Kubernetes pod so you can run commands in a copy of its environment, without worrying about it the pod potential

Memrise 4 Apr 08, 2022
A charmed operator for running PGbouncer on kubernetes.

operator-template Description TODO: Describe your charm in a few paragraphs of Markdown Usage TODO: Provide high-level usage, such as required config

Canonical 1 Dec 01, 2022
Repository tracking all OpenStack repositories as submodules. Mirror of code maintained at opendev.org.

OpenStack OpenStack is a collection of interoperable components that can be deployed to provide computing, networking and storage resources. Those inf

Mirrors of opendev.org/openstack 4.6k Dec 28, 2022
Changelog CI is a GitHub Action that enables a project to automatically generate changelogs

What is Changelog CI? Changelog CI is a GitHub Action that enables a project to automatically generate changelogs. Changelog CI can be triggered on pu

Maksudul Haque 106 Dec 25, 2022
Push Container Image To Docker Registry In Python

push-container-image-to-docker-registry 概要 push-container-image-to-docker-registry は、エッジコンピューティング環境において、特定のエッジ端末上の Private Docker Registry に特定のコンテナイメー

Latona, Inc. 3 Nov 04, 2021
Deploying a production-ready Django project using Nginx and Gunicorn

django-nginx-gunicorn This project is for deploying a production-ready Django project using Nginx and Gunicorn. Running a local server of Django is no

Arash Sayareh 8 Jul 03, 2022
strava-offline is a tool to keep a local mirror of Strava activities for further analysis/processing:

strava-offline Overview strava-offline is a tool to keep a local mirror of Strava activities for further analysis/processing: synchronizes metadata ab

Tomáš Janoušek 29 Dec 14, 2022
MLops tools review for execution on multiple cluster types: slurm, kubernetes, dask...

MLops tools review focused on execution using multiple cluster types: slurm, kubernetes, dask...

4 Nov 30, 2022
framework providing automatic constructions of vulnerable infrastructures

中文 | English 1 Introduction Metarget = meta- + target, a framework providing automatic constructions of vulnerable infrastructures, used to deploy sim

rambolized 685 Dec 28, 2022
CDK Template of Table Definition AWS Lambda for RDB

CDK Template of Table Definition AWS Lambda for RDB Overview This sample deploys Amazon Aurora of PostgreSQL or MySQL with AWS Lambda that can define

AWS Samples 5 May 16, 2022