Visual disk-usage analyser for docker images

Overview

whaler

PyPI versions PyPI versions

What? A command-line tool for visually investigating the disk usage of docker images

Why? Large images are slow to move and expensive to store. They cost developer productivity by lengthening devops tasks and often contain unnecessary data

Who is this for? Primarily for engineers working with images containing Python packages.

User Stories

This tool should allow you to answer questions such as:

  1. Which file types are occupying the most disk space?
  2. Which are my largest Python packages?
  3. What are my unknown causes of high disk usage?

Quick start

pip install whaler

Run against a local directory

➜ whaler .venv
Running bash -c cd .venv && du -a -k
Done. Serving output at http://localhost:8000 (ctrl+c to exit)
Running python3 -m http.server 8000 --directory=_whaler/html

Run against a docker image

The tool will pull the image first if it is not present.

whaler --image='hl:latest' /
Running docker run --rm --entrypoint=du --workdir=/ hl:latest -a -k
Ignoring what seems to be non-fatal error(s):
du: cannot access './proc/1/task/1/fd/4': No such file or directory
du: cannot access './proc/1/task/1/fdinfo/4': No such file or directory
du: cannot access './proc/1/fd/3': No such file or directory
du: cannot access './proc/1/fdinfo/3': No such file or directory


Done. Serving output at http://localhost:8000 (ctrl+c to exit)
Running python3 -m http.server 8000 --directory=_whaler/html

HTML Report

Play with a hosted demo

Limitations

  1. Platform: whaler uses du to gather disk usage data. It must be present in your docker image
  2. Scale: I have tested the web UI with up to 500,000 file system nodes with du output of up to ~100MB.

Alternatives/Complements to this tool:

  1. Whaler can tell you what is taking up space in the final layer of your Docker image, but you may have intermediate layers which are contributing to the image size. For diving through the layers, use dive
    • Related: read up on multi-stage builds to understand how to mitigate the problem of intermediate layers bloating your image.
  2. For investigating disk usage in non-docker directories, Disk Inventory X is a great tool on OS X which I have based whaler on.
  3. GoogleContainerTools/distroless for base images not containing standard linux distro contents

See also

  1. a good blog post on making small python images

Developing

See .github/workflows/test.yml for the development platform and setup.

For UI, see whaler-ui

You might also like...
Lima is an alternative to using Docker Desktop on your Mac.
Lima is an alternative to using Docker Desktop on your Mac.

lima-xbar-plugin Table of Contents Description Installation Dependencies Lima is an alternative to using Docker Desktop on your Mac. Description This

Build Netbox as a Docker container

netbox-docker The Github repository houses the components needed to build Netbox as a Docker container. Images are built using this code and are relea

Some automation scripts to setup a deployable development database server (with docker).

Postgres-Docker Database Initializer This is a simple automation script that will create a Docker Postgres database with a custom username, password,

A job launching library for docker, EC2, GCP, etc.

doodad A library for packaging dependencies and launching scripts (with a focus on python) on different platforms using Docker. Currently supported pl

A repository containing a short tutorial for Docker (with Python).
A repository containing a short tutorial for Docker (with Python).

Docker Tutorial for IFT 6758 Lab In this repository, we examine the advtanges of virtualization, what Docker is and how we can deploy simple programs

Bitnami Docker Image for Python using snapshots for the system packages repositories

Python Snapshot packaged by Bitnami What is Python Snapshot? Python is a programming language that lets you work quickly and integrate systems more ef

Hatch plugin for Docker containers

hatch-containers CI/CD Package Meta This provides a plugin for Hatch that allows

Create pinned requirements.txt inside a Docker image using pip-tools
Create pinned requirements.txt inside a Docker image using pip-tools

Pin your Python dependencies! pin-requirements.py is a script that lets you pin your Python dependencies inside a Docker container. Pinning your depen

This repository contains useful docker-swarm-tools.

docker-swarm-tools This repository contains useful docker-swarm-tools. swarm-guardian This Docker image is intended to be used in a multihost docker e

Comments
  • Bounding box on selected node barely visible

    Bounding box on selected node barely visible

    This is due to the canvas not being fully re-drawn upon selection.

    The solution is probably to ensure the canvas can be re-drawn quickly and creating a fresh render every time

    bug 
    opened by alex-treebeard 0
Releases(0.1.2)
Owner
Treebeard Technologies
Helping with Data Infra
Treebeard Technologies
Let's Git - Version Control & Open Source Homework

Let's Git - Version Control & Open Source Homework Welcome to this homework for our MOOC: Let's Git! We hope you will learn a lot and have fun working

1 Dec 05, 2021
MLops tools review for execution on multiple cluster types: slurm, kubernetes, dask...

MLops tools review focused on execution using multiple cluster types: slurm, kubernetes, dask...

4 Nov 30, 2022
DataOps framework for Machine Learning projects.

Noronha DataOps Noronha is a Python framework designed to help you orchestrate and manage ML projects life-cycle. It hosts Machine Learning models ins

52 Oct 30, 2022
Some automation scripts to setup a deployable development database server (with docker).

Postgres-Docker Database Initializer This is a simple automation script that will create a Docker Postgres database with a custom username, password,

Pysogge 1 Nov 11, 2021
Define and run multi-container applications with Docker

Docker Compose Docker Compose is a tool for running multi-container applications on Docker defined using the Compose file format. A Compose file is us

Docker 28.2k Jan 08, 2023
Cobbler is a versatile Linux deployment server

Cobbler Cobbler is a Linux installation server that allows for rapid setup of network installation environments. It glues together and automates many

Cobbler 2.4k Dec 24, 2022
Automate SSH in python easily!

RedExpect RedExpect makes automating remote machines over SSH very easy to do and is very fast in doing exactly what you ask of it. Based on ssh2-pyth

Red_M 19 Dec 17, 2022
CDK Template of Table Definition AWS Lambda for RDB

CDK Template of Table Definition AWS Lambda for RDB Overview This sample deploys Amazon Aurora of PostgreSQL or MySQL with AWS Lambda that can define

AWS Samples 5 May 16, 2022
Run Oracle on Kubernetes with El Carro

El Carro is a new project that offers a way to run Oracle databases in Kubernetes as a portable, open source, community driven, no vendor lock-in container orchestration system. El Carro provides a p

Google Cloud Platform 205 Dec 30, 2022
The leading native Python SSHv2 protocol library.

Paramiko Paramiko: Python SSH module Copyright: Copyright (c) 2009 Robey Pointer 8.1k Jan 04, 2023

A Blazing fast Security Auditing tool for Kubernetes

A Blazing fast Security Auditing tool for kubernetes!! Basic Overview Kubestriker performs numerous in depth checks on kubernetes infra to identify th

Vasant Chinnipilli 934 Jan 04, 2023
Kubediff: a tool for Kubernetes to show differences between running state and version controlled configuration.

Kubediff: a tool for Kubernetes to show differences between running state and version controlled configuration.

Weaveworks 1.1k Dec 30, 2022
Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

Arie Bregman 35.1k Jan 02, 2023
A job launching library for docker, EC2, GCP, etc.

doodad A library for packaging dependencies and launching scripts (with a focus on python) on different platforms using Docker. Currently supported pl

Justin Fu 55 Aug 27, 2022
Deploying a production-ready Django project using Nginx and Gunicorn

django-nginx-gunicorn This project is for deploying a production-ready Django project using Nginx and Gunicorn. Running a local server of Django is no

Arash Sayareh 8 Jul 03, 2022
Micro Data Lake based on Docker Compose

Micro Data Lake based on Docker Compose This is the implementation of a Minimum Data Lake

Abel Coronado 15 Jan 07, 2023
Supervisor process control system for UNIX

Supervisor Supervisor is a client/server system that allows its users to control a number of processes on UNIX-like operating systems. Supported Platf

Supervisor 7.6k Dec 31, 2022
A honey token manager and alert system for AWS.

SpaceSiren SpaceSiren is a honey token manager and alert system for AWS. With this fully serverless application, you can create and manage honey token

287 Nov 09, 2022
Self-hosted, easily-deployable monitoring and alerts service - like a lightweight PagerDuty

Cabot Maintainers wanted Cabot is stable and used by hundreds of companies and individuals in production, but it is not actively maintained. We would

Arachnys 5.4k Dec 23, 2022
RMRK spy bot for RMRK hackathon

rmrk_spy_bot RMRK spy bot https://t.me/RMRKspyBot for rmrk hacktoberfest https://rmrk.devpost.com/ Birds and items price and rarity estimation Reports

Victor Ryabinin 2 Sep 06, 2022