Deploy a simple Multi-Node Clickhouse Cluster with docker-compose in minutes.

Overview

Simple Multi Node Clickhouse Cluster

I hate those single-node clickhouse clusters and manually installation, I mean, why should we:

this is just weird!

So this repo tries to solve these problem.

Note

  • This is a simplified model of Multi Node Clickhouse Cluster, which lacks: LoadBalancer config/Automated Failover/MultiShard Config generation.
  • All clickhouse data is persisted under event-data, if you need to move clickhouse to some other directory, you'll just need to move the directory(that contains docker-compose.yml) and docker-compose up -d to fire it up again.
  • Host network mode is used to simplify the whole deploy procedure, so you might need to create addition firewall rules if you are running this on a public accessible machine.

Prerequisites

To use this, we need docker and docker-compose installed, recommended OS is ubuntu, and it's recommended to install clickhouse-client on machine, so on a typical ubuntu server, doing the following should be sufficient.

apt update
curl -fsSL https://get.docker.com -o get-docker.sh && sh get-docker.sh && rm -f get-docker.sh
apt install docker-compose clickhouse-client -y

Usage

  1. Clone this repo
  2. Edit the necessary server info in topo.yml
  3. Run python3 generate.py
  4. Your cluster info should be in the cluster directory now
  5. Sync those files to related nodes and run docker-compose up -d on them
  6. Your cluster is ready to go

If you still cannot understand what I'm saying above, see the example below.

Example Usage

Edit information

I've Clone the repo and would like to set a 3-master clickhouse cluster and has the following specs

  • 3 replica(one replica on each node)
  • 1 Shard only

So I need to edit the topo.yml as follows:

global:
  clickhouse_image: "yandex/clickhouse-server:21.3.2.5"
  zookeeper_image: "bitnami/zookeeper:3.6.1"

zookeeper_servers:
  - host: 192.168.33.101
  - host: 192.168.33.102
  - host: 192.168.33.103

clickhouse_servers:
  - host: 192.168.33.101
  - host: 192.168.33.102
  - host: 192.168.33.103

clickhouse_topology:
  - clusters:
      - name: "novakwok_cluster"
        shards:
          - name: "novakwok_shard"
            servers:
              - host: 192.168.33.101
              - host: 192.168.33.102
              - host: 192.168.33.103

Generate Config

After python3 generate.py, a structure has been generated under cluster directory, looks like this:

➜  simple-multinode-clickhouse-cluster git:(master) ✗ python3 generate.py 
Write clickhouse-config.xml to cluster/192.168.33.101/clickhouse-config.xml
Write clickhouse-config.xml to cluster/192.168.33.102/clickhouse-config.xml
Write clickhouse-config.xml to cluster/192.168.33.103/clickhouse-config.xml

➜  simple-multinode-clickhouse-cluster git:(master) ✗ tree cluster 
cluster
├── 192.168.33.101
│   ├── clickhouse-config.xml
│   ├── clickhouse-user-config.xml
│   └── docker-compose.yml
├── 192.168.33.102
│   ├── clickhouse-config.xml
│   ├── clickhouse-user-config.xml
│   └── docker-compose.yml
└── 192.168.33.103
    ├── clickhouse-config.xml
    ├── clickhouse-user-config.xml
    └── docker-compose.yml

3 directories, 9 files

Sync Config

Now we need to sync those files to related hosts(of course you can use ansible here):

rsync -aP ./cluster/192.168.33.101/ [email protected]:/root/ch/
rsync -aP ./cluster/192.168.33.102/ [email protected]:/root/ch/
rsync -aP ./cluster/192.168.33.103/ [email protected]:/root/ch/

Start Cluster

Now run docker-compose up -d on every hosts' /root/ch/ directory.

Validation

On 192.168.33.101, use clickhouse-client to connect to local instance and check if cluster is there.

[email protected]:~/ch# clickhouse-client 
ClickHouse client version 18.16.1.
Connecting to localhost:9000.
Connected to ClickHouse server version 21.3.2 revision 54447.

192-168-33-101 :) SELECT * FROM system.clusters;

SELECT *
FROM system.clusters 

┌─cluster──────────────────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ novakwok_cluster                             │         1 │            1 │           1 │ 192.168.33.101 │ 192.168.33.101 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ novakwok_cluster                             │         1 │            1 │           2 │ 192.168.33.102 │ 192.168.33.102 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ novakwok_cluster                             │         1 │            1 │           3 │ 192.168.33.103 │ 192.168.33.103 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards                      │         1 │            1 │           1 │ 127.0.0.1      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards                      │         2 │            1 │           1 │ 127.0.0.2      │ 127.0.0.2      │ 9000 │        0 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_internal_replication │         1 │            1 │           1 │ 127.0.0.1      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_internal_replication │         2 │            1 │           1 │ 127.0.0.2      │ 127.0.0.2      │ 9000 │        0 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_localhost            │         1 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_localhost            │         2 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_shard_localhost                         │         1 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_shard_localhost_secure                  │         1 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9440 │        0 │ default │                  │            0 │                       0 │
│ test_unavailable_shard                       │         1 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_unavailable_shard                       │         2 │            1 │           1 │ localhost      │ 127.0.0.1      │    1 │        0 │ default │                  │            0 │                       0 │
└──────────────────────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘
↘ Progress: 13.00 rows, 1.58 KB (4.39 thousand rows/s., 532.47 KB/s.) 
13 rows in set. Elapsed: 0.003 sec. 

Let's create a DB with replica:

192-168-33-101 :) create database novakwok_test on cluster novakwok_cluster;

CREATE DATABASE novakwok_test ON CLUSTER novakwok_cluster

┌─host───────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ 192.168.33.103 │ 9000 │      0 │       │                   2 │                0 │
│ 192.168.33.101 │ 9000 │      0 │       │                   1 │                0 │
│ 192.168.33.102 │ 9000 │      0 │       │                   0 │                0 │
└────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
← Progress: 3.00 rows, 174.00 B (16.07 rows/s., 931.99 B/s.)  99%
3 rows in set. Elapsed: 0.187 sec. 

192-168-33-101 :) show databases;

SHOW DATABASES

┌─name──────────┐
│ default       │
│ novakwok_test │
│ system        │
└───────────────┘
↑ Progress: 3.00 rows, 479.00 B (855.61 rows/s., 136.61 KB/s.) 
3 rows in set. Elapsed: 0.004 sec. 

Connect to another host to see if it's really working.

[email protected]:~/ch# clickhouse-client -h 192.168.33.102
ClickHouse client version 18.16.1.
Connecting to 192.168.33.102:9000.
Connected to ClickHouse server version 21.3.2 revision 54447.

192-168-33-102 :) show databases;

SHOW DATABASES

┌─name──────────┐
│ default       │
│ novakwok_test │
│ system        │
└───────────────┘
↘ Progress: 3.00 rows, 479.00 B (623.17 rows/s., 99.50 KB/s.) 
3 rows in set. Elapsed: 0.005 sec. 

License

GPL

Owner
Nova Kwok
43EC 6073 0BFF A16C 34BB 9EF2 8D42 A0E6 99E5 0639
Nova Kwok
IP address management (IPAM) and data center infrastructure management (DCIM) tool.

NetBox is an IP address management (IPAM) and data center infrastructure management (DCIM) tool. Initially conceived by the network engineering team a

NetBox Community 11.8k Jan 07, 2023
Bitnami Docker Image for Python using snapshots for the system packages repositories

Python Snapshot packaged by Bitnami What is Python Snapshot? Python is a programming language that lets you work quickly and integrate systems more ef

Bitnami 1 Jan 13, 2022
A job launching library for docker, EC2, GCP, etc.

doodad A library for packaging dependencies and launching scripts (with a focus on python) on different platforms using Docker. Currently supported pl

Justin Fu 55 Aug 27, 2022
SSH to WebSockets Bridge

wssh wssh is a SSH to WebSockets Bridge that lets you invoke a remote shell using nothing but HTTP. The client connecting to wssh doesn't need to spea

Andrea Luzzardi 1.3k Dec 25, 2022
Knock your images before these make you painful.

image-knocker Knock your images before these make you painful. Background One day, I had run my deep learning model training program and got off work

Yonghye Kwon 9 Jul 25, 2022
a CLI that provides a generic automation layer for assessing the security of ML models

Counterfit About | Getting Started | Learn More | Acknowledgments | Contributing | Trademarks | Contact Us -------------------------------------------

Microsoft Azure 575 Jan 02, 2023
CDK Template of Table Definition AWS Lambda for RDB

CDK Template of Table Definition AWS Lambda for RDB Overview This sample deploys Amazon Aurora of PostgreSQL or MySQL with AWS Lambda that can define

AWS Samples 5 May 16, 2022
Automate SSH in python easily!

RedExpect RedExpect makes automating remote machines over SSH very easy to do and is very fast in doing exactly what you ask of it. Based on ssh2-pyth

Red_M 19 Dec 17, 2022
Wiremind Kubernetes helper

Wiremind Kubernetes helper This Python library is a high-level set of Kubernetes Helpers allowing either to manage individual standard Kubernetes cont

Wiremind 3 Oct 09, 2021
Wubes is like Qubes but for Windows.

Qubes containerization on Windows. The idea is to leverage the Windows Sandbox technology to spawn applications in isolation.

NCC Group Plc 124 Dec 16, 2022
Play Wordle from any Kubernetes cluster.

wordle-operator 🟩 ⬛ 🟩 🟨 ⬛ Play Wordle from any Kubernetes cluster. Using the power of CustomResourceDefinitions and Kubernetes Operators, now you c

Lucas Melin 1 Jan 15, 2022
A basic instruction for Kubernetes setup and understanding.

A basic instruction for Kubernetes setup and understanding Module ID Module Guide - Install Kubernetes Cluster k8s-install 3 Docker Core Technology mo

648 Jan 02, 2023
A lobby boy will create a VPS server when you need one, and destroy it after using it.

Lobbyboy What is a lobby boy? A lobby boy is completely invisible, yet always in sight. A lobby boy remembers what people hate. A lobby boy anticipate

226 Dec 29, 2022
A repository containing a short tutorial for Docker (with Python).

Docker Tutorial for IFT 6758 Lab In this repository, we examine the advtanges of virtualization, what Docker is and how we can deploy simple programs

Arka Mukherjee 0 Dec 14, 2021
Remote Desktop Protocol in Twisted Python

RDPY Remote Desktop Protocol in twisted python. RDPY is a pure Python implementation of the Microsoft RDP (Remote Desktop Protocol) protocol (client a

Sylvain Peyrefitte 1.6k Dec 30, 2022
Google Kubernetes Engine (GKE) with a Snyk Kubernetes controller installed/configured for Snyk App

Google Kubernetes Engine (GKE) with a Snyk Kubernetes controller installed/configured for Snyk App This example provisions a Google Kubernetes Engine

Pas Apicella 2 Feb 09, 2022
Hubble - Network, Service & Security Observability for Kubernetes using eBPF

Network, Service & Security Observability for Kubernetes What is Hubble? Getting Started Features Service Dependency Graph Metrics & Monitoring Flow V

Cilium 2.4k Jan 04, 2023
DAMPP (gui) is a Python based program to run simple webservers using MySQL, Php, Apache and PhpMyAdmin inside of Docker containers.

DAMPP (gui) is a Python based program to run simple webservers using MySQL, Php, Apache and PhpMyAdmin inside of Docker containers.

Sehan Weerasekara 1 Feb 19, 2022
Project 4 Cloud DevOps Nanodegree

Project Overview In this project, you will apply the skills you have acquired in this course to operationalize a Machine Learning Microservice API. Yo

1 Nov 21, 2021
Organizing ssh servers in one shell.

NeZha (哪吒) NeZha is a famous chinese deity who can have three heads and six arms if he wants. And my NeZha tool is hoping to bring developer such mult

Zilin Zhu 8 Dec 20, 2021