Fully reproducible, Dockerized, step-by-step tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.

Last update: Nov 15, 2022


time-series-kafka-demo

Mock stream producer for time series data using Kafka.

I walk through this tutorial and others here on GitHub and on my Medium blog. Here is a friend link for open access to the article on Towards Data Science: Make a mock “real-time” data stream with Python and Kafka. I'll always add friend links on my GitHub tutorials for free Medium access if you don't have a paid Medium membership (referral link).

If you find any of this useful, I always appreciate contributions to my Saturday morning fancy coffee fund!

This repo demos how to convert a csv file of timestamped data into a real-time stream useful for testing streaming analytics. An example input file with random time series data and a script for generating the file are included in the data directory.
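The core idea, replaying timestamped rows with the original gaps between them, can be sketched in a few lines of Python. This is a minimal illustration and not necessarily how the repo's script does it: the column name `timestamp`, the timestamp format, the helper name `replay_delays`, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime


def replay_delays(rows, speed=1.0, time_col="timestamp",
                  fmt="%Y-%m-%d %H:%M:%S"):
    """Yield (delay_seconds, row) pairs for replaying rows as a stream.

    The delay is the gap between a row's timestamp and the previous
    row's timestamp, divided by `speed`. The first row has no delay.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    previous = None
    for row in rows:
        current = datetime.strptime(row[time_col], fmt)
        gap = 0.0 if previous is None else (current - previous).total_seconds()
        previous = current
        # Clamp at zero so out-of-order rows are sent immediately.
        yield max(gap, 0.0) / speed, row


# Hypothetical sample data standing in for a timestamped csv file.
sample_csv = io.StringIO(
    "timestamp,value\n"
    "2021-01-01 00:00:00,1.5\n"
    "2021-01-01 00:00:10,2.5\n"
    "2021-01-01 00:00:40,3.5\n"
)

schedule = list(replay_delays(csv.DictReader(sample_csv), speed=10))
for delay, row in schedule:
    print(f"wait {delay:.1f}s, then send {row}")
```

A real producer would sleep for each delay before sending the row, so a speed of 10 turns a 10-second gap in the file into a 1-second gap in the stream.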

The producer and consumer Python scripts use Confluent's Kafka client for Python, which is installed in the Docker image built with the accompanying Dockerfile, if you choose to use it.

Requires Docker and Docker Compose.

Usage

Clone repo and cd into directory.

git clone https://github.com/mtpatter/time-series-kafka-demo.git
cd time-series-kafka-demo

Start the Kafka broker

docker compose up --build

Build a Docker image (optional, for the producer and consumer)

From the repository root directory:

docker build -t "kafkacsv" .

If you want to run the Python scripts in Docker, check that the image works by printing the producer's help text:

docker run -it --rm kafkacsv python bin/sendStream.py -h

Start a consumer

To start a consumer that prints all messages from the stream "my-stream" in real time:

python bin/processStream.py my-stream

or with Docker:

docker run -it --rm \
      -v $PWD:/home \
      --network=host \
      kafkacsv python bin/processStream.py my-stream
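For readers curious what such a consumer involves, here is a minimal sketch of a poll-and-print loop using Confluent's Kafka client for Python. It is not the repo's `bin/processStream.py`; the broker address `localhost:9092`, the group id `demo-printer`, and the function names are assumptions chosen for illustration.

```python
def print_messages(consumer, max_messages=None, timeout=1.0):
    """Poll `consumer` and print each message value as it arrives.

    `consumer` is expected to behave like confluent_kafka.Consumer:
    poll(timeout) returns None or a message with error() and value().
    Returns the list of decoded values that were printed.
    """
    seen = []
    while max_messages is None or len(seen) < max_messages:
        msg = consumer.poll(timeout)
        if msg is None:
            continue  # nothing arrived within the timeout
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        text = msg.value().decode("utf-8")
        print(text)
        seen.append(text)
    return seen


def run_consumer(topic, bootstrap_servers="localhost:9092"):
    """Subscribe to `topic` on a live broker and print until Ctrl+C."""
    # Imported here so the sketch can be loaded without Kafka installed.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,  # assumed broker address
        "group.id": "demo-printer",              # hypothetical group name
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([topic])
    try:
        print_messages(consumer)
    except KeyboardInterrupt:
        pass
    finally:
        consumer.close()
```

With the broker running, `run_consumer("my-stream")` would print messages as they arrive. Keeping the loop separate from the client setup makes it easy to exercise with a stand-in consumer object.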

Produce a time series stream

Send the time series from data/data.csv to the topic "my-stream", sped up by a factor of 10.

python bin/sendStream.py data/data.csv my-stream --speed 10

or with Docker:

docker run -it --rm \
      -v $PWD:/home \
      --network=host \
      kafkacsv python bin/sendStream.py data/data.csv my-stream --speed 10
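On the Kafka side, sending a row comes down to waiting out the scaled gap and then calling `produce`. The sketch below assumes the gaps between rows (in seconds) have already been derived from the timestamps, and that rows are sent as JSON; both are illustrative choices, as are the broker address `localhost:9092` and the function names. It is not the repo's `bin/sendStream.py`.

```python
import json
import time


def send_stream(producer, topic, timed_rows, speed=1.0, sleep=time.sleep):
    """Send rows to `topic`, pausing between them.

    `timed_rows` is an iterable of (gap_seconds, row_dict) pairs, where
    gap_seconds is the original time since the previous row. Each gap
    is divided by `speed` before sleeping. `producer` is expected to
    behave like confluent_kafka.Producer. Returns the number sent.
    """
    sent = 0
    for gap, row in timed_rows:
        sleep(gap / speed)
        producer.produce(topic, value=json.dumps(row).encode("utf-8"))
        producer.poll(0)  # serve delivery callbacks without blocking
        sent += 1
    producer.flush()  # wait for outstanding messages before returning
    return sent


def run_producer(topic, timed_rows, speed, bootstrap_servers="localhost:9092"):
    """Send `timed_rows` to a live broker (assumed address)."""
    # Imported here so the sketch can be loaded without Kafka installed.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": bootstrap_servers})
    return send_stream(producer, topic, timed_rows, speed=speed)
```

Passing `sleep` as a parameter is a small design choice that lets the pacing be checked without actually waiting.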

Shut down and clean up

Stop the consumer with Return and Ctrl+C.

Shut down the Kafka broker:

docker compose down

Owner

Maria Patterson
