A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Last update: Sep 07, 2022

Overview

Realtime Financial Market Data Visualization and Analysis

Introduction

This repo contains my project on a real-time stock data pipeline. All the code is written in Python. In this project, I experiment with several data engineering frameworks to build a financial data processing and visualization platform using Apache Kafka, Apache Cassandra, and Bokeh. Kafka streams real-time stock prices and market news, Cassandra stores historical and real-time stock data, and Bokeh renders the visualizations in a web browser. I also wrote a web crawler that scrapes companies' financial statements and basic information from Yahoo Finance, and experimented with various economic data APIs.

Architecture

There are currently 3 tabs on the webpage:

Stock: Streaming & Fundamental
- A single stock's candlestick plot, with basic company and financial information
- Real-time S&P 500 price during trading hours (fake data during non-trading hours)
Stock: Comparison
- Prices of 2 user-selected stocks, with their statistical summary and correlation
- 5-, 10-, and 30-day moving averages of the adjusted close price
Economy
- Geomap of various economic data by state
- 4 nationwide economic indicators for comparison
- The most recent market news
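
The moving averages and correlation shown in the Stock: Comparison tab can be illustrated with a small sketch. This is not the repo's actual code: the function names `moving_average` and `pearson_correlation` are hypothetical, only the standard library is used, and it assumes the correlation is a plain Pearson coefficient over two equally long price series.

```python
from math import sqrt


def moving_average(values, window):
    """Trailing simple moving average.

    Returns a list the same length as `values`; entries are None until
    `window` data points are available.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        averages.append(running / window if i >= window - 1 else None)
    return averages


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up adjusted close prices, for illustration only.
adj_close_a = [10.0, 11.0, 12.0, 13.0, 14.0]
adj_close_b = [20.0, 22.0, 24.0, 26.0, 28.0]
ma_3 = moving_average(adj_close_a, 3)
corr = pearson_correlation(adj_close_a, adj_close_b)
```

Calling `moving_average` with windows of 5, 10, and 30 on a longer adjusted close series would give the three curves listed for the tab.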

Here is the architecture of the platform.

How Stock Data is Streamed via Kafka to Cassandra:
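
As a rough sketch of that flow (not the repo's actual code), a price tick can be serialized to JSON, sent to a Kafka topic, then read back by a consumer and written to Cassandra. The topic name, keyspace, table, columns, and all function names below are hypothetical. The `producer` argument is assumed to behave like kafka-python's `KafkaProducer`, `messages` like records from a `KafkaConsumer` (each with a `.value` attribute holding bytes), and `session` like a cassandra-driver `Session`.

```python
import json

TOPIC = "stock-prices"  # hypothetical topic name

# Hypothetical keyspace, table, and columns.
INSERT_CQL = (
    "INSERT INTO market.stock_prices (symbol, ts, price, volume) "
    "VALUES (%s, %s, %s, %s)"
)


def encode_tick(symbol, ts_ms, price, volume):
    """Serialize one price tick to JSON bytes for use as a Kafka message value."""
    payload = {"symbol": symbol, "ts": ts_ms, "price": price, "volume": volume}
    return json.dumps(payload).encode("utf-8")


def decode_tick(raw):
    """Turn a message value back into a row tuple matching INSERT_CQL."""
    data = json.loads(raw.decode("utf-8"))
    return (data["symbol"], int(data["ts"]), float(data["price"]), int(data["volume"]))


def publish_tick(producer, tick_bytes):
    """Send one encoded tick to the topic (producer side)."""
    producer.send(TOPIC, value=tick_bytes)


def store_messages(session, messages):
    """Write each consumed message to Cassandra and return the number of rows written."""
    count = 0
    for message in messages:
        session.execute(INSERT_CQL, decode_tick(message.value))
        count += 1
    return count
```

Passing the producer and session in as arguments keeps the sketch runnable without a broker or cluster. In a real deployment they would be created with the connection settings of the Kafka and Cassandra instances in use.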

Please check each tab's screenshot:

Tab 1:

Tab 2:

Tab 3:
