A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Last update: Sep 07, 2022

Overview

Realtime Financial Market Data Visualization and Analysis

Introduction

This repo contains my real-time stock data pipeline project. All the code is written in Python. In this project, I experiment with several data engineering frameworks to build a financial data processing and visualization platform using Apache Kafka, Apache Cassandra, and Bokeh. I use Kafka to stream real-time stock prices and market news, Cassandra to warehouse historical and real-time stock data, and Bokeh to render the visualizations in a web browser. I also wrote a web crawler that scrapes companies' financial statements and basic information from Yahoo Finance, and experimented with various economic data APIs.

Architecture

The webpage currently has 3 tabs:

Stock: Streaming & Fundamental
- Candlestick plot of a single stock, with basic company and financial information
- Real-time S&P 500 price during trading hours (fake data during non-trading hours)
Stock: Comparison
- Prices of 2 user-selected stocks, with their statistical summary and correlation
- 5-, 10-, and 30-day moving averages of the adjusted close price
Economy
- Geomap of various economic data by state
- 4 nationwide economic indicators for comparison
- The most recent market news
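
To make the comparison tab's statistics concrete, here is a minimal sketch of a simple moving average and a Pearson correlation using only the standard library. It is an illustration, not the project's actual code: the function names `moving_average` and `pearson_correlation` and the price lists are hypothetical, and the project may well compute these with another library.

```python
from math import sqrt


def moving_average(prices, window):
    """Simple moving average; positions without a full window are None."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            # Drop the price that just left the window.
            running -= prices[i - window]
        averages.append(running / window if i >= window - 1 else None)
    return averages


def pearson_correlation(xs, ys):
    """Pearson correlation of two equal-length series.

    Raises ZeroDivisionError if either series is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up adjusted close prices, for illustration only.
stock_a = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
stock_b = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]

ma5 = moving_average(stock_a, 5)
corr = pearson_correlation(stock_a, stock_b)
print(ma5)
print(round(corr, 6))
```

The 10- and 30-day averages follow from the same function with a different `window`; the leading `None` values mark days where the window is not yet full.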

Here is the architecture of the platform.

How Stock Data is Streamed via Kafka to Cassandra:
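
As a rough sketch of this flow (a price tick is serialized, published to a Kafka topic, consumed, and written to Cassandra), the example below runs against small in-memory stand-ins instead of real services. Everything named here is an assumption for illustration: the topic `stock_ticks`, the table `stocks.ticks` and its columns, the helper functions, and the sample tick are all hypothetical, and the project's real schema and message format may differ. The stand-ins mirror only the two calls the sketch relies on, `send(topic, value=...)` on a producer and `execute(query, parameters)` on a session.

```python
import json
from datetime import datetime, timezone

TOPIC = "stock_ticks"  # hypothetical topic name
# Hypothetical keyspace, table, and columns.
INSERT_CQL = "INSERT INTO stocks.ticks (symbol, ts, price) VALUES (%s, %s, %s)"


def encode_tick(symbol, price, ts):
    """Serialize one price tick to JSON bytes for the message value."""
    payload = {"symbol": symbol, "price": price, "ts": ts.isoformat()}
    return json.dumps(payload).encode("utf-8")


def decode_tick(raw):
    """Turn a message value back into (symbol, ts, price) for the insert."""
    data = json.loads(raw.decode("utf-8"))
    return data["symbol"], datetime.fromisoformat(data["ts"]), float(data["price"])


def publish_tick(producer, symbol, price, ts):
    # With kafka-python, `producer` would be a KafkaProducer.
    producer.send(TOPIC, value=encode_tick(symbol, price, ts))


def store_messages(messages, session):
    # With kafka-python, `messages` would be a KafkaConsumer; with
    # cassandra-driver, `session` would be a Session.
    count = 0
    for message in messages:
        session.execute(INSERT_CQL, decode_tick(message.value))
        count += 1
    return count


class FakeMessage:
    """Stand-in for a consumed record; only `.value` is used."""
    def __init__(self, value):
        self.value = value


class FakeProducer:
    """Stand-in producer that keeps sent messages in a list."""
    def __init__(self):
        self.sent = []

    def send(self, topic, value=None):
        self.sent.append((topic, FakeMessage(value)))


class FakeSession:
    """Stand-in session that records the rows it was asked to insert."""
    def __init__(self):
        self.rows = []

    def execute(self, query, parameters=None):
        self.rows.append(parameters)


producer = FakeProducer()
session = FakeSession()
ts = datetime(2022, 9, 7, 14, 30, tzinfo=timezone.utc)
publish_tick(producer, "AAPL", 155.5, ts)  # made-up tick
stored = store_messages([msg for _, msg in producer.sent], session)
print(stored, session.rows)
```

Passing the producer and session in as arguments keeps the serialization and storage logic testable without a running broker or cluster; swapping in real clients would not require changing `publish_tick` or `store_messages`, assuming the table above exists.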

Please check each tab's screenshot:

Tab 1:

Tab 2:

Tab 3:
