Integrate bus data from a variety of sources (batch and real-time processing).

Last update: Nov 25, 2021

bus_data_ingestion_pipeline

Overview

Purpose: Integrate bus data from a variety of sources (CSV files, JSON APIs, sensor data, ...) into a relational database, using both batch and real-time processing.

Technique:

Python
Applications: Kafka, MQTT Explorer, Grafana, InfluxDB, Microsoft Visual Studio 2019, MS SQL Server, Power BI Desktop
Frameworks/libraries: kafka-python, numpy, paho-mqtt, pandas, pyodbc, pyspark
Database: SQL (requires an MS SQL Server installation)
Environment: Windows 10 64-bit
Command line: cmd

Workflow:

Import raw offline data from CSV and TXT files into a data lake (stored in MS SQL Server) with Python, then ETL (Extract, Transform, Load) the data from the data lake into the data warehouse with SSIS (SQL Server Integration Services).
Set up a schedule for the ETL pipeline.
Model and visualize the data from the data warehouse (DWH).
Crawl the online General Transit Feed Specification (GTFS) feed into a JSON file: convert the feed from Protobuf to JSON (or CSV), then save it to the database with Python and Kafka streaming. Source: https://developer.nationaltransport.ie/
Stream sensor data with paho-mqtt (or kafka-python) and draw it on a dashboard in the BI tool Grafana to show performance.
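A minimal sketch of the first workflow step (loading a raw CSV or TXT file into the data lake in MS SQL Server with Python), assuming the target table already exists and its column names match the file's header row. The helper names `load_csv_to_table` and `quote_ident` are hypothetical, not taken from the repository. The function accepts any DB-API 2.0 connection that uses `?` placeholders, which includes pyodbc; values are inserted as text, which suits a raw staging layer with NVARCHAR columns.

```python
import csv
from typing import Any


def quote_ident(name: str) -> str:
    """Quote a (possibly schema-qualified) SQL identifier with double quotes.

    Double-quoted identifiers are accepted by SQL Server (with the default
    QUOTED_IDENTIFIER ON setting) and by SQLite.
    """
    return ".".join('"' + part.replace('"', '""') + '"' for part in name.split("."))


def load_csv_to_table(conn: Any, csv_path: str, table: str,
                      delimiter: str = ",", batch_size: int = 1000) -> int:
    """Insert every data row of a delimited text file into an existing table.

    `conn` is any DB-API 2.0 connection using `?` placeholders (e.g. pyodbc).
    The header row supplies the column names. Returns the number of rows loaded.
    """
    # "utf-8-sig" also strips a byte-order mark, which Excel-exported CSVs often have.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return 0  # empty file: nothing to load
        columns = ", ".join(quote_ident(col) for col in header)
        placeholders = ", ".join("?" for _ in header)
        sql = f"INSERT INTO {quote_ident(table)} ({columns}) VALUES ({placeholders})"

        cursor = conn.cursor()
        total = 0
        batch = []
        for row in reader:
            if not row:
                continue  # skip blank lines
            batch.append(row)
            if len(batch) >= batch_size:
                cursor.executemany(sql, batch)  # send one full batch
                total += len(batch)
                batch = []
        if batch:
            cursor.executemany(sql, batch)  # send the remaining rows
            total += len(batch)
    conn.commit()  # make the load visible to other sessions
    return total
```

With pyodbc, the connection might be opened with `pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=DataLake;Trusted_Connection=yes;")`, where the driver version and the `DataLake` database name are assumptions. Enabling `cursor.fast_executemany = True` on the pyodbc cursor inside the function usually speeds up large batches. For tab-separated TXT files, pass `delimiter="\t"`.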
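The GTFS step can be sketched as follows, assuming the Protobuf payload has already been parsed (for example with the `gtfs-realtime-bindings` package, via `gtfs_realtime_pb2.FeedMessage` and its `ParseFromString` method) and converted to a dictionary with `google.protobuf.json_format.MessageToDict`, which uses lowerCamelCase keys by default. The column layout and the names `flatten_trip_updates`, `rows_to_csv`, and `FIELDS` are hypothetical; only trip updates are flattened, one row per stop-time update.

```python
import csv
import io
from typing import Any, Dict, Iterable, List

# Hypothetical output columns: one row per stop-time update.
FIELDS = ["entity_id", "trip_id", "route_id", "start_date",
          "stop_sequence", "stop_id", "arrival_delay", "departure_delay"]


def flatten_trip_updates(feed: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a GTFS-Realtime FeedMessage dict into one row per stop-time update."""
    rows = []
    for entity in feed.get("entity", []):
        update = entity.get("tripUpdate")
        if update is None:
            continue  # vehicle positions and alerts are skipped in this sketch
        trip = update.get("trip", {})
        for stu in update.get("stopTimeUpdate", []):
            rows.append({
                "entity_id": entity.get("id"),
                "trip_id": trip.get("tripId"),
                "route_id": trip.get("routeId"),
                "start_date": trip.get("startDate"),
                "stop_sequence": stu.get("stopSequence"),
                "stop_id": stu.get("stopId"),
                # Delays are in seconds; a missing arrival/departure gives None.
                "arrival_delay": stu.get("arrival", {}).get("delay"),
                "departure_delay": stu.get("departure", {}).get("delay"),
            })
    return rows


def rows_to_csv(rows: Iterable[Dict[str, Any]]) -> str:
    """Serialize flattened rows as CSV text (None values become empty cells)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Trip updates without stop-time updates (for example, cancelled trips) produce no rows here; vehicle positions and alerts would need their own flattening. The same row dictionaries can be written out with `json.dumps` if JSON is preferred over CSV.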
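To stream the crawled GTFS records through Kafka before they are saved to the database, a sketch using kafka-python could look like this. The topic name, the JSON message encoding, and the helper names `encode_record`, `publish_records`, and `make_producer` are assumptions; `publish_records` relies only on the `send(topic, value=..., key=...)` and `flush()` methods of kafka-python's `KafkaProducer`, so it can be tried without a running broker.

```python
import json
from typing import Any, Dict, Iterable, Optional


def encode_record(record: Dict[str, Any]) -> bytes:
    """Serialize one record as compact, key-sorted UTF-8 JSON for a message value."""
    return json.dumps(record, separators=(",", ":"), sort_keys=True).encode("utf-8")


def publish_records(producer: Any, topic: str, records: Iterable[Dict[str, Any]],
                    key_field: Optional[str] = None) -> int:
    """Send each record to `topic`, optionally keyed by one field, then flush.

    Returns the number of records handed to the producer.
    """
    count = 0
    for record in records:
        key = None
        if key_field is not None and record.get(key_field) is not None:
            key = str(record[key_field]).encode("utf-8")
        producer.send(topic, value=encode_record(record), key=key)
        count += 1
    producer.flush()  # block until buffered messages have been sent
    return count


def make_producer(bootstrap_servers: str = "localhost:9092") -> Any:
    """Create a kafka-python producer.

    The import is local so the helpers above work without kafka-python installed.
    """
    from kafka import KafkaProducer
    return KafkaProducer(bootstrap_servers=bootstrap_servers)
```

Keying messages by a field such as `trip_id` sends all updates for one trip to the same partition, so a consumer reads them in order. On the receiving side, a `kafka.KafkaConsumer` subscribed to the same topic could decode each message value with `json.loads` and insert the rows into SQL Server.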

Output:

A data pipeline from the data sources into the target data store.
Data stored in the data warehouse for analysis.
Raw data crawled from the online GTFS feed.
A real-time dashboard built on stream processing.

Next steps:

Analyze the data in the DWH.
Build a real-time dashboard for the raw data crawled from the online GTFS feed.
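Grafana and InfluxDB are both listed among the applications, so one plausible route for a real-time dashboard (an assumption; the pipeline's actual wiring is not described) is to write each sensor reading or GTFS delay as a point in InfluxDB line protocol and let Grafana query InfluxDB. The sketch below formats one point; `to_line_protocol` and the `bus_delay` measurement are hypothetical names.

```python
import math
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape_measurement(name: str) -> str:
    # Measurement names: commas and spaces must be escaped.
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(text: str) -> str:
    # Tag keys, tag values and field keys: commas, equals signs and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # "i" suffix stores the value as an integer
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("line protocol cannot represent NaN or infinity")
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, fields: Dict[str, FieldValue],
                     tags: Optional[Dict[str, str]] = None,
                     timestamp_ns: Optional[int] = None) -> str:
    """Format one point as: measurement[,tag=value...] field=value[,...] [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tags = tags or {}
    line = _escape_measurement(measurement)
    for key in sorted(tags):  # tags sorted by key
        value = str(tags[key])
        if value:  # empty tag values are not allowed, so they are dropped
            line += f",{_escape_key(key)}={_escape_key(value)}"
    line += " " + ",".join(
        f"{_escape_key(key)}={_format_field(fields[key])}" for key in sorted(fields))
    if timestamp_ns is not None:
        line += f" {int(timestamp_ns)}"  # nanosecond precision by default
    return line
```

The resulting lines can be POSTed to InfluxDB's HTTP write endpoint (`/write` in InfluxDB 1.x, `/api/v2/write` in 2.x). Tags are sorted by key because InfluxDB's documentation recommends it for write performance, and integers carry the `i` suffix so they are not stored as floats.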
