Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
BlogBot - a Python script that create blogs from YouTube videos.

BlogBot - Convert Youtube Videos To Blogs BlogBot is a Python script that create blogs from YouTube videos.

Nikhil Bhamere 4 Apr 22, 2022
A GUI based datamoshing apllication for everyone! Apply this glitch to your videos and gifs. Supports all video formats!

A GUI based datamoshing apllication for everyone! Apply this glitch to your videos and gifs. Supports all video formats!

Akascape 131 Dec 31, 2022
In this project, we will be blurring the background in a live video feed

In this project, we will be blurring the background in a live video feed. This can be further integrated into online meetings, streamings etc.

Hassan Shahzad 2 Jan 06, 2022
A simple Telegram bot to extract hard-coded subtitle from videos using FFmpeg & Tesseract.

Video Subtitle Extractor Bot A simple Telegram bot to extract hard-coded subtitle from videos using FFmpeg & Tesseract. Note that the accuracy of reco

14 Oct 28, 2022
Home Assistant custom component for viewing IP cameras RTSP stream in real time using WebRTC technology

WebRTC Camera Home Assistant custom component for viewing IP cameras RTSP stream in real time using WebRTC technology. Based on: Pion - pure Go implem

Alex X 739 Dec 30, 2022
A pure python media player that can be used in AI media API development.

A pure python media player that can be used in AI media API development.

YDOOK 1 Dec 04, 2021
Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.

Vigilio Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser

Tugcan Olgun 141 Jan 06, 2023
An easy to use GUI based video to image sequence converter (and vice versa).

Vdo & Img Conversion Tools This is a quick conversion tool made with python that can save you a lot of time. With this tool you can extract image sequ

Akash Bora 3 Sep 18, 2022
Telegram Video Chat Video Streaming bot 🇱🇰

🧪 Get SESSION_NAME from below: Pyrogram 🎭 Preview ✨ Features Music & Video stream support MultiChat support Playlist & Queue support Skip, Pause, Re

DOOZY YEZ 5 Jun 26, 2022
Video stream image stacking -- live version

video stream image stacking v2 -- live version A very simple streamed video image stacking code! Version 2.1 left mouse click to select a small region

Chakravarthy Mathiazhagan 1 Jan 03, 2022
Help for manipulating the plex-media-server transcode on the raspberry pi

raspi-plex-transcode Help for manipulating the plex-media-server transcode on the raspberry pi Ensure hardware decoding works and your firmware is up

10 Sep 29, 2022
Tautulli - A Python based monitoring and tracking tool for Plex Media Server.

Tautulli A python based web application for monitoring, analytics and notifications for Plex Media Server. This project is based on code from Headphon

Tautulli 4.7k Jan 07, 2023
Stream deck using Arduino and Python

Stream deck using Arduino and Python This is a little project I started due to the fact that I wanted to stream and didn't want to spend lots on a sim

Tal Cherniavsky 2 Feb 11, 2022
goal: render videos on eu4's timeline function

Rendering Videos on the EU4 Time Line This repository contains code to create an eu4-savefile that plays back a video in question.

29 Dec 24, 2022
It is a simple python package to play videos in the terminal using characters as pixels

It is a simple python package to play videos in the terminal using characters as pixels

Joel Ibaceta 1.4k Jan 07, 2023
Tweet stream in OBS browser source

OBS-Twitter-Stream OBSなどの配信ソフトのブラウザソースで特定のキーワードを含んだツイートを表示します 使い方 使い方については以下のwikiを御覧ください https://github.com/CubeZeero/OBS-Twitter-Stream/wiki ダウンロード W

Cube 23 Dec 18, 2022
Add filters (background blur, etc) to your webcam on Linux.

Add filters (background blur, etc) to your webcam on Linux.

Jashandeep Sohi 480 Dec 14, 2022
Rune - a video miniplayer made with Python.

Rune - a video miniplayer made with Python.

1 Dec 13, 2021
Video processing routines for SciPy

scikit-video Video Processing SciKit BETA Video processing algorithms, including I/O, quality metrics, temporal filtering, motion/object detection, mo

Alex Izvorski 119 Dec 27, 2022