Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
VIT - VideoInTerminal. A quick piece of code to play videos in your terminal using python

VIT VIT - VideoInTerminal. A quick piece of code to play videos in your terminal using python.

ShellTear 3 Mar 03, 2022
Techie Sneh 17 Nov 23, 2021
Tweet stream in OBS browser source

OBS-Twitter-Stream OBSなどの配信ソフトのブラウザソースで特定のキーワードを含んだツイートを表示します 使い方 使い方については以下のwikiを御覧ください https://github.com/CubeZeero/OBS-Twitter-Stream/wiki ダウンロード W

Cube 23 Dec 18, 2022
goal: render videos on eu4's timeline function

Rendering Videos on the EU4 Time Line This repository contains code to create an eu4-savefile that plays back a video in question.

29 Dec 24, 2022
A youtube video link or id to video thumbnail python package.

Youtube-Video-Thumbnail A youtube video link or id to video thumbnail python package. Made with Python3

Fayas Noushad 10 Oct 21, 2022
deepstream python rtsp video h264 or gstreamer python rtsp h264 | h264

deepstream python rtsp video h264 or gstreamer python rtsp h264 | h264 deepstrea

Small white Tang 6 Dec 14, 2022
Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

Sidharth Anand 12 Sep 10, 2022
Vigia-youtube - The YouTube Watch bot is able to monitor channels on Google's video platform

Vigia do YouTube O bot Vigia do YouTube é capaz de monitorar canais na plataform

Alessandro Feitosa Jr 10 Oct 03, 2022
Script simples para baixar vídeos/áudios/playlist do YouTube

🔗 VilelaTube ▶️ Script simples para baixar vídeos/áudios/playlist do YouTube Requisitos • Como usar • Melhorias futuras ⚠️ Atenção! ⚠️ Lembre-se de a

João Victor Vilela dos Santos 2 Nov 03, 2021
Zaid Vc Player Bot For Telegram

Zaid Vc Player Bot For Telegram

1 Dec 04, 2021
Rune - a video miniplayer made with Python.

Rune - a video miniplayer made with Python.

1 Dec 13, 2021
Help for manipulating the plex-media-server transcode on the raspberry pi

raspi-plex-transcode Help for manipulating the plex-media-server transcode on the raspberry pi Ensure hardware decoding works and your firmware is up

10 Sep 29, 2022
Simple nightcore song+video maker

nighty nighty is a simple nightcore song maker (+ video) Installation clone the repo wherever you want git clone https://www.github.com/DanyB0/nighty

DanyB0 2 Oct 02, 2022
BlogBot - a Python script that create blogs from YouTube videos.

BlogBot - Convert Youtube Videos To Blogs BlogBot is a Python script that create blogs from YouTube videos.

Nikhil Bhamere 4 Apr 22, 2022
Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.

Vigilio Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser

Tugcan Olgun 141 Jan 06, 2023
Python package to display video in GUI using OpenCV-Python and PySide6

Python package to display video in GUI using OpenCV-Python and PySide6. Introduction cv2PySide6 is a package which provides utility classes and functi

3 Jun 06, 2022
Rembg Video Virtual Green Screen Edition

Rembg Virtual Greenscreen Edition is a tool to create a green screen matte for videos

Tim Scarfe 217 Jan 06, 2023
pyYotubemanager is full web automated bot capable of General tasks like:- Uploading a Video , Downloading , adding Title , Description , Listing types , adding Thumbnail

PyYoutubemanager Explore the docs » View Demo · Report Bug · Request Feature About The Project PyYotubemanager is full web automated bot capable of Ge

4 Jun 29, 2022
Python Simple Mass Video Clipper (PSMVC)

Python Simple Mass Video Clipper (PSMVC) PSMVC é um gerador de cortes via terminal construído em python. Uso Basta abrir o arquivo start.py Dependenci

Bruno 2 Oct 16, 2021
Automatically segment in-video YouTube sponsorships.

SponsorBlock Auto Segment [Model Download] Automatically segment in-video YouTube sponsorships. Trained on a large dataset of YouTube sponsor transcri

Akmal 7 Aug 22, 2022