Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Shred your reddit comment and post history

trasheddit Shred your reddit comment and post history (x89/Shreddit replacement) Usage Simple Example Download trasheddit: git clone https://github.co

Elly 2 Jan 05, 2022
Receive GitHub webhook events and send to Telegram chats with AIOHTTP through Telegram Bot API

GitHub Webhook to Telegram Receive GitHub webhook events and send to Telegram chats with AIOHTTP through Telegram Bot API What this project do is very

Dash Eclipse 33 Jan 03, 2023
All in one Search Engine Scrapper for used by API or Python Module. It's Free!

All in one Search Engine Scrapper for used by API or Python Module. How to use: Video Documentation Senginta is All in one Search Engine Scrapper. Wit

33 Nov 21, 2022
The gPodder podcast client.

___ _ _ ____ __ _| _ \___ __| |__| |___ _ _ |__ / / _` | _/ _ \/ _` / _` / -_) '_| |_ \ \__, |_| \___/\__,_\__,_\___|_| |_

gPodder and related projects 1.1k Jan 04, 2023
Event-driven-model-serving - Unified API of Apache Kafka and Google PubSub

event-driven-model-serving Unified API of Apache Kafka and Google PubSub 1. Proj

Danny Toeun Kim 4 Sep 23, 2022
Easy and simple, Telegram Bot to Show alert when some edits a message in Group

Edit-Message-Alert Just a simple bot to show alert when someone edits a message sent by them, Just 17 Lines of Code These codes are for those who incu

Nuhman Pk 6 Dec 15, 2021
a small cli to generate AWS Well Architected Reports on the road

well-architected-review This repo intends to publish some scripts related to Well Architected Reviews. war.py extracts in txt & xlsx files all the WAR

4 Mar 18, 2022
Telegram vc - A bot that can play music on telegram group's voice call

Telegram Voice Chat Bot A bot that can play music on telegram group's voice call

1 Jan 02, 2022
TORNADO CASH Proxy Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX)

TORNADO CASH Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX) ⭐️ A ful

Crypto Trader 1 Jan 06, 2022
🤖 The bot that runs the official Fairfield Programming Association Discord server.

🤖 The bot that runs the official Fairfield Programming Association Discord server.

Fairfield Programming Association 1 Jan 07, 2022
Discord Rich Presence for Team Fortress 2

TF2 Rich Presence Discord Rich Presence for Team Fortress 2 Detects current game state, queue info, playtime, and more Configurable, reliable, and per

Kataiser 33 Dec 31, 2022
A site devoted to celebrating to matching books with readers and readers with books. Inspired by the Readers' Advisory process in library science, Literati, and Stitch Fix.

Welcome to Readers' Advisory Greetings, fellow book enthusiasts! Visit Readers' Advisory! Menu Technologies Key Features Database Schema Front End Rou

jane martin 6 Dec 12, 2021
A python library built on the API of the coderHub.sa, which helps you to fetch the challenges and more

coderHub A python library built on the API of the coderHub.sa, which helps you to fetch the challenges and more Installation • Features • Usage • Lice

TheAwiteb 5 Nov 04, 2022
Minecraft checker

This Project checks if a minecraft account is a nfa/sfa account or invalid it also says you if the ip you are using is shadow banned from minecraft (shadow bann is if you send too many login attempts

baum1810 4 Oct 03, 2022
Pancakeswap Sniper Bot GUI Uniswap Matic 2022 (WINDOWS LINUX MAC) AUTO BUY TOKEN ON LAUNCH AFTER ADD LIQUIDITY

Pancakeswap Sniper Bot GUI Uniswap Matic 2022 (WINDOWS LINUX MAC) ⭐️ AUTO BUY TOKEN ON LAUNCH AFTER ADD LIQUIDITY ⭐️ ⭐️ First GUI SNIPER BOT for WINDO

Crypto Trader 1 Jan 05, 2022
Grocy-create-product - A script supports the batch creation of new products in Grocy

grocy-create-product This script supports the batch creation of new products in

André Heuer 6 Jul 28, 2022
A Python wrapper for discord slash-commands, designed to extend discord.py.

dislash.py An extending library for discord.py that allows to build awesome slash-commands. ⭐

173 Dec 19, 2022
AWS EC2 S3 Automated With python

AWS_EC2_S3_Automated Description This programme is a Python3 script that utilizes Boto3 to automate the process of creating an AWS EC2 instance with a

niall_crowe 2 Nov 16, 2021
Python Library for Accessing the Cohere API

Cohere Python SDK This package provides functionality developed to simplify interfacing with the Cohere API in Python 3. Documentation See the API's d

cohere.ai 42 Jan 03, 2023
A simple Python API wrapper for Cloudflare Stream's API.

python-cloudflare-stream A basic Python API wrapper for working with Cloudflare Stream. Arbington.com started off using Cloudflare Stream. We used the

Arbington 3 Sep 08, 2022