Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Auto-Rollnumber-sender - Auto Rollnumber sender with python

Auto-Rollnumber-sender The above code fits better on my system but it can vary s

Riya Tripathi 2 Feb 14, 2022
Discord opsiyonel detaylı hava durumu botu

WeatherBot Discord opsiyonel detaylı hava durumu botu önümüzdeki Perşembe ──► önümüzdeki Çarşamba ┌─────────┐┌─────────┐┌─────────┐┌───────

DejaVu 16 Dec 19, 2022
A Slack bot for playing Texas Hold 'Em where the currency is various workout tasks e.g. pushups

A Slack app/bot for playing Texas Hold 'Em where the currency is various workout tasks e.g. pushups. The intent is to make the workday more fun & active for remote teams.

Kyle McIntyre 3 Sep 19, 2022
Sakamata-alpha-pycord - Sakamata bot alpha with pycord

sakamatabot このリポジトリは? ホロライブ所属VTuber沙花叉クロヱさんの非公式ファンDiscordサーバー「クロヱ水族館」の運営/管理補助を行う

sushichaaaan 1 May 04, 2022
A collective list of free APIs for use in software and web development.

Public APIs A collective list of free APIs for use in software and web development. A public API for this project can be found here! For information o

222.5k Jan 02, 2023
fair-test is a library to build and deploy FAIR metrics tests APIs supporting the specifications used by the FAIRMetrics working group.

☑️ FAIR test fair-test is a library to build and deploy FAIR metrics tests APIs supporting the specifications used by the FAIRMetrics working group. I

Maastricht University IDS 6 Oct 30, 2022
Discord spam bots with multiple account support and more

Discord spam bots with multiple account support and more. PLEASE READ EVERYTHING BEFORE WRITING AN ISSUE!! Server Messages Text Image Dm Messages Text

Mr. Nobody 6 Sep 14, 2022
Mark Sullivan 66 Dec 13, 2022
An incomplete add-on extension to Pyrogram, to create telegram bots a bit more easily

PyStark A star ⭐ from you means a lot An incomplete add-on extension to Pyrogram

Stark Bots 36 Dec 23, 2022
REPO USERBOT YANG DIBUAT DARI BERBAGAI REPO USERBOT GITHUB.

Lord Userbot Userbot Yang Digunakan Untuk Bersenang-Senang Di Telegram Repo Lord Userbot Repo Yang Dibuat Alvin Dari Berbagai Repo Userbot Github CARA

Alvin 70 Jan 02, 2023
Telegram to TamTam stickers

Telegram to TamTam stickers @tg_stickers TamTam бот, который конвертирует Telegram стикеры в формат TamTam и помогает загрузить их в TamTam. Все делае

Ivan Buymov 22 Nov 01, 2022
Yet another random discord bot.

YARDB (r!) Yet another fully functional and random discord bot. I might add more features if I'm bored also don't criticize on my code. Commands: 4 Di

kayle 1 Oct 21, 2021
FUD Keylogger That Reports To Discord

This python script will capture all of the keystrokes within a given time frame and report them to a Discord Server using Webhooks. Instead of the traditional

●┼Waseem Akram••✓⁩ 16 Dec 08, 2022
Discord bot ( discord.py ), uses pandas library from python for data-management.

Discord_bot A Best and the most easy-to-use Discord bot !! Some simple basic auto moderations, Chat functions. It includes a game similar to Casino, g

Jaitej 4 Aug 30, 2022
A wrapper for aqquiring Choice Coin directly through a Python Terminal. Leverages the TinyMan Python-SDK.

CHOICE_TinyMan_Wrapper A wrapper that allows users to acquire Choice Coin directly through their Terminal using ALGO and various Algorand Standard Ass

Choice Coin 16 Sep 24, 2022
troposphere - Python library to create AWS CloudFormation descriptions

troposphere - Python library to create AWS CloudFormation descriptions

4.8k Jan 06, 2023
TeslaPy - A Python implementation based on unofficial documentation of the client side interface to the Tesla Motors Owner API

TeslaPy - A Python implementation based on unofficial documentation of the client side interface to the Tesla Motors Owner API, which provides functiona

Tim Dorssers 233 Dec 30, 2022
NMux is the version of the NMscript in termux

NMscript-termux-version Termux-Version NMux is the termux version of NMscript which is NMscript? NMscript is a simple script written in Python that he

cabeson sin z 5 Apr 23, 2022
Quack-SMS-BOMBER - Quack Toolkit By IkigaiHack

Quack Toolkit By IkigaiHack About Quack Toolkit Quack Toolkit is a set of tools

Marcel 2 Aug 19, 2022
Clipboard-watcher - Keep an eye on the apps that are using your clipboard

clipboard-watcher This repository contains the code of an experiment, in order t

Gonçalo Valério 48 Oct 13, 2022