Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Migration Manager (MM) is a very small utility that can list source servers in a target account and apply mass launch template modifications.

Migration Manager Migration Manager (MM) is a very small utility that can list source servers in a target account and apply mass launch template modif

Cody 2 Nov 04, 2021
A script written in python3 for bruteforcing Gmail accounts.

GmailBruteforce Made for bruteforcing gmail accounts. It needs Less Secure Apps setting turned on in order to work. Installation For windows git clone

Shinero 4 Sep 16, 2022
A multi exploit instagram exploitation framework

Instagram Exploitation Framework About IEF Is an open source Instagram Exploitation Framework with various Exploits that could be used to mod your pro

Instagram Exploitation Framework - BirdSecurity 1 May 23, 2022
Backend.AI Client Library for Python

Backend.AI Client The official API client library for Backend.AI Usage (KeyPair mode) You should set the access key and secret key as environment vari

Lablup 10 Feb 10, 2022
摩尔庄园手游脚本

摩尔庄园 BlueStacks 脚本 手游上线,情怀再起,但面对游戏中枯燥无味的每日任务和资源采集,你是否觉得肝疼呢? 本项目通过生成 BlueStacks 模拟器的宏脚本,帮助玩家护肝。 使用脚本请阅读 使用方式 和对应的 功能及说明 联系 Telegram 频道 @mole61 Telegram

WH-2099 43 Dec 16, 2022
Telegram group manager moderen and simple.

Upin Robot A Advanced Powerful, Smart And Intelligent Group Management Bot With New And Powerful Features ... Written with Pyrogram and Telethon... If

Muhammad Nawawi 3 Dec 23, 2021
It's a discord.py simulator.

DiscordPySimulator It's a discord.py simulator. ⚠️ Things to fix Context As you may know, discord py commands provide the context as the first paramet

Juan Sebastián 11 Oct 24, 2022
Collaboration with Microsoft, AWS, Google, and ETHZürich Analytics Club (2022 Datathon Project)

DATATHON_ Collaboration with Microsoft, AWS, Google, and ETHZürich Analytics Club (2022 Datathon Project) Datathon Original Challenge SAV DataDays Rei

esthi 34 Nov 10, 2022
💖 Telegram - Telethon - UserBot 💖

『᭙ꪖ᥅ƺẞø†』 🇮🇳 ⚡ ᭙ꪖ᥅ƺBot Is One Of The Fastest & Smoothest Bot On Telegram Based on Telethon ⚡ Status Of Bot Telegram 🏪 YouTube 📺 Dєρℓογ το нєяοκυ D

Team WarZ 1 Mar 28, 2022
Projeto Informações Conta do Instagram - Instagram Account Information Project

VESTA-tools A collection of simple tools that proved to be needed for handling large periodic calculations with the VASP software package. distTotCalc

Thiago Souza 1 Dec 02, 2021
Acc-discord-rpc - Assetto Corsa Competizione Discord Rich Presence Client

A simple Assetto Corsa Competizione Rich Presence client. This app only works in

6 Dec 18, 2022
A Telegram Userbot to play or streaming Audio and Video songs / files in Telegram Voice Chats.

Vcmusic-Userbot A Telegram Userbot to play or streaming Audio and Video songs / files in Telegram Voice Chats. It's made with PyTgCalls and Pyrogram R

3 Oct 23, 2021
Twitter bot that finds new friends in Twitter.

PythonTwitterBot Twitter Bot Thats Find New Friends pip install textblob pip install tweepy pip install googletrans check requirements.txt file Env

IbukiYoshida 4 Aug 11, 2021
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

TWINT - Twitter Intelligence Tool No authentication. No API. No limits. Twint is an advanced Twitter scraping tool written in Python that allows for s

TWINT Project 14.2k Jan 03, 2023
Automated crypto trading bot as adapted from Algovibes.

crypto-trading-bot Automated crypto trading bot as adapted from Algovibes. Pre-requisites Ensure that you have created a Binance API key before procee

Kai Koh 33 Nov 01, 2022
Telegram Google Translater Bot

Google-Translater-Bot Hey Mo Tech, I am simple Google Translater Bot. I can translate any language to you selected language Team Mo Tech Deploy To Her

21 Dec 01, 2022
A smart tool to backup members 📈 So you even after a raid/ ban you can easily restore them in seconds 🎲

🤑 Discord-backer 🤑 A open-source Discord member backup and restore tool for your server. This can help you get all your members in 5 Seconds back af

John 29 Dec 21, 2022
pyDuinoCoin is a simple python integration for the DuinoCoin REST API, that allows developers to communicate with DuinoCoin Master Server

PyDuinoCoin PyDuinoCoin is a simple python integration for the DuinoCoin REST API, that allows developers to communicate with DuinoCoin Main Server. I

BackrndSource 6 Jul 14, 2022
A link shortner telegram bot version 2 with advanced features

URL-Shortner-Bot-V2 A link shortner telegram bot version 2 with advanced features Made with Python3 (C) @FayasNoushad Copyright permission under MIT L

Fayas Noushad 18 Dec 29, 2022
Python version of PlaceNL's headless bot with automatic access token refresh

Reddit /r/place 2022 headless bot This headless Python bot will automatically login to reddit, obtain access tokens (and refreshes them when they expi

19 May 21, 2022