Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

We have made you a wrapper you can't refuse

We have made you a wrapper you can't refuse We have a vibrant community of developers helping each other in our Telegram group. Join us! Stay tuned fo

20.6k Jan 04, 2023
A simple python discord bot which give you a yogurt brand name, basing on a large database often updated.

YaourtBot A discord simple bot by Lopinosaurus Before using this code : ・Move env file to .env ・Change the channel ID on line 38 of bot.py to your #pi

The only one bunny who can dev. 0 May 09, 2022
A Python Library to interface with Flickr REST API, OAuth & JSON Responses

Python-Flickr Python-Flickr is A Python library to interface with Flickr REST API & OAuth Features Photo Uploading Retrieve user information Common Fl

Mike Helmick 40 Sep 25, 2021
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

Market Sentiment 345 Dec 17, 2022
HinamiRobot - Telegram Group Manager Bot Written In Python Using Pyrogram

✨ HINAMI CHAN ✨ Telegram Group Manager Bot Written In Python Using Pyrogram. Rea

DARK LEGEND088 2 Jan 27, 2022
LimitatiBot - A simple telegram bot to establish a conversation with a user without having to use private chats

🤖 LimitatiBot [0.2] LimitatiBot is a simple telegram bot to establish a convers

xMrPente 9 Dec 27, 2022
Asynchronous wrapper for wttr.in weather forecast.

aiopywttr Asynchronous wrapper for wttr.in weather forecast. Synchronous version here. Installation pip install aiopywttr Example This example prints

Almaz 4 Dec 24, 2022
Telegram Auto Filter Bot

Pro Auto Filter Bot V2.o Hey Mo Tech, I'm an Autofilter bot v2.O and you can not Add Me to your Group. I was made for this one group. So don't waste y

14 Oct 20, 2021
Davide Gallitelli 3 Dec 21, 2021
A heraldry-related bot, designed for the Heraldry Community.

Heraldtron A heraldry-related bot, designed for the Heraldry Community. Requirements Python 3.9+ discord.py aiohttp (comes installed with discord.py)

1 Mar 31, 2022
A self-hosted Discord music bot.

Cassette A self-hosted Discord music bot. Requirements py-cord pynacl pytube Setup Intended to be hosted on Heroku. Fork or clone this repo. Create a

Lohan 8 Apr 28, 2022
Yes, it's true :revolving_hearts: This repository has 301 stars.

Yes, it's true! Inspired by a similar repository from @RealPeha, but implemented using a webhook on AWS Lambda and API Gateway, so it's serv

511 Dec 30, 2022
A telegram bot does not allow channels to send messages to the telegram supergroup

Channel Message Handler Getting started Installation $ git clone https://github.com/AbhijithNT/GroupChannelHandler.git Change directory $ cd ChannelMe

Abhijith N T 0 Dec 26, 2021
Busty - A bot for the Busty Discord server

Busty Discord bot used for the Busty server. Install You'll need at least Python

Andrew Morgan 7 Dec 05, 2022
Pinopoly is a tool to remove the "banker" player and replace them with a digitalized system

Pinopoly is a tool to remove the "banker" player and replace them with a digitalized system. It is intended to be used on a Raspberry Pi but can be used in the command line as well.

Alex Overstreet 11 Jul 09, 2022
Powerful spammer bots for telegram made with python and telethon.

Powerful spammer bots for telegram made with python and telethon. We can deploy upto 70 bots at a time.

32 Dec 15, 2022
Maestral is an open-source Dropbox client written in Python.

Maestral - A light-weight and open-source Dropbox client for macOS and Linux

2.6k Jan 03, 2023
A simple script that will watch a stream for you and earn the channel points.

Credits Main idea: https://github.com/gottagofaster236/Twitch-Channel-Points-Miner Bet system (Selenium): https://github.com/ClementRoyer/TwitchAutoCo

Alessandro Maggio 1.1k Jan 08, 2023
A minimal open source mtg-like tcg game made in python that can be played on a terminal emulator using a keyboard.

TCG-TERM Project state: 🔧 🚧 🚧 🚧 Incomplete, In development 🚧 🚧 🚧 👷 (Keep in mind that at the moment, This project is currently undone, and wil

Amos 3 Aug 29, 2021
Surfline Forecast Bot For Python

Surfline Forecast Bot A telegram bot created using Telethon that allows users to

1 May 08, 2022