Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

This project, search all entities related to A2P in twilio

Mirror A2P Twilio This project, search all entities related to A2P in twilio (phone numbers, messaging services, campaign, A2P brand information and P

Iván Cárdenas 2 Nov 03, 2022
A Python 2.7/3.x module for Amcrest Cameras using the SDK HTTP API.

A Python 2.7/3.x module for Amcrest Cameras using the SDK HTTP API. Amcrest and Dahua devices share similar firmwares. Dahua Cameras and NVRs also work with this module.

Marcelo Moreira de Mello 176 Dec 21, 2022
A fork of discord.py meant to replace it

Texus A modern, easy to use, feature-rich, and async ready API wrapper for Discord written in Python. Key Features Modern Pythonic API using async and

Texus 1 Nov 18, 2021
Telegram File to Link Fastest Bot , also used for movies streaming

Telegram File Stream Bot ! A Telegram bot to stream files to web. Report a Bug | Request Feature About This Bot This bot will give you stream links fo

Avishkar Patil 194 Jan 07, 2023
Telegram Music Bot for YouTube/SoundCloud/Mixcloud

Telegram Music Bot Telegram Music Bot for YouTube/SoundCloud/Mixcloud This bot downloads and sends the audio when someone send a YouTube/SoundCloud/Mi

Calls Music 76 Jan 02, 2023
An advanced QR Code telegram bot with more features.

QR Code Bot A telegram qr code encode and decode bot Advanced Features 1. Database ( MongoDB ) Support 2. Broadcast Support 3. Status Command 4. Setti

Fayas Noushad 16 Nov 12, 2022
Tinkoff social pulse api wrapper

Tinkoff social pulse api wrapper

Semenov Artur 9 Dec 20, 2022
HackZ-Token-Grabber-V2 - HackZ Token Grabber V2

HackZ-Token-Grabber-V2 was made by Love ❌ code ✅ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ 🌟

! ™NightMare 2 Mar 01, 2022
Python wrapper for Interactive Brokers Client Portal Web API

EasyIB: Unofficial Wrapper for Interactive Brokers API EasyIB is an unofficial python wrapper for Interactive Brokers Client Portal Web API. Features

39 Dec 13, 2022
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

1 Jan 19, 2022
Send song lyrics to iMessage users using the Genius lyrics API

pyMessage Send song lyrics to iMessage users using the Genius lyrics API. Setup 1.) Open the main.py file, and add your API key on line 7. 2.) Install

therealkingnull 1 Jan 23, 2022
A multi-password‌ cracking tool that can help you hack facebook accounts very quickly

FbCracker This is a multi-password‌ cracking tool that can help you hack facebook accounts very quickly. Facebook Hacking Tool Installation On Termux

ReD H4CkeR 9 Nov 16, 2022
A Discord Self-Bot in Python

👨‍💻 Discord Self Bot 👨‍💻 A Discord Self-Bot in Python by natrix Installation Run: selfbot.bat Python: version : 3.8 Modules

natrix_dev 3 Oct 02, 2022
AKShare is an elegant and simple financial data interface library for Python, built for human beings

AKShare is an elegant and simple financial data interface library for Python, built for human beings

AKFamily 5.8k Dec 30, 2022
Github integration with Telegram

The Telegram bot myGit is your GiHub assistant. In your conversations with your team, you can simply insert the information about the projects you are working at.

Alexandru Buzescu 2 Jan 06, 2022
Anchor Protocol Script that can save you from being liquidated!

Why My day job requires a fairly good amount of automation from time to time. Besides, I do like computers to work on what I cannot while I'm sleeping

126 Oct 16, 2022
An all-in-one financial analytics and smart portfolio creator as a Discord bot!

An all-in-one financial analytics bot to help you gain quantitative financial insights. Finn is a Discord Bot that lets you explore the stock market like you've never before!

6 Jan 12, 2022
Powerful Telegram Members Scraping and Adding Toolkit

🔥 Genisys V2.1 Powerful Telegram Members Scraping and Adding Toolkit 🔻 Features 🔺 ADDS IN BULK[by user id, not by username] Scrapes and adds to pub

The Cryptonian 16 Mar 01, 2022
Url-shortener - A url shortener made in python using the API's from the pyshorteners lib

URL Shortener Um encurtador de link feito em python usando as API's da lib pysho

Spyware 3 Jan 07, 2022
Wrapper for vk_api lib for faster bot buliding

Welcome to VKBotPod repository! Wrapper for vk_api lib for faster bot buliding Features Simple syntax Rich functionality Special thanks to movpushmov

NullPointerException 3 Jan 14, 2022