melwinmpk/PizzaOrders_DataPipeline

PizzaOrders_DataPipeline

Tony has just opened a new pizza shop.
He knew that pizza alone was not going to get him the seed funding to expand his new Pizza Empire,
so he had one more genius idea to combine with it: he was going to Uberize it. And so Pizza Runner was launched!

Tony started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Tony’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.

Now he wants to know how his business is doing, and he needs the data to answer some questions. However, the stored data is not in a suitable format, so he approaches a Data Engineer to process and store the data for him and produce the answers to his questions.

The data is stored in the following CSV files:

  • customer_orders.csv
    Columns: order_id, customer_id, pizza_id, exclusions, extras, order_time
  • pizza_names.csv
    Columns: pizza_id, pizza_name
  • pizza_recipes.csv
    Columns: pizza_id, toppings
  • pizza_toppings.csv
    Columns: topping_id, topping_name
  • runner_orders.csv
    Columns: order_id, runner_id, pickup_time, distance, duration, cancellation
  • runners.csv
    Columns: runner_id, registration_date
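
Before loading anything into MySQL, it can help to confirm that each file has the columns listed above. The following is a minimal sketch using only the Python standard library. It assumes each CSV starts with a header row, which this README does not state, and `header_matches` is a hypothetical helper name, not something from this repository.

```python
import csv
import io

# Expected header for each source file, copied from the list above.
EXPECTED_COLUMNS = {
    "customer_orders.csv": ["order_id", "customer_id", "pizza_id",
                            "exclusions", "extras", "order_time"],
    "pizza_names.csv": ["pizza_id", "pizza_name"],
    "pizza_recipes.csv": ["pizza_id", "toppings"],
    "pizza_toppings.csv": ["topping_id", "topping_name"],
    "runner_orders.csv": ["order_id", "runner_id", "pickup_time",
                          "distance", "duration", "cancellation"],
    "runners.csv": ["runner_id", "registration_date"],
}


def header_matches(filename, text_stream):
    """Return True if the first CSV row equals the expected column list.

    Assumption: the file has a header row. An empty stream returns False.
    """
    reader = csv.reader(text_stream)
    header = [name.strip() for name in next(reader, [])]
    return header == EXPECTED_COLUMNS[filename]


# Usage with an in-memory stream; a real run would pass open(path, newline="").
sample = io.StringIO("runner_id,registration_date\n")
print(header_matches("runners.csv", sample))
```

Keeping the expected columns in one dictionary means the same mapping could later drive table creation or the import step, though that is a design suggestion rather than how this repository is organised.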

The questions Tony wants answered

  1. How many pizzas were ordered?
  2. How many unique customer orders were made?
  3. How many successful orders were delivered by each runner?
  4. How many of each type of pizza were delivered?
  5. How many Vegetarian and Meatlovers were ordered by each customer?
  6. What was the maximum number of pizzas delivered in a single order?
  7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
  8. How many pizzas were delivered that had both exclusions and extras?
  9. What was the total volume of pizzas ordered for each hour of the day?
  10. What was the volume of orders for each day of the week?
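
As a rough illustration of how questions 1, 2 and 9 translate into SQL, the sketch below runs them against an in-memory SQLite table. This is not the project's PySpark code: SQLite stands in so the example runs with the standard library alone, the rows are made up for illustration, and question 2 is read here as "distinct `order_id` values", which is an assumption. In Spark SQL the hour would typically come from `hour(order_time)` rather than SQLite's `strftime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_orders (
    order_id INTEGER, customer_id INTEGER, pizza_id INTEGER,
    exclusions TEXT, extras TEXT, order_time TEXT
);
-- Made-up rows for illustration only; not taken from the real CSV files.
INSERT INTO customer_orders VALUES
    (1, 101, 1, NULL, NULL, '2021-01-01 18:05:02'),
    (2, 101, 1, NULL, NULL, '2021-01-01 19:00:52'),
    (3, 102, 1, NULL, NULL, '2021-01-02 23:51:23'),
    (3, 102, 2, NULL, '1',  '2021-01-02 23:51:23');
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Question 1: every row is one pizza, so count the rows.
pizzas_ordered = scalar("SELECT COUNT(*) FROM customer_orders")

# Question 2 (assumed reading): count distinct order ids.
unique_orders = scalar("SELECT COUNT(DISTINCT order_id) FROM customer_orders")

# Question 9: pizzas ordered per hour of the day.
pizzas_per_hour = conn.execute("""
    SELECT CAST(strftime('%H', order_time) AS INTEGER) AS hour, COUNT(*)
    FROM customer_orders
    GROUP BY hour
    ORDER BY hour
""").fetchall()

print(pizzas_ordered, unique_orders, pizzas_per_hour)
```

The questions about delivered pizzas (3, 4, 6, 7 and 8) would additionally need a join with `runner_orders` and a filter on `cancellation`, which also depends on how cancellations are encoded in the real data.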

Requirements

  1. Store the data in MySQL tables.
  2. Use Sqoop to import the data into Hive.
  3. Use PySpark to compute the answers to the questions.
  4. Store the results in a separate table.
  5. Automate the entire process with Airflow.
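
To make step 2 more concrete, here is a hedged sketch that builds one `sqoop import` command per source table. The JDBC URL, user name, password file path and Hive database name are all hypothetical placeholders, not values from this repository, and `sqoop_import_command` is an illustrative helper. A single mapper is used on the assumption that the tables may have no primary key, in which case Sqoop needs either `--split-by` or one mapper.

```python
import shlex

# Table names taken from the CSV file names listed earlier.
TABLES = ["customer_orders", "pizza_names", "pizza_recipes",
          "pizza_toppings", "runner_orders", "runners"]


def sqoop_import_command(
    table,
    jdbc_url="jdbc:mysql://localhost:3306/pizza_runner",  # hypothetical
    username="tony",                                      # hypothetical
    password_file="/user/tony/.mysql_password",           # hypothetical
    hive_database="pizza_runner",                         # hypothetical
):
    """Return the argv list for importing one MySQL table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--hive-database", hive_database,
        "--hive-table", table,
        "--num-mappers", "1",  # assumption: tables may lack a primary key
    ]


# Print the commands instead of running them; nothing is executed here.
for name in TABLES:
    print(shlex.join(sqoop_import_command(name)))
```

In an Airflow DAG, each of these commands could be handed to a shell-running task, followed by a task that submits the PySpark job, but the exact operators and task layout used in this project are not shown in this README.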

Airflow Output

About

Pizza Orders data pipeline use case, solved with SQL, Sqoop, HDFS, Hive, and Airflow.
