Free Data Engineering course!

Overview

Data Engineering Zoomcamp

Syllabus

Taking the course

Self-paced mode

All the materials of the course are freely available, so you can take the course at your own pace

  • Follow the suggested syllabus (see below) week by week
  • You don't need to fill in the registration form. Just start watching the videos and join Slack
  • Check FAQ if you have problems
  • If you can't find a solution to your problem in FAQ, ask for help in Slack

2022 Cohort

Asking for help in Slack

The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.

To make discussions in Slack more organized:

Syllabus

Week 1: Introduction & Prerequisites

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

More details

Week 2: Data ingestion

  • Data Lake
  • Workflow orchestration
  • Setting up Airflow locally
  • Ingesting data to GCP with Airflow
  • Ingesting data to local Postgres with Airflow
  • Moving data from AWS to GCP (Transfer service)
  • Homework

More details

Week 3: Data Warehouse

  • Data Warehouse
  • BigQuery
  • Partitoning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • Integrating BigQuery with Airflow
  • BigQuery Machine Learning

More details

Week 4: Analytics engineering

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualising the data with google data studio and metabase

More details

Week 5: Batch processing

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

More details

Week 6: Streaming

  • Introduction to Kafka
  • Schemas (avro)
  • Kafka Streams
  • Kafka Connect and KSQL

More details

Week 7, 8 & 9: Project

Putting everything we learned to practice

  • Week 7 and 8: working on your own project
  • Week 9: reviewing your peers

More details

Overview

Architecture diagram

Technologies

  • Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
    • Google Cloud Storage (GCS): Data Lake
    • BigQuery: Data Warehouse
  • Terraform: Infrastructure-as-Code (IaC)
  • Docker: Containerization
  • SQL: Data Analysis & Exploration
  • Airflow: Pipeline Orchestration
  • dbt: Data Transformation
  • Spark: Distributed Processing
  • Kafka: Streaming

Prerequisites

To get most out of this course, you should feel comfortable with coding and command line, and know the basics of SQL. Prior experience with Python will be helpful, but you can pick Python relatively fast if you have experience with other programming languages.

Prior experience with data engineering is not required.

Instructors

Tools

For this course you'll need to have the following software installed on your computer:

  • Docker and Docker-Compose
  • Python 3 (e.g. via Anaconda)
  • Google Cloud SDK
  • Terraform

See Week 1 for more details about installing these tools

FAQ

  • Q: I registered, but haven't received a confirmation email. Is it normal? A: Yes, it's normal. It's not automated. But you will receive an email eventually
  • Q: At what time of the day will it happen? A: Office hours will happen on Mondays at 17:00 CET. But everything will be recorded, so you can watch it whenever it's convenient for you
  • Q: Will there be a certificate? A: Yes, if you complete the project
  • Q: I'm 100% not sure I'll be able to attend. Can I still sign up? A: Yes, please do! You'll receive all the updates and then you can watch the course at your own pace.
  • Q: Do you plan to run a ML engineering course as well? A: Glad you asked. We do :)
  • Q: I'm stuck! I've got a technical question! A: Ask on Slack! And check out the student FAQ; many common issues have been answered already. If your issue is solved, please add how you solved it to the document. Thanks!

Our friends

Big thanks to other communities for helping us spread the word about the course:

Check them out - they are cool!

Owner
DataTalksClub
The place to talk about data
DataTalksClub
A patch and keygen tools for typora.

A patch and keygen tools for typora.

Mason Shi 1.4k Apr 12, 2022
A Lite Package focuses on making overwrite and mending functions easier and more flexible.

Overwrite Make Overwrite More flexible In Python A Lite Package focuses on making overwrite and mending functions easier and more flexible. Certain Me

2 Jun 15, 2022
Repositório de código de curso de Djavue ministrado na Python Brasil 2021

djavue-python-brasil Repositório de código de curso de Djavue ministrado na Python Brasil 2021 Completamente baseado no curso Djavue. A diferença está

Buser 15 Dec 26, 2022
A Python3 script to decode an encoded VBScript file, often seen with a .vbe file extension

vbe-decoder.py Decode one or multiple encoded VBScript files, often seen with a .vbe file extension. Usage usage: vbe-decoder.py [-h] [-o output] file

John Hammond 147 Nov 15, 2022
This is a Fava extension to display a grouped portfolio view in Fava for a set of Beancount accounts.

Fava Portfolio Summary This is a Fava extension to display a grouped portfolio view in Fava for a set of Beancount accounts. It can also calculate MWR

18 Dec 26, 2022
Small Arrow Vortex clipboard processing library

Description Small Arrow Vortex clipboard processing library. Install You can install this library from PyPI with pip install av-clipboard-lib or compi

Delta Epsilon 1 Dec 18, 2021
Python decorator for `TODO`s

Python decorator for `TODO`s. Don't let your TODOs rot in your python projects anymore !

Klemen Sever 74 Sep 13, 2022
Earth centric orbit propagation tool. Built from scratch in python.

Orbit-Propogator Earth centric orbit propagation tool. Built from scratch in python. Functionality includes: tracking sattelite location over time plo

Adam Klein 1 Mar 13, 2022
Goal: Enable awesome tooling for Bazel users of the C language family.

Hedron's Compile Commands Extractor for Bazel — User Interface What is this project trying to do for me? First, provide Bazel users cross-platform aut

Hedron Vision 290 Dec 26, 2022
Collie is for uncovering RDMA NIC performance anomalies

Collie is for uncovering RDMA NIC performance anomalies. Overview Prerequ

Bytedance Inc. 34 Dec 11, 2022
Get a list of the top-10 rejected libraries in your WhiteSource inventory

WhiteSource Top 10 Rejected Libraries Generate a spreadsheet listing the 10 most common libraries in your WhiteSource inventory that were rejected by

WhiteSource-PS-tools 10 Mar 23, 2022
A desktop app to check the unlocked courses bases on previously done courses.

Course Picker A desktop app to check the unlocked courses bases on previously done courses. Table of contents About the Project Built with What it doe

Ahmed Symum Swapno 3 Feb 07, 2022
Percolation simulation using python

PythonPercolation Percolation simulation using python Exemple de percolation : Etude statistique sur le pourcentage de remplissage jusqu'à percolation

Tony Chouteau 1 Sep 08, 2022
Box CRUD API With Python

Box CRUD API: Consider a store which has an inventory of boxes which are all cuboid(which have length breadth and height). Each Cuboid has been added

Akhil Bhalerao 3 Feb 17, 2022
Calculatrix is a project where I'll create plenty of calculators in a lot of differents languages

Calculatrix What is Calculatrix ? Calculatrix is a project where I'll create plenty of calculators in a lot of differents languages. I know this sound

1 Jun 14, 2022
Multitrack exporter for OP-Z

Underbridge for OP-Z Multitrack exporter Description Exports patterns and projects individual audio tracks to seperate folders for use in your DAW. Py

Thomas Herrmann 71 Dec 25, 2022
Entitlement AND Hardened Runtime Check

Python3 script for macOS to recursively check /Applications and also check /usr/local/bin, /usr/bin, and /usr/sbin for binaries with problematic/interesting entitlements. Also checks for hardened run

Cedric Owens 79 Nov 16, 2022
Runtime fault injection platform by Daniele Rizzieri (2021)

GDBitflip [v1.04] Runtime fault injection platform by Daniele Rizzieri (2021) This platform executes N times a binary and during each execution it inj

Daniele Rizzieri 1 Dec 07, 2021
A python script for compiling and executing .cc files

Debug And Run A python script for compiling and executing .cc files Example dbrun fname.cc [DEBUG MODE] Compiling fname.cc with C++17 ------------

1 May 28, 2022
A modern Python build backend

trampolim A modern Python build backend. Features Task system, allowing to run arbitrary Python code during the build process (Planned) Easy to use CL

Filipe Laíns 39 Nov 08, 2022