Free Data Engineering course!

Overview

Data Engineering Zoomcamp

Syllabus

Taking the course

Self-paced mode

All the materials of the course are freely available, so you can take the course at your own pace

  • Follow the suggested syllabus (see below) week by week
  • You don't need to fill in the registration form. Just start watching the videos and join Slack
  • Check FAQ if you have problems
  • If you can't find a solution to your problem in FAQ, ask for help in Slack

2022 Cohort

Asking for help in Slack

The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.

To make discussions in Slack more organized:

Syllabus

Week 1: Introduction & Prerequisites

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

More details

Week 2: Data ingestion

  • Data Lake
  • Workflow orchestration
  • Setting up Airflow locally
  • Ingesting data to GCP with Airflow
  • Ingesting data to local Postgres with Airflow
  • Moving data from AWS to GCP (Transfer service)
  • Homework

More details

Week 3: Data Warehouse

  • Data Warehouse
  • BigQuery
  • Partitoning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • Integrating BigQuery with Airflow
  • BigQuery Machine Learning

More details

Week 4: Analytics engineering

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualising the data with google data studio and metabase

More details

Week 5: Batch processing

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

More details

Week 6: Streaming

  • Introduction to Kafka
  • Schemas (avro)
  • Kafka Streams
  • Kafka Connect and KSQL

More details

Week 7, 8 & 9: Project

Putting everything we learned to practice

  • Week 7 and 8: working on your own project
  • Week 9: reviewing your peers

More details

Overview

Architecture diagram

Technologies

  • Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
    • Google Cloud Storage (GCS): Data Lake
    • BigQuery: Data Warehouse
  • Terraform: Infrastructure-as-Code (IaC)
  • Docker: Containerization
  • SQL: Data Analysis & Exploration
  • Airflow: Pipeline Orchestration
  • dbt: Data Transformation
  • Spark: Distributed Processing
  • Kafka: Streaming

Prerequisites

To get most out of this course, you should feel comfortable with coding and command line, and know the basics of SQL. Prior experience with Python will be helpful, but you can pick Python relatively fast if you have experience with other programming languages.

Prior experience with data engineering is not required.

Instructors

Tools

For this course you'll need to have the following software installed on your computer:

  • Docker and Docker-Compose
  • Python 3 (e.g. via Anaconda)
  • Google Cloud SDK
  • Terraform

See Week 1 for more details about installing these tools

FAQ

  • Q: I registered, but haven't received a confirmation email. Is it normal? A: Yes, it's normal. It's not automated. But you will receive an email eventually
  • Q: At what time of the day will it happen? A: Office hours will happen on Mondays at 17:00 CET. But everything will be recorded, so you can watch it whenever it's convenient for you
  • Q: Will there be a certificate? A: Yes, if you complete the project
  • Q: I'm 100% not sure I'll be able to attend. Can I still sign up? A: Yes, please do! You'll receive all the updates and then you can watch the course at your own pace.
  • Q: Do you plan to run a ML engineering course as well? A: Glad you asked. We do :)
  • Q: I'm stuck! I've got a technical question! A: Ask on Slack! And check out the student FAQ; many common issues have been answered already. If your issue is solved, please add how you solved it to the document. Thanks!

Our friends

Big thanks to other communities for helping us spread the word about the course:

Check them out - they are cool!

Owner
DataTalksClub
The place to talk about data
DataTalksClub
The Zig programming language, packaged for PyPI

Zig PyPI distribution This repository contains the script used to repackage the releases of the Zig programming language as Python binary wheels. This

Zig Programming Language 100 Nov 04, 2022
OB_Template is a vault template reference for using Obsidian.

Obsidian Template OB_Template is a vault template reference for using Obsidian. If you've tested out Obsidian. and worked through the "Obsidian Help"

323 Dec 27, 2022
A python script to make leaderboards using a CSV with the runners name, IDs and Flag Emojis

SrcLbMaker A python script to make speedrun.com global leaderboards. Installation You need python 3.6 or higher. First, go to the folder where you wan

2 Jul 25, 2022
XlvnsScriptTool - Tool for decompilation and compilation of scripts .SDT from the visual novel's engine xlvns

XlvnsScriptTool English Dual languaged (rus+eng) tool for decompiling and compiling (actually, this tool is more than just (dis)assenbler, but less th

Tester 3 Sep 15, 2022
A web app that is written entirely in Python

University Project About I made this web app to finish a project assigned by my teacher. It is written entirely in Python, thanks to streamlit to make

15 Nov 27, 2022
MODSKIN-LOLPRO-updater: The mod is fkn 10y old and has'nt a self-updater

The mod is fkn 10y old and has'nt a self-updater. To use it just run the exec, wait some seconds, and it will run the new modsk

Shiro Amurha 3 Apr 23, 2022
A python script based on OpenCV-Python, you can automatically hang up the Destiny 2 Throne to get the Dawning Essence.

A python script based on OpenCV-Python, you can automatically hang up the Destiny 2 Throne to get the Dawning Essence.

1 Dec 19, 2021
peace-performance (Rust) binding for python. To calculate star ratings and performance points for all osu! gamemodes

peace-performance-python Fast, To calculate star ratings and performance points for all osu! gamemodes peace-performance (Rust) binding for python bas

9 Sep 19, 2022
An kind of operating system portal to a variety of apps with pure python

pyos An kind of operating system portal to a variety of apps. Installation Run this on your terminal: git clone https://github.com/arjunj132/pyos.git

1 Jan 22, 2022
Online learning platform

๐Ÿ›  Status: In Development Teached is currently in development. So we encourage you to use it and give us your feedback, but there are things that have

Mohamed Nesredin 2 Feb 07, 2021
Moleey Panel with python 3

Painel-Moleey pkg upgrade && pkg update pkg install python3 pip install pyfiglet pip install colored pip install requests pip install phonenumbers pkg

Moleey. 1 Oct 17, 2021
1. ๋„ค์ด๋ฒ„ ์นดํŽ˜ ๋Œ“๊ธ€์„ ๋นจ๋ฆฌ ๋‹ค๋Š” ๊ธฐ๋Šฅ

naver_autoprogram ๊ธฐ๋Šฅ ์„ค๋ช… ๋„ค์ด๋ฒ„ ์นดํŽ˜ ๋Œ“๊ธ€์„ ๋นจ๋ฆฌ ๋‹ค๋Š” ๊ธฐ๋Šฅ ๋„ค์ด๋ฒ„ ์นดํŽ˜ ์ž๋™ ์ถœ์„ ์ฒดํฌ ๊ธฐ๋Šฅ ๋™์ž‘ ๋ฐฉ์‹ ์นดํŽ˜ ๋Œ“๊ธ€ ๊ธฐ๋Šฅ ๊ธฐ๋ณธ ๋™์ž‘์€ ์ฃผ๊ธฐ์ ์ธ ์Šค์ผ€์ฅด ๋™์ž‘์œผ๋กœ ํ•ด๋‹น ์นดํŽ˜ ID ์™€ ํŠน์ • API ์ฃผ์†Œ๋กœ ๋Œ€์ƒ์ด ์ƒˆ๊ธ€์„ ์ž‘์„ฑํ–ˆ๋Š”์ง€ ์ฒดํฌ. ํ•ด๋‹น ๋Œ€์ƒ์ด ์ƒˆ๊ธ€ ๋“ฑ

1 Dec 22, 2021
A timer for bird lovers, plays a random birdcall while displaying its image and info.

Birdcall Timer A timer for bird lovers. Siriema hatchling by Junior Peres Junior Background My partner needed a customizable timer for sitting and sta

Marcelo Sanches 1 Jul 08, 2022
Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets.

Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps find essential insights fro

Adansons Inc 27 Oct 22, 2022
Absolute solvation free energy calculations with OpenFF and OpenMM

ABsolute SOLVantion Free Energy Calculations The absolv framework aims to offer a simple API for computing the change in free energy when transferring

7 Dec 07, 2022
Painel de consulta

โš™ FullP 1.1 Instalaรงรฃo ๐Ÿ’ป git clone https://github.com/gav1x/FullP.git cd FullP pip3 install -r requirements.txt python3 main.py Um pequeno

gav1x 26 Oct 11, 2022
The Great Autoencoder Bake Off

The Great Autoencoder Bake Off The companion repository to a post on my blog. It contains all you need to reproduce the results. Features Currently fe

Tilman Krokotsch 61 Jan 06, 2023
Python package that mirrors the original Nodejs ReplAPI-It.

Python-ReplAPI-It Python package that mirrors the original Nodejs ReplAPI-It. Contributing First fork the repo: $ git clone https://github.com/ReplAPI

The ReplAPI.it Project 10 Jun 05, 2022
Python programs, usually short, of considerable difficulty, to perfect particular skills.

Peter Norvig MIT License 2015-2020 pytudes "An รฉtude (a French word meaning study) is an instrumental musical composition, usually short, of considera

Peter Norvig 19.9k Dec 27, 2022
A tool to help the Poly copy-reading process! :D

PolyBot A tool to help the Poly copy-reading process! :D Let's face it-computers are better are repeatitive tasks. And, in spite of what one may want

1 Jan 10, 2022