Airflow Operator for running Soda SQL scans

Overview

Soda SQL Airflow Operator

Airflow Operator for running Soda SQL scans

Example Usage

# src/soda/scans/my_scan.yml
# Note that the Airflow rendered templates are accessible (e.g. {{ params.client_id }})
table_name: tmp_{{ params.client_id }}_{{ ds_nodash }}
sql_metrics:
  - sql: |
      SELECT
        SUM(value1) AS staged_value1,
        SUM(value2) AS staged_value2
      FROM tmp_{{ params.client_id }}_{{ ds_nodash }}
  - sql: |
      SELECT
        SUM(value1) AS final_value1,
        SUM(value2) AS final_value2
      FROM final_table
      WHERE
        date = '{{ ds }}'
        AND client_id = {{ params.client_id }}
tests:
  - staged_value1 == final_value1
  - staged_value2 > final_value2
# my_airflow_dag.py
from pathlib import Path
from soda_util import build_soda_warehouse, convert_templated_yml_to_dict

SODA_PATH = Path(os.getenv("PYTHON_PATH", "/code/src")) / "/soda/scans/"  # Matches where my_scan.yml is saved

validate_staged_data = SodaSqlOperator(
    task_id="validate_staged_data",
    warehouse=build_soda_warehouse("warehouse_name", "database_name"),  # Could also pass a file path to a yml file
    scan=convert_templated_yml_to_dict(SODA_PATH, "my_scan.yml"),
    params={"client_id": 12345},  # Params are rendered by Airflow and accessible in the yaml file
)

Notes

  • Unlike Soda itself, a builder pattern is not used to define the warehouse and scan argument. Rather, the warehouse and scan parameters are instance checked and the relevant Soda methods are set. This provides a much simpler API, where we can just pass in the args to the Operator
  • As we are passing over all rendering of Jinga templates to Airflow, the native Soda templates are not accessible. So always use Airflow templates
  • Soft failures (i.e. the Airflow task doesn't fail, it just alerts) have been implemented, but alerting of soft failures has not. So soft failures will essentially just mean the Airflow task passes. Alerting to be implemented
Owner
Todd de Quincey
Data Engineer, Chartered Accountant and all round nice guy (or so I like to think). I believe that quality, simplicity and focus are the keys to success
Todd de Quincey
Set of tools to analyze Tinynuke samples

tinynuke-toolset You'll find in that repository a set of tools and scripts I developped to analyze Tinynuke samples. Dll extractor: script used to ext

Heat Miser 14 Aug 18, 2022
2 Way Sync Between Notion Database and Google Calendar

Notion-and-Google-Calendar-2-Way-Sync 2 Way Sync Between a Notion Database and Google Calendar WARNING: This repo will be undergoing a good bit of cha

248 Dec 26, 2022
A code ecosystem that helps to find the equate any formula.

A code ecosystem that helps to find the equate any formula. The good part here is that the code finds the formula needed and/or operates on a formula (performs algebra) on it to give you an answer.

SubtleCoder 1 Nov 23, 2021
An assistant to guess your pip dependencies from your code, without using a requirements file.

Pip Sala Bim is an assistant to guess your pip dependencies from your code, without using a requirements file. Pip Sala Bim will tell you which packag

Collage Labs 15 Nov 19, 2022
Add-In for Blender to automatically save files when rendering

Autosave - Render: Automatically save .blend, .png and readme.txt files when rendering with Blender Purpose This Blender Add-On provides an easy way t

Volker 9 Aug 10, 2022
Repository for 2021 Computer Vision Class @ Chulalongkorn University

2110443 - Computer Vision (2021/2) Computer Vision @ Chulalongkorn University Anaconda Download Link https://www.anaconda.com/download/ Miniconda and

Chula PIC Lab 5 Jul 19, 2022
A way to write regex with objects instead of strings.

Py Idiomatic Regex (AKA iregex) Documentation Available Here An easier way to write regex in Python using OOP instead of strings. Makes the code much

Ryan Peach 18 Nov 15, 2021
Similarity checking of sign languages

Similarity checking of sign languages This repository checks for similarity betw

Tonni Das Jui 1 May 13, 2022
Path of Exile Vendor Recipe Tracker (Chaos/Regal orb)

Path of Exile Vendor Trade Tracker Are you tired of manually keeping track of collected and missing items for farming Chaos or Regal Orbs in PoE? Me t

1 Nov 09, 2021
Kivy program for identification & rotation sensing of objects on multi-touch tables.

ObjectViz ObjectViz is a multitouch object detection solution, enabling you to create physical markers out of any reliable multitouch solution. It's e

TangibleDisplay 8 Apr 04, 2022
Demo of a WAM Prolog implementation in Python

Prol: WAM demo This is a simplified Warren Abstract Machine (WAM) implementation for Prolog, that showcases the main instructions, compiling, register

Bruno Kim Medeiros Cesar 62 Dec 26, 2022
Meower a social media platform written in Scratch 3.0 and Python

Meower Meower is a social media platform written in Scratch 3.0 and Python, ported to HTML for self-hosting. Try Beta 4.6 Changelog for 4.6 Start impl

Meower Media Co. 23 Dec 02, 2022
Beatsaber for Python

beatsaber Beatsaber for Python It was automatically generated with mkpylib. If you're reading this message, it m

Shawn Presser 3 Jul 30, 2021
Python script to preprocess images of all Pokémon to finetune ruDALL-E

ai-generated-pokemon-rudalle Python script to preprocess images of all Pokémon (the "official artwork" of each Pokémon via PokéAPI) into a format such

Max Woolf 132 Dec 11, 2022
Python MQTT v5.0 async client

gmqtt: Python async MQTT client implementation. Installation The latest stable version is available in the Python Package Index (PyPi) and can be inst

Gurtam 306 Jan 03, 2023
importlib_resources is a backport of Python standard library importlib.resources module for older Pythons.

importlib_resources is a backport of Python standard library importlib.resources module for older Pythons. The key goal of this module is to replace p

Python 36 Dec 13, 2022
Extract continuous and discrete relaxation spectra from G(t)

pyReSpect-time Extract continuous and discrete relaxation spectra from stress relaxation modulus G(t). The papers which describe the method and test c

3 Nov 03, 2022
A python package that computes an optimal motion plan for approaching a red light

redlight_approach redlight_approach is a Python package that computes an optimal motion plan during traffic light approach. RLA_demo.mov Given the par

Jonathan Roy 4 Oct 27, 2022
bamboo-engine 是一个通用的流程引擎,他可以解析,执行,调度由用户创建的流程任务,并提供了如暂停,撤销,跳过,强制失败,重试和重入等等灵活的控制能力和并行、子流程等进阶特性,并可通过水平扩展来进一步提升任务的并发处理能力。

bamboo-engine 是一个通用的流程引擎,他可以解析,执行,调度由用户创建的流程任务,并提供了如暂停,撤销,跳过,强制失败,重试和重入等等灵活的控制能力和并行、子流程等进阶特性,并可通过水平扩展来进一步提升任务的并发处理能力。 整体设计 Quick start 1. 安装依赖 2. 项目初始

腾讯蓝鲸 96 Dec 15, 2022
a simple functional programming language compiler written in python

Functional Programming Language A compiler for my small functional language. Written in python with SLY lexer/parser generator library. Requirements p

Ashkan Laei 3 Nov 05, 2021