Airflow Operator for running Soda SQL scans

Overview

Soda SQL Airflow Operator

Airflow Operator for running Soda SQL scans

Example Usage

# src/soda/scans/my_scan.yml
# Note that the Airflow rendered templates are accessible (e.g. {{ params.client_id }})
table_name: tmp_{{ params.client_id }}_{{ ds_nodash }}
sql_metrics:
  - sql: |
      SELECT
        SUM(value1) AS staged_value1,
        SUM(value2) AS staged_value2
      FROM tmp_{{ params.client_id }}_{{ ds_nodash }}
  - sql: |
      SELECT
        SUM(value1) AS final_value1,
        SUM(value2) AS final_value2
      FROM final_table
      WHERE
        date = '{{ ds }}'
        AND client_id = {{ params.client_id }}
tests:
  - staged_value1 == final_value1
  - staged_value2 > final_value2
# my_airflow_dag.py
from pathlib import Path
from soda_util import build_soda_warehouse, convert_templated_yml_to_dict

SODA_PATH = Path(os.getenv("PYTHON_PATH", "/code/src")) / "/soda/scans/"  # Matches where my_scan.yml is saved

validate_staged_data = SodaSqlOperator(
    task_id="validate_staged_data",
    warehouse=build_soda_warehouse("warehouse_name", "database_name"),  # Could also pass a file path to a yml file
    scan=convert_templated_yml_to_dict(SODA_PATH, "my_scan.yml"),
    params={"client_id": 12345},  # Params are rendered by Airflow and accessible in the yaml file
)

Notes

  • Unlike Soda itself, a builder pattern is not used to define the warehouse and scan argument. Rather, the warehouse and scan parameters are instance checked and the relevant Soda methods are set. This provides a much simpler API, where we can just pass in the args to the Operator
  • As we are passing over all rendering of Jinga templates to Airflow, the native Soda templates are not accessible. So always use Airflow templates
  • Soft failures (i.e. the Airflow task doesn't fail, it just alerts) have been implemented, but alerting of soft failures has not. So soft failures will essentially just mean the Airflow task passes. Alerting to be implemented
Owner
Todd de Quincey
Data Engineer, Chartered Accountant and all round nice guy (or so I like to think). I believe that quality, simplicity and focus are the keys to success
Todd de Quincey
An early stage integration of Hotwire Turbo with Django

Note: This is not ready for production. APIs likely to change dramatically. Please drop by our Slack channel to discuss!

Hotwire for Django 352 Jan 06, 2023
Various hdas (Houdini Digital Assets)

aaTools My various assets for Houdini "ms_asset_loader" - Custom importer assets from Quixel Bridge "asset_placer" - Tool for placment sop geometry on

9 Dec 19, 2022
Python library for converting Python calculations into rendered latex.

Covert art by Joshua Hoiberg handcalcs: Python calculations in Jupyter, as though you wrote them by hand. handcalcs is a library to render Python calc

Connor Ferster 5.1k Jan 07, 2023
Get a list of the top-10 rejected libraries in your WhiteSource inventory

WhiteSource Top 10 Rejected Libraries Generate a spreadsheet listing the 10 most common libraries in your WhiteSource inventory that were rejected by

WhiteSource-PS-tools 10 Mar 23, 2022
A Google sheet which keeps track of the locations that want to visit and a price cutoff

FlightDeals Here's how the program works. First, I have a Google sheet which keeps track of the locations that I want to visit and a price cutoff. It

Lynne Munini 5 Nov 21, 2022
The update manager for the ERA App (era.sh)

ERA Update Manager This is the official update manager used in the ERA app (see era.sh) How it works Once a new version of ERA is available, the app l

Kian Shahriyari 1 Dec 29, 2021
An Advent calendar of small programming puzzles for a variety of skill sets and skill levels.

Advent of Code 2021 The Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be sol

Evan Cope 0 Feb 13, 2022
A powerful and user-friendly binary analysis platform!

angr angr is a platform-agnostic binary analysis framework. It is brought to you by the Computer Security Lab at UC Santa Barbara, SEFCOM at Arizona S

6.3k Jan 02, 2023
Your one and only Discord Bot that helps you concentrate!

Your one and only Discord Bot thats helps you concentrate! Consider leaving a ⭐ if you found the project helpful. concy-bot A bot which constructively

IEEE VIT Student Chapter 22 Sep 27, 2022
Backup dc registry - A simple POC that abuses Backup Operator privileges to remote dump SAM, SYSTEM, and SECURITY

Backup Operator Registry Backup to Domain Compromise A simple POC that abuses Ba

Horizon 3 AI Inc 57 Dec 18, 2022
Tool for running a high throughput data ingestion/transformation workload with MongoDB

Mongo Mangler The mongo-mangler tool is a lightweight Python utility, which you can run from a low-powered machine to execute a high throughput data i

Paul Done 9 Jan 02, 2023
Rick Astley Language is a rick roll oriented, dynamic, strong, esoteric programming language.

Rick Roll Language / Rick Astley Language A rick roll oriented, dynamic, strong, esoteric programming language. Prolegomenon The reasons that I made t

Rick Roll Programming Language 658 Jan 09, 2023
An implementation of Ray Tracing in One Weekend using Taichi

又一个Taichi语言的Ray Tracer 背景简介 这个Ray Tracer基本上是照搬了Peter Shirley的第一本小书Ray Tracing in One Weekend,在我写的时候参考的是Version 3.2.3这个版本。应该比其他中文博客删改了不少内容。果然Peter Shir

张皓 30 Nov 21, 2022
Open source home automation that puts local control and privacy first

Home Assistant Open source home automation that puts local control and privacy first. Powered by a worldwide community of tinkerers and DIY enthusiast

Home Assistant 57k Jan 02, 2023
Submission from Team OMR for the TRI-NIT Hackathon

Submission from Team OMR for the TRI-NIT Hackathon

0 Feb 01, 2022
A full-featured, hackable tiling window manager written and configured in Python

A full-featured, hackable tiling window manager written and configured in Python Features Simple, small and extensible. It's easy to write your own la

Qtile 3.8k Dec 31, 2022
A simply program to find active jackbox.tv game codes

PeepingJack A simply program to find active jackbox.tv game codes How does this work? It uses a threadpool to loop through all possible codes in a ran

3 Mar 20, 2022
Python meta class and abstract method library with restrictions.

abcmeta Python meta class and abstract method library with restrictions. This library provides a restricted way to validate abstract methods. The Pyth

Morteza NourelahiAlamdari 8 Dec 14, 2022
LSO, also known as Linux Swap Operator, is a software with both GUI and terminal versions that you can manage the Swap area for Linux operating systems.

LSO - Linux Swap Operator Türkçe - LSO Nedir? LSO, diğer adıyla Linux Swap Operator Linux işletim sistemleri için Swap alanını yönetebileceğiniz hem G

Eren İnce 4 Feb 09, 2022