RasgoQL

Write python locally, execute SQL in your data warehouse
≪ Read the Docs · Join Our Slack »

RasgoQL is a Python package that enables you to easily query and transform tables in your Data Warehouse directly from a notebook.

You can quickly create new features, sample data, apply complex aggregates... all without having to write SQL!

Choose from our library of predefined transformations or make your own to streamline the feature engineering process.

Why is this package useful?

Data scientists spend much of their time in pandas preparing data for modelling. When they are ready to deploy or scale, two pain points arise:

pandas cannot handle larger volumes of data, forcing the use of VMs or code refactoring.
feature data must be added to the Enterprise Data Warehouse for future processing, requiring refactoring to SQL

We created RasgoQL to solve these two pain points.

Learn more at https://docs.rasgoql.com.

How does it work?

Under the covers, RasgoQL sends all processing to your Data Warehouse, enabling the efficient transformation of massive datasets. RasgoQL only needs basic metadata to execute transforms, so your private data remains secure.

RasgoQL does these things well:

Pulls existing Data Warehouse tables into pandas DataFrames for analysis
Constructs SQL queries using a syntax that feels like pandas
Creates views in your Data Warehouse to save transformed data
Exports runnable sql in .sql files or dbt-compliant .yaml files
Offers dozens of free SQL transforms to use
Coming Soon: allows users to create & add custom transforms

Rasgo supports Snowflake, BigQuery, Postgres, and Amazon Redshift with more Data Warehouses being added soon. If you'd like to suggest another database type, submit your idea to our GitHub Discussions page so that other community members can weight in and show their support.

Can RasgoQL help you?

If you use pandas to build features, but you are working on a massive set of data that won't fit in your machine's memory. RasgoQL can help!
If your organization uses dbt or another SQL tool to run production data flows, but you prefer to build features in pandas. RasgoQL can help!
If you know pandas, but not SQL and want to learn how queries will translate. RasgoQL can help!

Where to get it

Just run a simple pip install.

pip install rasgoql~=1.0

Report Bug · Suggest Improvement · Request Feature

Quick Start

pip install rasgoql --upgrade

# Connect to your data warehouse
creds = rasgoql.SnowflakeCredentials(
    account="",
    user="",
    password="",
    role="",
    warehouse="",
    database="",
    schema=""
)

# Connect to DW
rql = rasgoql.connect(creds)

# List available tables
rql.list_tables('ADVENTUREWORKS').head(10)

# Allow rasgoQL to interact with an existing Table in your Data Warehouse
dataset = rql.dataset('ADVENTUREWORKS.PUBLIC.FACTINTERNETSALES')

# Take a peek at the data
dataset.preview()

# Use the datetrunc transform to seperate things into weeks
weekly_sales = dataset.datetrunc(dates={'ORDERDATE':'week'})

# Aggregate to sum of sales for each week
agg_weekly_sales = weekly_sales.aggregate(
    group_by=['PRODUCTKEY', 'ORDERDATE_WEEK'],
    aggregations={'SALESAMOUNT': ['SUM']},
    )

# Quickly validate output
agg_weekly_sales.to_df()

# Print the SQL
print(agg_weekly_sales.sql())

Getting Stared Tutorials

The best way to get familiar with the RasgoQL basics is by running through these notebooks in the tutorials folder.

Advanced Examples

Joins

Easily join tables together using the join transform.

sales_dataset = rasgoql.dataset('ADVENTUREWORKS.PUBLIC.FACTINTERNETSALES')

sales_product_dataset = sales_dataset.join(
  join_table='DIM_PRODUCT',
  join_columns={'PRODUCTKEY': 'PRODUCTKEY'},
  join_type='LEFT',
  join_prefix='PRODUCT')

sales_product_dataset.sql()
sales_product_dataset.preview()

Chain transforms together

Create a rolling average aggregation and then drops unnecessary colomns.

sales_agg_drop = sales_dataset.rolling_agg(
    aggregations={"SALESAMOUNT": ["MAX", "MIN", "SUM"]},
    order_by="ORDERDATE",
    offsets=[-7, 7],
    group_by=["PRODUCTKEY"],
).drop_columns(exclude_cols=["ORDERDATEKEY"])

sales_agg_drop.sql()
sales_agg_drop.preview()

Transpose unique values with pivots

Quickly generate pivot tables of your data.

sales_by_product = sales_dataset.pivot(
    dimensions=['ORDERDATE'],
    pivot_column='SALESAMOUNT',
    value_column='PRODUCTKEY',
    agg_method='SUM',
    list_of_vals=['310', '345'],
)

sales_by_product.sql()
sales_by_product.preview()

Does any of my data get collected?

Rasgo will not collect any personal information. We log execution of methods in transforms.py for success and failure so that we can more accurately track what's useful and what's problematic.

Where do I go for help?

If you have any questions please:

How can I contribute?

Review the contributors guide

License

RasgoQL uses the GNU AGPL license, as found in the LICENSE file.

This project is sponspored by RasgoML. Find out at https://www.rasgoml.com/

Name		Name	Last commit message	Last commit date
Latest commit History 231 Commits
.github		.github
dbt/rasgoql		dbt/rasgoql
rasgoql		rasgoql
tutorials		tutorials
.dockerignore		.dockerignore
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

License

rasgointelligence/RasgoQL

Folders and files

Latest commit

History

Repository files navigation

RasgoQL

Why is this package useful?

How does it work?

Can RasgoQL help you?

Where to get it

Quick Start

Getting Stared Tutorials

Advanced Examples

Joins

Chain transforms together

Transpose unique values with pivots

Does any of my data get collected?

Where do I go for help?

How can I contribute?

License

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Languages