Anomaly detection on SQL data warehouses and databases

Last update: Dec 18, 2022

Overview

With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases.

Getting Started

Install via Docker

docker run -p 3000:80 cuebook/cueobserve

Now visit http://localhost:3000 in your browser.

How it works

You write a SQL GROUP BY query, map its columns as dimensions and measures, and save it as a virtual Dataset.
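For illustration, here is what a GROUP BY query and its column mapping might look like. This minimal sketch uses Python's built-in sqlite3 module as a stand-in warehouse. The `orders` table, its columns, and the `column_map` dict are hypothetical and are not CueObserve's own configuration format.

```python
import sqlite3

# Hypothetical stand-in warehouse: an in-memory SQLite table of orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (created_at TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2021-08-01 10:15", "Delhi", 120.0),
        ("2021-08-01 11:40", "Mumbai", 80.0),
        ("2021-08-02 09:05", "Delhi", 150.0),
        ("2021-08-02 18:30", "Delhi", 30.0),
    ],
)

# A GROUP BY query of the kind a Dataset is built from:
# one row per (day, city) bucket, with the measures aggregated.
DATASET_SQL = """
SELECT date(created_at) AS day, city, SUM(amount) AS revenue, COUNT(*) AS orders
FROM orders
GROUP BY day, city
ORDER BY day, city
"""

# Hypothetical mapping of result columns to timestamp, dimensions, and measures.
column_map = {"timestamp": "day", "dimensions": ["city"], "measures": ["revenue", "orders"]}

cursor = conn.execute(DATASET_SQL)
columns = [d[0] for d in cursor.description]  # column names of the result
rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
conn.close()
```

Each result row is one (day, city) bucket. Here `city` acts as a dimension, and `revenue` and `orders` act as measures.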

You then define one or more anomaly detection jobs on the dataset.

When an anomaly detection job runs, CueObserve does the following:

Executes the SQL GROUP BY query on your data warehouse and stores the result as a Pandas dataframe.
Generates one or more timeseries from the dataframe, as defined in your anomaly detection job.
Generates a forecast for each timeseries using Prophet.
Creates a visual card for each timeseries. Marks the card as an anomaly if the last data point is anomalous.
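The last two steps can be sketched as follows. CueObserve uses a Prophet forecast here. To keep the sketch dependency-free, it swaps in a much simpler band: the mean plus or minus k standard deviations of the earlier points. Treat `forecast_band`, `build_card`, and the card fields as hypothetical illustrations of the idea, not as CueObserve's implementation.

```python
from statistics import mean, stdev


def forecast_band(history, k=3.0):
    """Stand-in for a forecast: (lower, upper) = mean +/- k * sample stdev.

    A Prophet forecast would instead supply an uncertainty interval
    (yhat_lower / yhat_upper) for the point being checked.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def build_card(name, series, k=3.0):
    """Build a minimal card for one timeseries of (timestamp, value) pairs.

    The card is marked as an anomaly if the last data point falls outside
    the band computed from all earlier points.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 points: 2 for history, 1 to check")
    *history, (last_ts, last_value) = series
    lower, upper = forecast_band([v for _, v in history], k)
    return {
        "name": name,
        "last_timestamp": last_ts,
        "last_value": last_value,
        "band": (lower, upper),
        "is_anomaly": not (lower <= last_value <= upper),
    }
```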

Features

Automated SQL to timeseries transformation.
Run anomaly detection on the aggregate metric or break it down by any dimension.
Built-in scheduler: CueObserve uses Celery as the executor and celery-beat as the scheduler.
Slack alerts when anomalies are detected. (coming soon)
Monitoring: a Slack alert when a job fails. CueObserve maintains detailed logs. (coming soon)
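Running detection on the aggregate metric and breaking it down by a dimension amount to grouping the same query result in two different ways. Below is a minimal sketch in plain Python. CueObserve does this on a Pandas dataframe, and the function name and the `_aggregate` key here are hypothetical.

```python
from collections import defaultdict


def to_timeseries(rows, timestamp, measure, dimension=None):
    """Turn GROUP BY result rows (dicts) into sorted timeseries.

    With dimension=None, sum the measure over all rows that share a timestamp
    and return {"_aggregate": [(ts, total), ...]}. With a dimension, return
    one series per dimension value.
    """
    buckets = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = "_aggregate" if dimension is None else row[dimension]
        buckets[key][row[timestamp]] += row[measure]
    return {key: sorted(points.items()) for key, points in buckets.items()}


rows = [
    {"day": "2021-08-01", "city": "Delhi", "revenue": 120.0},
    {"day": "2021-08-01", "city": "Mumbai", "revenue": 80.0},
    {"day": "2021-08-02", "city": "Delhi", "revenue": 180.0},
]
aggregate = to_timeseries(rows, "day", "revenue")
by_city = to_timeseries(rows, "day", "revenue", dimension="city")
```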

Limitations

Currently supports Prophet for timeseries forecasting.
Not built for real-time anomaly detection on streaming data.

Support

For general help using CueObserve, read the documentation or go to GitHub Discussions.

To report a bug or request a feature, open an issue.

Contributing

We'd love contributions to CueObserve. Before you contribute, please first discuss the change you wish to make via an issue or a discussion. Contributors are expected to adhere to our code of conduct.

Comments

hourly error

Discussed in https://github.com/cuebook/CueObserve/discussions/75

(Originally posted by jithendra945, August 5, 2021) I'm getting this error when I try to run an anomaly definition.

I don't understand why it contains None.
bug observe

opened by sachinkbansal 8
Clickhouse as datasource support

Describe the solution you'd like: I'd love to give CueObserve a try, but our warehouse is currently in MS SQL Server.

Describe the solution you'd like: Add ClickHouse as a supported data source.

I will wait until https://github.com/cuebook/CueObserve/issues/52 is resolved and then try to implement a PR.
enhancement connection

opened by Slach 4
Support SQL Server Data Source

Is your feature request related to a problem? Please describe. I'd love to give CueObserve a try but our warehouse is currently in MS SQL Server.

Describe the solution you'd like: Add SQL Server as a supported data source.

Additional context: I'd be interested in making the necessary pull request, but I'd like some high-level advice on what might be needed.

Is it as simple as adding the necessary sqlserver.py in https://github.com/cuebook/CueObserve/tree/main/api/dbConnections ?
enhancement connection

opened by adam133 4
Six or more scheduled tasks get stuck indefinitely in the Celery queue
Describe the bug: CueObserve scheduled tasks get stuck in the Celery queue and never complete (no error is thrown).

To Reproduce: Steps to reproduce the behavior:

Create 6 different anomaly definitions.

Set a 5-minute cron interval and wait 5 minutes for the scheduled tasks to start.

Those tasks do not complete even after 1 day.

For debugging, I tried running 5 scheduled tasks and 1 scheduled task separately (with different cron intervals), and that works fine. But when 6 scheduled tasks share the same cron interval, they get stuck and never finish.

Expected behavior: It should complete the 6 accepted tasks and pick up the next tasks.

Thanks
bug observe
opened by itsmesrds 3
Docker Network Mode and port mapping
Is your feature request related to a problem? Please describe. The provided docker-compose files set everything to network mode "host". For various reasons, this setting does not work in my environment.

Describe the solution you'd like:

The ability to use port mapping in the docker-compose files.

The ability to use Docker Swarm.

observe
opened by adrien-ferrara 2
docker-compose start error

Got the following output:

/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
  from cryptography.hazmat.backends import default_backend
Traceback (most recent call last):
  File "/usr/local/bin/docker-compose", line 7, in <module>
    from compose.cli.main import main
  File "/usr/local/lib/python2.7/dist-packages/compose/cli/main.py", line 24, in <module>
    from ..config import ConfigurationError
  File "/usr/local/lib/python2.7/dist-packages/compose/config/__init__.py", line 6, in <module>
    from .config import ConfigurationError
  File "/usr/local/lib/python2.7/dist-packages/compose/config/config.py", line 51, in <module>
    from .validation import match_named_volumes
  File "/usr/local/lib/python2.7/dist-packages/compose/config/validation.py", line 12, in <module>
    from jsonschema import Draft4Validator
  File "/usr/local/lib/python2.7/dist-packages/jsonschema/__init__.py", line 21, in <module>
    from jsonschema._types import TypeChecker
  File "/usr/local/lib/python2.7/dist-packages/jsonschema/_types.py", line 3, in <module>
    from pyrsistent import pmap
  File "/usr/local/lib/python2.7/dist-packages/pyrsistent/__init__.py", line 3, in <module>
    from pyrsistent._pmap import pmap, m, PMap
  File "/usr/local/lib/python2.7/dist-packages/pyrsistent/_pmap.py", line 98
    ) from e
      ^
SyntaxError: invalid syntax

opened by wdongdongde 1
Support generic rest end point for notification
Discussed in https://github.com/cuebook/CueObserve/discussions/129

(Originally posted by pjpringle, August 30, 2021) Not everyone has Slack, especially in workplace environments. Provide support for plugging in REST calls to notify users of anomalies.

[x] Image bytes included in response JSON
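A generic REST notification could be as simple as POSTing a JSON body to a URL the user configures. The sketch below uses only the standard library. The payload field names (`dataset`, `dimensionValue`, `message`, `imageBase64`) are hypothetical. JSON cannot carry raw bytes, so the sketch base64-encodes the chart image, which is one common way to include image bytes in JSON.

```python
import base64
import json
import urllib.request


def build_alert_payload(dataset, dimension_value, message, image_bytes=None):
    """Build a JSON request body for a generic webhook (hypothetical fields)."""
    payload = {"dataset": dataset, "dimensionValue": dimension_value, "message": message}
    if image_bytes is not None:
        # Raw bytes are not valid JSON, so the image is base64-encoded text.
        payload["imageBase64"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload).encode("utf-8")


def post_alert(url, body, timeout=10):
    """POST the JSON body to an arbitrary REST endpoint; return the HTTP status."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```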

enhancement observe
opened by sachinkbansal 1
Failing task due to incomplete data
Sometimes the anomaly detection job or the RCA job fails because a few buckets have too little data. In that case, we don't show the results for the other buckets either.

if not allTasksSucceeded:
    runStatusObj.status = ANOMALY_DETECTION_ERROR

We have added checks like this to the code. But I don't think this is a good approach, because the user is not told the reason for the failure. It would be better to skip such a bucket and show the results for the remaining buckets.
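Here is a sketch of the suggested behavior. It runs detection per bucket, skips buckets that are too short or that raise, and records the reason, instead of marking the whole run as failed. `run_per_bucket`, `min_points`, and the status strings are hypothetical, not CueObserve's code.

```python
def run_per_bucket(buckets, detect, min_points=3):
    """Run detect(series) for each bucket; skip and explain failing buckets.

    Returns (status, results, skipped). status is "SUCCESS" if nothing was
    skipped, "PARTIAL_SUCCESS" if some buckets produced results, else "ERROR".
    """
    results, skipped = {}, {}
    for key, series in buckets.items():
        if len(series) < min_points:
            skipped[key] = f"{len(series)} data point(s), need at least {min_points}"
            continue
        try:
            results[key] = detect(series)
        except Exception as exc:  # keep going; record why this bucket failed
            skipped[key] = f"{type(exc).__name__}: {exc}"
    if not skipped:
        status = "SUCCESS"
    elif results:
        status = "PARTIAL_SUCCESS"
    else:
        status = "ERROR"
    return status, results, skipped
```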
opened by AakarSharmaHME 3
Different Priority alerts on different slack channel

Is your feature request related to a problem? Please describe. There are different kinds of datasets. Anomalies on some are P0 alerts, while on others they are less urgent. Even on the same dataset, crossing a certain threshold needs immediate attention while smaller deviations do not.

Describe the solution you'd like: When configuring the Slack settings, we should be able to configure different priorities and a Slack channel ID for each. Then, when creating an anomaly definition, we should specify its priority, just as we do for schedules.
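One way to model the proposed settings: a priority-to-channel map configured once, a priority field on each anomaly definition, and an optional threshold that escalates large deviations to P0. All names, channel IDs, and the escalation rule are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical settings: one Slack channel ID per priority level.
PRIORITY_CHANNELS = {"P0": "C0URGENT", "P1": "C0NORMAL", "P2": "C0DIGEST"}
DEFAULT_PRIORITY = "P1"


@dataclass
class AnomalyDefinition:
    dataset: str
    metric: str
    priority: str = DEFAULT_PRIORITY  # chosen per definition, like a schedule


def channel_for(definition, deviation_pct=None, escalate_above=None,
                channels=PRIORITY_CHANNELS):
    """Return the Slack channel ID for an anomaly on this definition.

    If escalate_above is set and the absolute deviation (in percent) reaches
    it, the alert is routed as P0 regardless of the definition's priority.
    Unknown priorities fall back to the default priority's channel.
    """
    priority = definition.priority
    if (escalate_above is not None and deviation_pct is not None
            and abs(deviation_pct) >= escalate_above):
        priority = "P0"
    return channels.get(priority, channels[DEFAULT_PRIORITY])
```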

opened by AakarSharmaHME 5
Docker Image Build fails on Dev compose
Describe the bug: Creating a Docker image with docker-compose -f docker-compose-dev.yml up -d fails. The build fails with the error: Service 'cueo-backend' failed to build : Build failed

To Reproduce: Steps to reproduce the behavior:

Clone this repo with Git.

Run docker-compose -f docker-compose-dev.yml up -d

Expected behavior: The CueO component images are built successfully, and CueObserve starts running on localhost:3000.

Additional context: Will add the entire log in the file or snippet below.
observe
opened by shreyas-dev 4
pyarrow not installed

Discussed in https://github.com/cuebook/CueObserve/discussions/187

(Originally posted by ravi-jam, October 16, 2021) Hi folks,

My team just deployed CueObserve to EC2. I got this error when we tried to add a dataset:

{"log":"2021-10-16 06:15:14,232 [dbConnections.bigquery:63] ERROR Can't connect to db with this credentials The pyarrow library is not installed, please install pyarrow to use the to_arrow() function.\n","stream":"stderr","time":"2021-10-16T06:15:14.237031452Z"}

Any help would be appreciated.
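The log message points at the fix: pyarrow must be installed in the backend's Python environment, for example with `pip install pyarrow`. A startup check like the one below would report the missing package directly instead of surfacing it as a generic "Can't connect to db" error. The helper is hypothetical and is not something CueObserve ships.

```python
import importlib.util


def missing_packages(required=("pyarrow",)):
    """Return the names of required top-level packages that cannot be found."""
    return [name for name in required if importlib.util.find_spec(name) is None]
```

For example, calling `missing_packages()` at startup and logging any result gives an explicit "install pyarrow" message before the first dataset is added.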

bug

opened by sachinkbansal 0
Handle scenario where dataset's last data point is complete and must not be ignored
Discussed in https://github.com/cuebook/CueObserve/discussions/172

(Originally posted by PhilippeDo, October 8, 2021) I now get this error when I try to use Prophet:

{"d2b9be2a-6d27-4e61-a577-3d99892e94e0": {"dimVal": null, "error": "{\"message\": \"single positional indexer is out-of-bounds\", \"stackTrace\": \"Traceback (most recent call last):\\n File \\\"/code/ops/tasks/anomalyDetection.py\\\", line 56, in anomalyService\\n result = detect(df, granularity, detectionRuleType, anomalyDef)\\n File \\\"/code/ops/tasks/anomalyDetection.py\\\", line 29, in detect\\n return prophetDetect(df, granularity)\\n File \\\"/code/ops/tasks/detectionTypes/prophet.py\\\", line 47, in prophetDetect\\n lastISO = df.iloc[-1][\\\"ds\\\"]\\n File \\\"/opt/venv/lib/python3.7/site-packages/pandas/core/indexing.py\\\", line 895, in __getitem__\\n return self._getitem_axis(maybe_callable, axis=axis)\\n File \\\"/opt/venv/lib/python3.7/site-packages/pandas/core/indexing.py\\\", line 1501, in _getitem_axis\\n self._validate_integer(key, axis)\\n File \\\"/opt/venv/lib/python3.7/site-packages/pandas/core/indexing.py\\\", line 1444, in _validate_integer\\n raise IndexError(\\\"single positional indexer is out-of-bounds\\\")\\nIndexError: single positional indexer is out-of-bounds\\n\"}", "success": false}}

I used the following in the DATASET section:

select DATE_FORMAT(date, '%Y-%m-%d %H') as Date, pagelt from pageloadtime2

Then, when I run it, the dataset looks like this. It is a minimal dataset without dimensions, with only a measure and a timestamp.
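The traceback shows `df.iloc[-1]` raising on an empty dataframe. The issue title implies that CueObserve ignores the last data point, and it asks for the last point to be kept when it is complete. The sketch below makes dropping the last point optional and fails with a clear message, rather than an IndexError, when no data is left. It parses the hourly `'%Y-%m-%d %H'` strings that the query above produces. `prepare_series` and `ignore_last_point` are hypothetical names.

```python
from datetime import datetime


def prepare_series(rows, ignore_last_point=True, fmt="%Y-%m-%d %H"):
    """Parse (timestamp_string, value) rows into a time-sorted series.

    ignore_last_point=True drops the final bucket on the assumption that it
    may still be filling up. Pass False when the last bucket is known to be
    complete.
    """
    series = sorted((datetime.strptime(ts, fmt), value) for ts, value in rows)
    if ignore_last_point and series:
        series = series[:-1]
    if not series:
        # A readable error instead of "single positional indexer is out-of-bounds".
        raise ValueError("no data points left to analyse after preprocessing")
    return series
```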

enhancement anomalyDefinition
opened by sachinkbansal 0

Releases (v0.3.2)

v0.3.2 (Feb 23, 2022)
Kubernetes deployment using Helm charts

Autoscaling supported for the application's worker nodes

v0.3.1 (Jan 4, 2022)

Updated Documentation
v0.2.1 (Oct 13, 2021)

Contains documentation for release-v0.2.
v0.3 (Oct 13, 2021)
Features:

Scaling Anomaly Detection using Lambda (optional)

Development using docker-compose

Generic endpoint support for alerts

v0.2 (Aug 17, 2021)

Root Cause Analysis #46
Authentication #64
Mathematical rules for anomaly detection, in addition to Prophet #45
SQL Server support #52
Slack alerts #29

Thanks @adam133 and @jithendra945 for your contributions.
v0.1 (Jul 21, 2021)


Owner

Cuebook

GitHub Repository https://cueobserve.cuebook.ai
