Anomaly detection on SQL data warehouses and databases

Overview


With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases.

Getting Started

Install via Docker

docker run -p 3000:80 cuebook/cueobserve

Now visit http://localhost:3000 in your browser.

How it works

You write a SQL GROUP BY query, map its columns as dimensions and measures, and save it as a virtual Dataset.

Dataset SQL

Dataset Schema Map
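
The dataset definition described above can be sketched roughly as follows. This is only an illustration, not CueObserve's internal format: the `orders` table, the column names, `SCHEMA_MAP`, and `run_dataset` are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical GROUP BY query: one timestamp, one dimension, one measure.
DATASET_SQL = """
SELECT order_date  AS OrderDate,
       state       AS State,
       SUM(amount) AS Revenue
FROM orders
GROUP BY order_date, state
ORDER BY order_date, state
"""

# Hypothetical schema map: every result column is assigned exactly one role.
SCHEMA_MAP = {"OrderDate": "timestamp", "State": "dimension", "Revenue": "measure"}


def run_dataset(conn, sql, schema_map):
    """Run the query and check that every result column has a mapped role."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    unmapped = [c for c in columns if c not in schema_map]
    if unmapped:
        raise ValueError(f"Unmapped columns: {unmapped}")
    return columns, cursor.fetchall()


# In-memory SQLite stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2021-08-01", "CA", 10.0),
        ("2021-08-01", "CA", 5.0),
        ("2021-08-01", "NY", 7.0),
        ("2021-08-02", "CA", 12.0),
    ],
)
columns, rows = run_dataset(conn, DATASET_SQL, SCHEMA_MAP)
```

Each row of the result carries one timestamp, one value per dimension, and one value per measure, which is the shape the anomaly detection job works from.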

You then define one or more anomaly detection jobs on the dataset.

Anomaly Definition

When an anomaly detection job runs, CueObserve does the following:

  1. Executes the SQL GROUP BY query on your data warehouse and stores the result as a Pandas dataframe.
  2. Generates one or more timeseries from the dataframe, as defined in your anomaly detection job.
  3. Generates a forecast for each timeseries using Prophet.
  4. Creates a visual card for each timeseries. Marks the card as an anomaly if the last data point is anomalous.
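
The steps above can be sketched in plain Python as shown below. This is a simplified illustration under stated assumptions, not CueObserve's code: it uses lists of tuples instead of a Pandas dataframe, and it swaps the Prophet forecast for a simple mean plus or minus `k` standard deviations band. The function names and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_timeseries(rows, dimension_index=None):
    """Group (timestamp, dimension, measure) rows into one series per dimension value.

    With dimension_index=None, all rows are summed into one aggregate series.
    """
    series = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = row[dimension_index] if dimension_index is not None else "ALL"
        series[key][row[0]] += row[-1]
    # Sort each series by timestamp so the last element is the latest point.
    return {key: sorted(points.items()) for key, points in series.items()}


def last_point_is_anomaly(points, k=3.0):
    """Stand-in for the Prophet forecast: flag the last value if it falls
    outside mean +/- k * stdev of the earlier values."""
    history = [value for _, value in points[:-1]]
    if len(history) < 2:
        return False  # not enough history to estimate a band
    center, spread = mean(history), stdev(history)
    return abs(points[-1][1] - center) > k * spread


# Hypothetical query result: (timestamp, state, revenue).
rows = [
    ("2021-08-01", "CA", 10.0), ("2021-08-01", "NY", 7.0),
    ("2021-08-02", "CA", 11.0), ("2021-08-02", "NY", 8.0),
    ("2021-08-03", "CA", 9.0), ("2021-08-03", "NY", 7.0),
    ("2021-08-04", "CA", 10.0), ("2021-08-04", "NY", 8.0),
    ("2021-08-05", "CA", 50.0), ("2021-08-05", "NY", 7.0),
]

# One "card" per dimension value: True means the last point is anomalous.
cards = {
    key: last_point_is_anomaly(points)
    for key, points in build_timeseries(rows, dimension_index=1).items()
}
aggregate = last_point_is_anomaly(build_timeseries(rows)["ALL"])
```

Here the `CA` series jumps on its last day and is flagged, while `NY` stays within its band. A real forecast model would also account for trend and seasonality, which this band does not.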

Features

  • Automated SQL to timeseries transformation.
  • Run anomaly detection on the aggregate metric or break it down by any dimension.
  • Built-in scheduler. CueObserve uses Celery as the executor and celery-beat as the scheduler.
  • Slack alerts when anomalies are detected. (coming soon)
  • Monitoring. Slack alert when a job fails. CueObserve maintains detailed logs. (coming soon)

Limitations

  • Currently supports only Prophet for timeseries forecasting.
  • Not built for real-time anomaly detection on streaming data.

Support

For general help using CueObserve, read the documentation, or go to GitHub Discussions.

To report a bug or request a feature, open an issue.

Contributing

We'd love contributions to CueObserve. Before you contribute, please first discuss the change you wish to make via an issue or a discussion. Contributors are expected to adhere to our code of conduct.

Comments
  • hourly error

    Discussed in https://github.com/cuebook/CueObserve/discussions/75

    Originally posted by jithendra945 August 5, 2021: I'm getting this error when I try to run an anomaly definition.

    I don't understand why it has None in it.

    bug observe 
    opened by sachinkbansal 8
  • Clickhouse as datasource support

    Describe the solution you'd like I'd love to give CueObserve a try but our warehouse is currently in MS SQL Server.

    Describe the solution you'd like Add ClickHouse as a supported data source.

    I will wait until https://github.com/cuebook/CueObserve/issues/52 is resolved and then try to implement a PR.

    enhancement connection 
    opened by Slach 4
  • Support SQL Server Data Source

    Is your feature request related to a problem? Please describe. I'd love to give CueObserve a try but our warehouse is currently in MS SQL Server.

    Describe the solution you'd like Add SQL Server as a supported data source.

    Additional context I'd be interested in making the necessary pull request, but I'd like some high-level advice on what might be needed.

    Is it as simple as adding the necessary sqlserver.py in https://github.com/cuebook/CueObserve/tree/main/api/dbConnections ?

    enhancement connection 
    opened by adam133 4
  • Scheduled tasks get stuck indefinitely in the Celery queue when there are 6 or more

    Describe the bug CueObserve scheduled tasks get stuck in the Celery queue and do not complete (no error is thrown).

    To Reproduce Steps to reproduce the behavior:

    1. Create 6 different anomaly definitions.
    2. Create a 5-minute cron interval and wait 5 minutes for the scheduled tasks to start.

    Those tasks do not complete even after 1 day.

    For debugging, I tried running 5 scheduled tasks and 1 scheduled task separately (on different cron intervals), and that works fine. But when 6 scheduled tasks share the same cron interval, they get stuck and never finish.

    Expected behavior It should complete the 6 accepted tasks and pull the next tasks

    Thanks

    bug observe 
    opened by itsmesrds 3
  • Docker Network Mode and port mapping

    Is your feature request related to a problem? Please describe. The provided docker-compose files set everything to network mode "host". For various reasons, this setting does not work in my environment.

    Describe the solution you'd like

    • Ability to use port mapping within the docker-compose files.
    • Ability to use Docker Swarm.
    observe 
    opened by adrien-ferrara 2
  • docker-compose start error

    got the result:

        /usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
          from cryptography.hazmat.backends import default_backend
        Traceback (most recent call last):
          File "/usr/local/bin/docker-compose", line 7, in <module>
            from compose.cli.main import main
          File "/usr/local/lib/python2.7/dist-packages/compose/cli/main.py", line 24, in <module>
            from ..config import ConfigurationError
          File "/usr/local/lib/python2.7/dist-packages/compose/config/__init__.py", line 6, in <module>
            from .config import ConfigurationError
          File "/usr/local/lib/python2.7/dist-packages/compose/config/config.py", line 51, in <module>
            from .validation import match_named_volumes
          File "/usr/local/lib/python2.7/dist-packages/compose/config/validation.py", line 12, in <module>
            from jsonschema import Draft4Validator
          File "/usr/local/lib/python2.7/dist-packages/jsonschema/__init__.py", line 21, in <module>
            from jsonschema._types import TypeChecker
          File "/usr/local/lib/python2.7/dist-packages/jsonschema/_types.py", line 3, in <module>
            from pyrsistent import pmap
          File "/usr/local/lib/python2.7/dist-packages/pyrsistent/__init__.py", line 3, in <module>
            from pyrsistent._pmap import pmap, m, PMap
          File "/usr/local/lib/python2.7/dist-packages/pyrsistent/_pmap.py", line 98
            ) from e
                 ^
        SyntaxError: invalid syntax

    opened by wdongdongde 1
  • Support generic rest end point for notification

    Discussed in https://github.com/cuebook/CueObserve/discussions/129

    Originally posted by pjpringle August 30, 2021: Not everyone has Slack, especially in workplace environments. Provide support for plugging in REST calls to notify of anomalies.

    • [x] Image bytes included in response json
    enhancement observe 
    opened by sachinkbansal 1
  • Failing task due to incomplete data

    Sometimes an anomaly detection job or an RCA job fails because a few buckets have too little data. In that case, we don't show the results for the other buckets either.

        if not allTasksSucceeded:
            runStatusObj.status = ANOMALY_DETECTION_ERROR
    

    We have added such checks to the code, but I don't think this is a good approach, because the user is not told the reason for the failure. It would be better to skip the failing bucket and show the results for the remaining buckets.
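
The skip-and-continue behavior suggested here might look like the sketch below. It is only an illustration of the idea: `run_buckets`, `detect_stub`, and the status strings are hypothetical and are not taken from CueObserve's code.

```python
def run_buckets(buckets, detect):
    """Run detect() per bucket, keeping successful results and recording
    the reason for each failure instead of failing the whole run."""
    results, errors = {}, {}
    for name, data in buckets.items():
        try:
            results[name] = detect(data)
        except Exception as exc:  # broad on purpose: one bad bucket must not stop the rest
            errors[name] = str(exc)
    if not errors:
        status = "SUCCESS"
    elif results:
        status = "PARTIAL_SUCCESS"  # hypothetical status for mixed outcomes
    else:
        status = "ERROR"
    return status, results, errors


def detect_stub(values):
    """Hypothetical detector that needs at least 3 data points."""
    if len(values) < 3:
        raise ValueError("not enough data points")
    earlier = values[:-1]
    return values[-1] > 2 * (sum(earlier) / len(earlier))


status, results, errors = run_buckets({"CA": [10, 11, 30], "NY": [7]}, detect_stub)
```

Keeping the per-bucket error messages alongside the results lets the run report why a bucket was skipped while still showing the buckets that succeeded.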

    opened by AakarSharmaHME 3
  • Different Priority alerts on different slack channel

    Is your feature request related to a problem? Please describe. There are different kinds of datasets. Anomalies on some are P0 alerts, while on others they are less urgent. Even within the same dataset, crossing a certain threshold needs immediate attention while other anomalies do not.

    Describe the solution you'd like While configuring the Slack channel settings, we should be able to configure different priorities and the Slack channel ID for each. Then, while creating an anomaly definition, we should specify its priority, just like we do for schedules.

    opened by AakarSharmaHME 5
  • Docker Image Build fails on Dev compose

    Describe the bug Failed to create a Docker image using docker-compose -f docker-compose-dev.yml up -d. The build fails with the error: Service 'cueo-backend' failed to build : Build failed

    To Reproduce Steps to reproduce the behavior:

    1. Git clone this repo
    2. Run docker-compose -f docker-compose-dev.yml up -d

    Expected behavior The CueO component images should build successfully, and CueObserve should start running on localhost:3000.

    Additional context Will add the entire log in the file or snippet below.

    observe 
    opened by shreyas-dev 4
  • pyarrow not installed

    Discussed in https://github.com/cuebook/CueObserve/discussions/187

    Originally posted by ravi-jam October 16, 2021 Hi Folks,

    My team just deployed CueObserve to EC2. I got this error when we tried to add a dataset:

    {"log":"2021-10-16 06:15:14,232 [dbConnections.bigquery:63] ERROR Can't connect to db with this credentials The pyarrow library is not installed, please install pyarrow to use the to_arrow() function.\n","stream":"stderr","time":"2021-10-16T06:15:14.237031452Z"}

    Any help would be appreciated.

    bug 
    opened by sachinkbansal 0
  • Handle scenario where dataset's last data point is complete and must not be ignored

    Discussed in https://github.com/cuebook/CueObserve/discussions/172

    Originally posted by PhilippeDo October 8, 2021: I now get this error when I try to use Prophet:

    {"d2b9be2a-6d27-4e61-a577-3d99892e94e0": {"dimVal": null, "error": "{\"message\": \"single positional indexer is out-of-bounds\", \"stackTrace\": \"Traceback (most recent call last):\\n File \\\"/code/ops/tasks/anomalyDetection.py\\\", line 56, in anomalyService\\n result = detect(df, granularity, detectionRuleType, anomalyDef)\\n File \\\"/code/ops/tasks/anomalyDetection.py\\\", line 29, in detect\\n return prophetDetect(df, granularity)\\n File \\\"/code/ops/tasks/detectionTypes/prophet.py\\\", line 47, in prophetDetect\\n lastISO = df.iloc[-1][\\\"ds\\\"]\\n File \\\"/opt/venv/lib/python3.7/site-packages/pandas/core/indexing.py\\\", line 895, in __getitem__\\n return self._getitem_axis(maybe_callable, axis=axis)\\n File \\\"/opt/venv/lib/python3.7/site-packages/pandas/core/indexing.py\\\", line 1501, in _getitem_axis\\n self._validate_integer(key, axis)\\n File \\\"/opt/venv/lib/python3.7/site-packages/pandas/core/indexing.py\\\", line 1444, in _validate_integer\\n raise IndexError(\\\"single positional indexer is out-of-bounds\\\")\\nIndexError: single positional indexer is out-of-bounds\\n\"}", "success": false}}

    I used the following in the DATASET section:

    select DATE_FORMAT(date, '%Y-%m-%d %H') as Date,
    pagelt from pageloadtime2
    

    When I run it, the dataset looks like this. It is a minimal dataset without a dimension, with only a measure and a timestamp.

    enhancement anomalyDefinition 
    opened by sachinkbansal 0
Releases(v0.3.2)