(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework

Last update: Jan 05, 2023

Overview


Background: Outlier detection (OD) is a key data mining task that identifies abnormal objects among general samples. It has numerous high-stakes applications, including fraud detection and intrusion detection.

To scale outlier detection (OD) to large-scale, high-dimensional datasets, we propose TOD, a novel system that abstracts OD algorithms into basic tensor operations for efficient GPU acceleration.

The corresponding paper. The code is being cleaned up and released. Please watch and star!

One reason to use it:

On average, TOD is 11 times faster than PyOD!

If you need another reason: it can handle much larger datasets, running OD on more than a million samples within an hour!

TOD features:

Unified APIs, detailed documentation, and examples for ease of use (under construction)
Support for more than 10 different OD algorithms, with more being added
Multi-GPU acceleration
Advanced techniques such as provable quantization

Programming Model Interface

Complex OD algorithms can be abstracted into common tensor operators.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction.png

For instance, ABOD and COPOD can be assembled from the basic tensor operators.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction_example.png
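As a rough illustration of the abstraction idea, a k-nearest-neighbor outlier score can be written with two generic PyTorch tensor operations: a pairwise distance and a top-k selection. This is only a sketch, assuming those two operations are representative of the basic operators. It is not TOD's actual implementation, and the function name knn_outlier_scores is hypothetical.

```python
import torch


def knn_outlier_scores(X: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Score each row by the distance to its k-th nearest neighbor.

    Larger scores mean the point is farther from its neighbors.
    """
    # Operator 1: pairwise Euclidean distances, shape (n, n).
    dist = torch.cdist(X, X)
    # Operator 2: the k + 1 smallest distances per row. The extra one
    # accounts for each point's (near-)zero distance to itself.
    knn_dist, _ = torch.topk(dist, k + 1, dim=1, largest=False)
    # The last column holds the distance to the k-th neighbor.
    return knn_dist[:, -1]


# Use the GPU when one is available; the same code runs on the CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"
X = torch.tensor(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [75.0, 80.0]],
    device=device,
)
scores = knn_outlier_scores(X, k=2)
print(scores)
```

Because both steps are plain tensor operations, moving the input tensor to a GPU is enough to run the whole computation there.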

End-to-end Performance Comparison with PyOD

Overall, TOD takes far less run time than PyOD: on average, it is 11 times faster.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/run_time.png

Code is being released. Watch and star for the latest news!

Comments

Error while installing package
I installed PyTorch 1.10 from the official site, and it shows up in my virtual environment. When I run pip install pytod, the dependency lookup fails because it searches for a package named "pytorch" rather than "torch".

ERROR: Could not find a version that satisfies the requirement pytorch>=1.7 (from pytod) (from versions: 0.1.2, 1.0.2)
ERROR: No matching distribution found for pytorch>=1.7
opened by nuriakiin 1
decision_function() returns None

Thanks for the package. When I call decision_function() on test data with LOF (or KNN), it returns an empty object. Is there a fix for this? The following code reproduces the issue (on GPU):

from pytod.models.lof import LOF
import torch
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [75, 80]], dtype=np.float32)
x = torch.from_numpy(x)

y = np.array([[6, 5], [1, 2], [3, 4], [5, 1], [11, 12]], dtype=np.float32)
y = torch.from_numpy(y)

lof = LOF(n_neighbors=2, device='cuda:0')

lof.fit(x)

print(lof.decision_function(y))

opened by sugatc 0
Support for novelty detection and changing distance metric with local outlier factor

The current implementation of LOF doesn't allow changing the distance metric (to 'cosine', for example) or setting novelty=True, which prevents it from being used for novelty detection tasks. It would be great if support for these could be added.

opened by sugatc 2
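To make the request concrete, the following sketch scores unseen test points against a separate training set using cosine distance, written with plain PyTorch tensor operations. It uses a simple k-th-nearest-neighbor distance as the score rather than the full LOF computation, and it is not part of pytod: the function name cosine_knn_novelty_scores and its parameters are hypothetical.

```python
import torch
import torch.nn.functional as F


def cosine_knn_novelty_scores(
    X_train: torch.Tensor, X_test: torch.Tensor, k: int = 2
) -> torch.Tensor:
    """Cosine distance from each test point to its k-th nearest training point."""
    # Scale rows to unit length so a dot product equals cosine similarity.
    train_unit = F.normalize(X_train, dim=1)
    test_unit = F.normalize(X_test, dim=1)
    # Cosine distance = 1 - cosine similarity, shape (n_test, n_train).
    dist = 1.0 - test_unit @ train_unit.T
    # Test points are not in the training set, so there is no self-match to skip.
    knn_dist, _ = torch.topk(dist, k, dim=1, largest=False)
    return knn_dist[:, -1]


X_train = torch.tensor([[1.0, 0.0], [1.0, 0.1], [0.9, 0.1]])
X_test = torch.tensor([[1.0, 0.05], [0.0, 1.0]])
novelty = cosine_knn_novelty_scores(X_train, X_test, k=2)
print(novelty)
```

The second test point points in a very different direction from all training points, so it receives a much larger score than the first.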
can't fit model in colab

When I try to fit any model in a Colab GPU instance, I get the following error. My dataset has 2 columns and 1 million rows:

AttributeError                            Traceback (most recent call last)
in ()
      4 clf_name = 'KNN'
      5 clf = LOF()
----> 6 clf.fit(X)

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'to'

opened by yairVanti 0
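The traceback suggests that fit() calls .to(...) on its input, a method that torch tensors have and pandas DataFrames do not. A possible workaround, sketched here under that assumption, is to convert the DataFrame to a float32 torch tensor before fitting. The small DataFrame below is a hypothetical stand-in for the reported dataset, and the commented-out pytod calls mirror those shown in the other issue reports rather than a verified API.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical two-column dataset standing in for the reported DataFrame.
df = pd.DataFrame({"a": [1.0, 3.0, 5.0], "b": [2.0, 4.0, 6.0]})

# DataFrame -> NumPy float32 array -> torch tensor.
X = torch.from_numpy(df.to_numpy(dtype=np.float32))

# Unlike the DataFrame, the tensor provides the .to() method.
print(hasattr(df, "to"), hasattr(X, "to"))

# Assumed usage, mirroring the calls in the other issue reports:
# from pytod.models.lof import LOF
# clf = LOF(n_neighbors=2, device='cuda:0')
# clf.fit(X)
```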
clean up reproducibility scripts

We are cleaning up these scripts so they are easy to run. In the meantime, the primary results can be reproduced with compare_real_data.py (https://github.com/yzhao062/pytod/tree/main/reproducibility)
enhancement

opened by yzhao062 0

Releases(v0.0.2)

v0.0.2(Jun 19, 2022)

v<0.0.1>, <04/12/2021> -- Add LOF.
v<0.0.1>, <04/23/2021> -- Add ABOD.
v<0.0.2>, <06/19/2021> -- Add PCA and HBOS.
v<0.0.2>, <06/19/2021> -- Turn on test suites.

We have now updated both the paper and the repo to cover more algorithms.

Owner

Yue Zhao

Ph.D. Student @ CMU. Outlier Detection Systems | ML Systems (MLSys) | Anomaly/Outlier Detection | AutoML. Twitter: @yzhao062

Paper (preprint): https://www.andrew.cmu.edu/user/yuezhao2/papers/21-preprint-tod.pdf


