A PyTorch implementation of Deep SAD, a deep Semi-supervised Anomaly Detection method.

Overview

Deep SAD: A Method for Deep Semi-Supervised Anomaly Detection

This repository provides a PyTorch implementation of the Deep SAD method presented in our ICLR 2020 paper ”Deep Semi-Supervised Anomaly Detection”.

Citation and Contact

You find a PDF of the Deep Semi-Supervised Anomaly Detection ICLR 2020 paper on arXiv https://arxiv.org/abs/1906.02694.

If you find our work useful, please also cite the paper:

@InProceedings{ruff2020deep,
  title     = {Deep Semi-Supervised Anomaly Detection},
  author    = {Ruff, Lukas and Vandermeulen, Robert A. and G{\"o}rnitz, Nico and Binder, Alexander and M{\"u}ller, Emmanuel and M{\"u}ller, Klaus-Robert and Kloft, Marius},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://openreview.net/forum?id=HkgH0TEYwH}
}

If you would like get in touch, just drop us an email to [email protected].

Abstract

Deep approaches to anomaly detection have recently shown promising results over shallow methods on large and complex datasets. Typically anomaly detection is treated as an unsupervised learning problem. In practice however, one may have---in addition to a large set of unlabeled samples---access to a small pool of labeled samples, e.g. a subset verified by some domain expert as being normal or anomalous. Semi-supervised approaches to anomaly detection aim to utilize such labeled samples, but most proposed methods are limited to merely including labeled normal samples. Only a few methods take advantage of labeled anomalies, with existing deep approaches being domain-specific. In this work we present Deep SAD, an end-to-end deep methodology for general semi-supervised anomaly detection. We further introduce an information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution, which can serve as a theoretical interpretation for our method. In extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10, along with other anomaly detection benchmark datasets, we demonstrate that our method is on par or outperforms shallow, hybrid, and deep competitors, yielding appreciable performance improvements even when provided with only little labeled data.

The need for semi-supervised anomaly detection

fig1

Installation

This code is written in Python 3.7 and requires the packages listed in requirements.txt.

Clone the repository to your machine and directory of choice:

git clone https://github.com/lukasruff/Deep-SAD-PyTorch.git

To run the code, we recommend setting up a virtual environment, e.g. using virtualenv or conda:

virtualenv

# pip install virtualenv
cd <path-to-Deep-SAD-PyTorch-directory>
virtualenv myenv
source myenv/bin/activate
pip install -r requirements.txt

conda

cd <path-to-Deep-SAD-PyTorch-directory>
conda create --name myenv
source activate myenv
while read requirement; do conda install -n myenv --yes $requirement; done < requirements.txt

Running experiments

We have implemented the MNIST, Fashion-MNIST, and CIFAR-10 datasets as well as the classic anomaly detection benchmark datasets arrhythmia, cardio, satellite, satimage-2, shuttle, and thyroid from the Outlier Detection DataSets (ODDS) repository (http://odds.cs.stonybrook.edu/) as reported in the paper.

The implemented network architectures are as reported in the appendix of the paper.

Deep SAD

You can run Deep SAD experiments using the main.py script.

Here's an example on MNIST with 0 considered to be the normal class and having 1% labeled (known) training samples from anomaly class 1 with a pollution ratio of 10% of the unlabeled training data (with unknown anomalies from all anomaly classes 1-9):

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folders for experimental output
mkdir log/DeepSAD
mkdir log/DeepSAD/mnist_test

# change to source directory
cd src

# run experiment
python main.py mnist mnist_LeNet ../log/DeepSAD/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --lr 0.0001 --n_epochs 150 --lr_milestone 50 --batch_size 128 --weight_decay 0.5e-6 --pretrain True --ae_lr 0.0001 --ae_n_epochs 150 --ae_batch_size 128 --ae_weight_decay 0.5e-3 --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

Have a look into main.py for all possible arguments and options.

Baselines

We also provide an implementation of the following baselines via the respective baseline_<method_name>.py scripts: OC-SVM (ocsvm), Isolation Forest (isoforest), Kernel Density Estimation (kde), kernel Semi-Supervised Anomaly Detection (ssad), and Semi-Supervised Deep Generative Model (SemiDGM).

Here's how to run SSAD for example on the same experimental setup as above:

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folder for experimental output
mkdir log/ssad
mkdir log/ssad/mnist_test

# change to source directory
cd src

# run experiment
python baseline_ssad.py mnist ../log/ssad/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --kernel rbf --kappa 1.0 --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

The autoencoder is provided through Deep SAD pre-training using --pretrain True with main.py. To then run a hybrid approach using one of the classic methods on top of autoencoder features, simply point to the saved autoencoder model using --load_ae ../log/DeepSAD/mnist_test/model.tar and set --hybrid True.

To run hybrid SSAD for example on the same experimental setup as above:

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folder for experimental output
mkdir log/hybrid_ssad
mkdir log/hybrid_ssad/mnist_test

# change to source directory
cd src

# run experiment
python baseline_ssad.py mnist ../log/hybrid_ssad/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --kernel rbf --kappa 1.0 --hybrid True --load_ae ../log/DeepSAD/mnist_test/model.tar --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

License

MIT

Owner
Lukas Ruff
PhD student in the ML group at TU Berlin.
Lukas Ruff
Paper and Code for "Curriculum Learning by Optimizing Learning Dynamics" (AISTATS 2021)

Curriculum Learning by Optimizing Learning Dynamics (DoCL) AISTATS 2021 paper: Title: Curriculum Learning by Optimizing Learning Dynamics [pdf] [appen

Tianyi Zhou 15 Dec 06, 2022
Code for our SIGIR 2022 accepted paper : P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning

P3 Ranker Implementation for our SIGIR2022 accepted paper: P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-bas

14 Jan 04, 2023
A curated list of awesome mathematics resources

A curated list of awesome mathematics resources

Cyrille Rossant 6.7k Jan 05, 2023
Create Python API documentation in Markdown format.

Pydoc-Markdown Pydoc-Markdown is a tool and library to create Python API documentation in Markdown format based on lib2to3, allowing it to parse your

Niklas Rosenstein 375 Jan 05, 2023
Version bêta d'un système pour suivre les prix des livres chez Books to Scrape,

Version bêta d'un système pour suivre les prix des livres chez Books to Scrape, un revendeur de livres en ligne. En pratique, dans cette version bêta, le programme n'effectuera pas une véritable surv

Mouhamed Dia 1 Jan 06, 2022
NetBox plugin that stores configuration diffs and checks templates compliance

Config Officer - NetBox plugin NetBox plugin that deals with Cisco device configuration (collects running config from Cisco devices, indicates config

77 Dec 21, 2022
MkDocs Plugin allowing your visitors to *File > Print > Save as PDF* the entire site.

mkdocs-print-site-plugin MkDocs plugin that adds a page to your site combining all pages, allowing your site visitors to File Print Save as PDF th

Tim Vink 67 Jan 04, 2023
This is a small project written to help build documentation for projects in less time.

Documentation-Builder This is a small project written to help build documentation for projects in less time. About This project builds documentation f

Tom Jebbo 2 Jan 17, 2022
Feature Store for Machine Learning

Overview Feast is an open source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and

Feast 3.8k Dec 30, 2022
Course materials for: Geospatial Data Science

Course materials for: Geospatial Data Science These course materials cover the lectures for the course held for the first time in spring 2022 at IT Un

Michael Szell 266 Jan 02, 2023
charcade is a string manipulation library that can animate, color, and bruteforce strings

charcade charcade is a string manipulation library that can animate, color, and bruteforce strings. Features Animating text for CLI applications with

Aaron 8 May 23, 2022
Resource hub for Obsidian resources.

Obsidian Community Vault Welcome! This is an experimental vault that is maintained by the Obsidian community. For best results we recommend downloadin

Obsidian Community 320 Jan 02, 2023
300+ Python Interview Questions

300+ Python Interview Questions

Pradeep Kumar 1.1k Jan 02, 2023
Some custom tweaks to the results produced by pytkdocs.

pytkdocs_tweaks Some custom tweaks for pytkdocs. For use as part of the documentation-generation-for-Python stack that comprises mkdocs, mkdocs-materi

Patrick Kidger 4 Nov 24, 2022
Numpy's Sphinx extensions

numpydoc -- Numpy's Sphinx extensions This package provides the numpydoc Sphinx extension for handling docstrings formatted according to the NumPy doc

NumPy 234 Dec 26, 2022
Demonstration that AWS IAM policy evaluation docs are incorrect

The flowchart from the AWS IAM policy evaluation documentation page, as of 2021-09-12, and dating back to at least 2018-12-27, is the following: The f

Ben Kehoe 15 Oct 21, 2022
A collection of simple python mini projects to enhance your python skills

A collection of simple python mini projects to enhance your python skills

PYTHON WORLD 12.1k Jan 05, 2023
LotteryBuyPredictionWebApp - Lottery Purchase Prediction Model

Lottery Purchase Prediction Model Objective and Goal Predict the lottery type th

Wanxuan Zhang 2 Feb 14, 2022
Convenient tools for using Swagger to define and validate your interfaces in a Pyramid webapp.

Convenient tools for using Swagger to define and validate your interfaces in a Pyramid webapp.

Scott Triglia 64 Sep 18, 2022
Automated Integration Testing and Live Documentation for your API

Automated Integration Testing and Live Documentation for your API

ScanAPI 1.3k Dec 30, 2022