MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

Overview

MemStream

Implementation of MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift.

MemStream detects anomalies in a multi-aspect data stream and outputs an anomaly score for each record. It uses a memory-augmented feature extractor, allows for quick retraining, gives a theoretical bound on the memory size for effective drift handling, is robust to memory poisoning, and outperforms 11 state-of-the-art streaming anomaly detection baselines.

After the feature extractor is initially trained on a small subset of normal data, MemStream processes each record in two steps: (i) it outputs an anomaly score by querying the memory for the K nearest neighbours of the record's encoding and calculating a discounted distance, and (ii) it updates the memory, in a FIFO manner, if the anomaly score is within the update threshold β.
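The two steps can be sketched in plain Python. This is a minimal illustration, not the repository's code: it assumes encodings are already computed, L1 distance, geometrically decaying weights (`gamma`) for the discounted distance, and a strict `score < beta` update test. The names `discounted_knn_score`, `process_record`, `k` and `gamma` are hypothetical.

```python
from collections import deque


def l1_distance(a, b):
    """L1 (Manhattan) distance between two equal-length encodings."""
    return sum(abs(x - y) for x, y in zip(a, b))


def discounted_knn_score(memory, encoding, k=3, gamma=0.5):
    """Weighted mean of the k smallest distances to memory.

    The i-th nearest neighbour (i = 0, 1, ...) gets weight gamma**i,
    so closer neighbours dominate the score (assumed discounting scheme).
    """
    if not memory:
        raise ValueError("memory must be initialised before scoring")
    nearest = sorted(l1_distance(encoding, m) for m in memory)[:k]
    weights = [gamma ** i for i in range(len(nearest))]
    return sum(w * d for w, d in zip(weights, nearest)) / sum(weights)


def process_record(memory, encoding, beta, k=3, gamma=0.5):
    """Step (i): score the record. Step (ii): FIFO update if score < beta."""
    score = discounted_knn_score(memory, encoding, k, gamma)
    if score < beta:
        # A deque with maxlen evicts its oldest entry on append (FIFO).
        memory.append(encoding)
    return score


# Memory of fixed length 3, seeded with encodings of normal records.
memory = deque([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], maxlen=3)
normal_score = process_record(memory, (0.0, 0.0), beta=1.0)
outlier_score = process_record(memory, (10.0, 10.0), beta=1.0)
```

In this sketch the first record scores below β and replaces the oldest memory entry, while the distant record scores above β and leaves the memory untouched, which is what keeps anomalous records from poisoning it.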

Demo

  1. KDDCUP99: Run python3 memstream.py --dataset KDD --beta 1 --memlen 256
  2. NSL-KDD: Run python3 memstream.py --dataset NSL --beta 0.1 --memlen 2048
  3. UNSW-NB 15: Run python3 memstream.py --dataset UNSW --beta 0.1 --memlen 2048
  4. CICIDS-DoS: Run python3 memstream.py --dataset DOS --beta 0.1 --memlen 2048
  5. SYN: Run python3 memstream-syn.py --dataset SYN --beta 1 --memlen 16
  6. Ionosphere: Run python3 memstream.py --dataset ionosphere --beta 0.001 --memlen 4
  7. Cardiotocography: Run python3 memstream.py --dataset cardio --beta 1 --memlen 64
  8. Statlog Landsat Satellite: Run python3 memstream.py --dataset statlog --beta 0.01 --memlen 32
  9. Satimage-2: Run python3 memstream.py --dataset satimage-2 --beta 10 --memlen 256
  10. Mammography: Run python3 memstream.py --dataset mammography --beta 0.1 --memlen 128
  11. Pima Indians Diabetes: Run python3 memstream.py --dataset pima --beta 0.001 --memlen 64
  12. Covertype: Run python3 memstream.py --dataset cover --beta 0.0001 --memlen 2048

Command line options

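  • --dataset: The dataset to be used for training. Choices listed: 'NSL', 'KDD', 'UNSW', 'DOS'; the Demo commands also pass 'SYN', 'ionosphere', 'cardio', 'statlog', 'satimage-2', 'mammography', 'pima' and 'cover'. (default: 'NSL')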
  • --beta: The threshold beta to be used. (default: 0.1)
  • --memlen: The size of the Memory Module (default: 2048)
  • --dev: PyTorch device to be used for training, such as "cpu" or "cuda:0". (default: 'cuda:0')
  • --lr: Learning rate (default: 0.01)
  • --epochs: Number of epochs (default: 5000)
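
As a rough illustration of how the options above fit together, the following sketch declares them with Python's standard `argparse` module. It mirrors the documented flags and defaults only; the `build_parser` helper and the argument types are assumptions, and the actual parsing code in `memstream.py` may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="MemStream options (illustrative sketch)")
    parser.add_argument("--dataset", default="NSL", help="dataset to train on")
    parser.add_argument("--beta", type=float, default=0.1, help="memory update threshold")
    parser.add_argument("--memlen", type=int, default=2048, help="size of the memory module")
    parser.add_argument("--dev", default="cuda:0", help="PyTorch device string")
    parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
    parser.add_argument("--epochs", type=int, default=5000, help="number of epochs")
    return parser


# Parse the KDDCUP99 demo arguments; unspecified flags fall back to defaults.
args = build_parser().parse_args(["--dataset", "KDD", "--beta", "1", "--memlen", "256"])
```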

Input file format

MemStream expects the input multi-aspect record stream to be stored in a comma-separated file.
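
A minimal sketch of reading such a file with Python's standard `csv` module follows. It assumes every row holds only numeric feature values and that there is no header line; the README does not state either, and the `read_records` helper is hypothetical.

```python
import csv
import io


def read_records(handle):
    """Yield one record per non-empty line as a list of floats."""
    for row in csv.reader(handle):
        if row:
            yield [float(value) for value in row]


# An in-memory stand-in for a comma-separated input file.
sample = io.StringIO("0.1,0.2,0.3\n1.0,2.0,3.0\n")
records = list(read_records(sample))
```

For a file on disk, the same function can be given the handle returned by `open(path, newline="")`.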

Datasets

Processed Datasets can be downloaded from here. Please unzip and place the files in the data folder of the repository.

  1. KDDCUP99
  2. NSL-KDD
  3. UNSW-NB 15
  4. CICIDS-DoS
  5. Synthetic Dataset (Introduced in paper)
  6. Ionosphere
  7. Cardiotocography
  8. Statlog Landsat Satellite
  9. Satimage-2
  10. Mammography
  11. Pima Indians Diabetes
  12. Covertype

Environment

This code has been tested on Debian GNU/Linux 9 with a 12GB Nvidia GeForce RTX 2080 Ti GPU, CUDA Version 10.2 and PyTorch 1.5.

Owner

Stream-AD: Streaming Anomaly Detection