Anomaly Detection

Anomaly detection for time-series data

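1. Anomaly detection with Kernel Density Estimation

Takes as input the train data and test data, CSV files that include timestamp information, located at train_data_path and test_data_path.
Fits a kernel density estimation model on the train data to estimate the distribution of normal data.
Based on the estimated distribution, computes an anomaly score for each time point of the test data and saves the scores as a CSV file and a plot under save_root_path.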

python kde.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/kde'
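
The scoring step described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code in kde.py (which is not shown here). The function names, the isotropic Gaussian kernel, the fixed bandwidth, and the use of negative log-density as the anomaly score are all assumptions made for the example.

```python
import math

def kde_log_density(train, x, bandwidth=1.0):
    """Log-density at point x under a Gaussian KDE built from the train rows."""
    d = len(x)
    # Log of the normalising constant of an isotropic d-dimensional Gaussian kernel.
    log_norm = -0.5 * d * math.log(2 * math.pi) - d * math.log(bandwidth)
    log_terms = []
    for row in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(row, x))
        log_terms.append(log_norm - 0.5 * sq_dist / bandwidth ** 2)
    # Log-sum-exp keeps the average of kernel values numerically stable.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms)) - math.log(len(train))

def kde_anomaly_scores(train, test, bandwidth=1.0):
    """Assumed score: negative log-density, so low-density points score high."""
    return [-kde_log_density(train, x, bandwidth) for x in test]

# Toy data (hypothetical): normal points near the origin, one distant test point.
train = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.05], [0.05, -0.1]]
test = [[0.0, 0.0], [3.0, 3.0]]
scores = kde_anomaly_scores(train, test, bandwidth=0.5)
print(scores)
```

The test point far from the training data receives the larger score because the estimated density there is close to zero.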

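2. Anomaly detection with Local Outlier Factor

Takes as input the train data and test data, CSV files that include timestamp information, located at train_data_path and test_data_path.
Fits a Local Outlier Factor model on the train data and estimates the density of normal data from the n_neighbors nearest neighbors.
Based on the estimated density, computes an anomaly score for each time point of the test data and saves the scores as a CSV file and a plot under save_root_path.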

python lof.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/lof' \
              --n_neighbors=5
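
To make the role of n_neighbors concrete, here is a small standard-library sketch of Local Outlier Factor scoring in which test points are compared against neighbors drawn from the train data only. It is an illustration under assumptions, not the code in lof.py: the helper names are hypothetical, distances are Euclidean, and a small constant is added to avoid division by zero.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _knn(train, point, k, skip=None):
    """The k nearest train rows as sorted (distance, index) pairs."""
    pairs = [(_dist(point, row), i) for i, row in enumerate(train) if i != skip]
    return sorted(pairs)[:k]

def lof_scores(train, test, n_neighbors=5):
    # Neighbors of every train point, excluding the point itself.
    train_nn = [_knn(train, row, n_neighbors, skip=i) for i, row in enumerate(train)]
    # k-distance: distance from each train point to its k-th nearest neighbor.
    k_dist = [nn[-1][0] for nn in train_nn]

    def lrd(neighbours):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in neighbours]
        return 1.0 / (sum(reach) / len(reach) + 1e-10)

    train_lrd = [lrd(nn) for nn in train_nn]
    scores = []
    for q in test:
        nn = _knn(train, q, n_neighbors)
        # LOF: mean density of the neighbors divided by the query's own density.
        scores.append(sum(train_lrd[j] for _, j in nn) / (len(nn) * lrd(nn)))
    return scores

# Toy data (hypothetical): a small cluster, one nearby and one distant test point.
train = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8]]
test = [[0.5, 0.4], [5.0, 5.0]]
scores = lof_scores(train, test, n_neighbors=3)
print(scores)
```

Scores close to 1 mean the test point is about as dense as its neighbors; values well above 1 indicate a point in a much sparser region than the train data around it.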

3. Anomaly detection with Isolation Forest

Takes as input the train data and test data, CSV files that include timestamp information, located at train_data_path and test_data_path.
Fits an isolation forest model on the train data.
Using the train data as the reference set, computes an anomaly score for each time point of the test data and saves the scores as a CSV file and a plot under save_root_path.

python iforest.py --train_data_path='./data/nasa_bearing_train.csv' \
                  --test_data_path='./data/nasa_bearing_test.csv' \
                  --save_root_path='./result/iforest'
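
The idea of fitting on the train data and scoring test points against it can be sketched as follows. This is a compact standard-library version of the isolation forest algorithm written for illustration, not the code in iforest.py. The function names, tree count, subsample size, and toy data are assumptions, and the train set is assumed to hold at least two rows.

```python
import math
import random

def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def _build_tree(rows, depth, max_depth, rng):
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": _build_tree(left, depth + 1, max_depth, rng),
            "right": _build_tree(right, depth + 1, max_depth, rng)}

def _path_length(tree, x, depth=0):
    if "feature" not in tree:
        # Leaf: add the expected depth of the subtree that was not grown.
        return depth + _c(tree["size"])
    child = tree["left"] if x[tree["feature"]] < tree["split"] else tree["right"]
    return _path_length(child, x, depth + 1)

def fit_forest(train, n_trees=100, sample_size=256, seed=0):
    rng = random.Random(seed)
    size = min(sample_size, len(train))
    max_depth = math.ceil(math.log2(max(size, 2)))
    trees = [_build_tree(rng.sample(train, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size

def iforest_scores(forest, test):
    trees, size = forest
    scores = []
    for x in test:
        mean_path = sum(_path_length(t, x) for t in trees) / len(trees)
        # Short average paths (easy to isolate) map to scores closer to 1.
        scores.append(2.0 ** (-mean_path / _c(size)))
    return scores

# Toy data (hypothetical): Gaussian train points, one central and one distant test point.
data_rng = random.Random(1)
train = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
test = [[0.0, 0.0], [6.0, 6.0]]
scores = iforest_scores(fit_forest(train, sample_size=64), test)
print(scores)
```

Points that random splits separate from the train sample in only a few steps get short paths and therefore higher anomaly scores.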

4. Anomaly detection with Spectral Residual

Detects anomalies within each window interval using the configured window size and score window size.
The score window size must be set larger than the window size.

python spectral.py --window=24 \
                   --score_window=100
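
For readers unfamiliar with the method, the following standard-library sketch shows one common formulation of spectral residual scoring. It is not the code in spectral.py. In particular, it assumes that window is the averaging width applied to the log-amplitude spectrum and that score_window is the trailing width used to normalise the saliency map; the repository may define these parameters differently. A naive DFT is used to avoid external dependencies, so it only suits short series.

```python
import cmath
import math

def _dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

def _trailing_mean(values, window):
    """Mean of the current value and up to window - 1 preceding values."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out

def spectral_residual_scores(series, window=3, score_window=10):
    spectrum = _dft(series)
    log_amp = [math.log(abs(c) + 1e-8) for c in spectrum]
    # Spectral residual: log-amplitude minus its local average.
    residual = [l - m for l, m in zip(log_amp, _trailing_mean(log_amp, window))]
    # Rebuild the spectrum from the residual amplitude and the original phase.
    rebuilt = [cmath.rect(math.exp(r), cmath.phase(c))
               for r, c in zip(residual, spectrum)]
    saliency = [abs(c) for c in _dft(rebuilt, inverse=True)]
    # Score: relative deviation of the saliency from its trailing local mean.
    local = _trailing_mean(saliency, score_window)
    return [(s - m) / (m + 1e-8) for s, m in zip(saliency, local)]

# Toy series (hypothetical): a sine wave with one injected spike.
series = [math.sin(2 * math.pi * t / 16) for t in range(64)]
series[40] += 5.0
scores = spectral_residual_scores(series, window=3, score_window=10)
peak = max(range(len(scores)), key=lambda i: scores[i])
print(peak, scores[peak])
```

The periodic component is largely removed by the residual step, so the saliency map is dominated by the injected spike, and the highest score lands on it.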

Anomaly Detection: anomaly detection preprocessing module

Owner

CLUST-consortium
