Anomaly Detection

Anomaly detection for time-series data

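1. Anomaly detection using Kernel Density Estimation

Takes as input the train data and test data in CSV format, each including timestamp information, located at train_data_path and test_data_path.
Fits a kernel density estimation model on the train data to estimate the distribution of normal data.
Derives an anomaly score for each time point of the test data based on the estimated distribution, and saves the scores to save_root_path as a CSV file and a graph.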

python kde.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/kde'
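
The scoring step described above can be sketched as follows. This is a minimal illustration, not the contents of kde.py: it assumes scikit-learn's KernelDensity, a hypothetical bandwidth of 0.5, and synthetic arrays in place of the CSV files (loading the CSV and dropping the timestamp column is omitted).

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic stand-ins for the train/test CSV contents (assumption).
rng = np.random.default_rng(0)
kde_train = rng.normal(0.0, 1.0, size=(200, 2))   # "normal" operating data
kde_test = np.array([[0.0, 0.0], [0.5, -0.5], [-1.0, 0.3], [8.0, 8.0]])

# Fit the density of normal data; the bandwidth value is a hypothetical choice.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(kde_train)

# score_samples returns the log-density, so its negation grows
# as a point becomes less likely under the estimated distribution.
kde_anomaly_score = -kde.score_samples(kde_test)
print(kde_anomaly_score)
```

Higher values mean the time point is less consistent with the train distribution; the last test point, far from the training cloud, receives the largest score.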

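2. Anomaly detection using Local Outlier Factor

Takes as input the train data and test data in CSV format, each including timestamp information, located at train_data_path and test_data_path.
Fits a Local Outlier Factor model on the train data and estimates the density of normal data from the n_neighbors nearest neighbors.
Derives an anomaly score for each time point of the test data based on the estimated density, and saves the scores to save_root_path as a CSV file and a graph.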

python lof.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/lof' \
              --n_neighbors=5
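
A minimal sketch of this procedure, assuming scikit-learn's LocalOutlierFactor in novelty mode (the source does not state which implementation lof.py uses). The arrays are synthetic stand-ins for the CSV data; n_neighbors=5 mirrors the command above.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-ins for the train/test CSV contents (assumption).
rng = np.random.default_rng(0)
lof_train = rng.normal(0.0, 1.0, size=(200, 2))
lof_test = np.array([[0.0, 0.0], [0.5, -0.5], [-1.0, 0.3], [8.0, 8.0]])

# novelty=True lets the fitted model score unseen data against the train set.
lof = LocalOutlierFactor(n_neighbors=5, novelty=True).fit(lof_train)

# score_samples returns the negated LOF; negate it again so that
# values near 1 are inliers and larger values are more anomalous.
lof_anomaly_score = -lof.score_samples(lof_test)
print(lof_anomaly_score)
```

A small n_neighbors makes the density estimate very local and therefore noisier; larger values smooth it at the cost of missing small anomalous clusters.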

3. Anomaly detection using Isolation Forest

Takes as input the train data and test data in CSV format, each including timestamp information, located at train_data_path and test_data_path.
Fits an isolation forest model on the train data.
Using the train data as the reference set, derives an anomaly score for each time point of the test data and saves the scores to save_root_path as a CSV file and a graph.

python iforest.py --train_data_path='./data/nasa_bearing_train.csv' \
                  --test_data_path='./data/nasa_bearing_test.csv' \
                  --save_root_path='./result/iforest'
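
The same fit-then-score flow can be sketched with scikit-learn's IsolationForest. This is an assumption about the implementation, not a copy of iforest.py; the number of trees and the random seed are hypothetical, and the arrays are synthetic stand-ins for the CSV data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-ins for the train/test CSV contents (assumption).
rng = np.random.default_rng(0)
if_train = rng.normal(0.0, 1.0, size=(200, 2))
if_test = np.array([[0.0, 0.0], [0.5, -0.5], [-1.0, 0.3], [8.0, 8.0]])

# Trees are built on the train data, which acts as the reference set.
forest = IsolationForest(n_estimators=100, random_state=0).fit(if_train)

# score_samples is lower for more abnormal points; negate it so that
# points isolated after few splits get the highest anomaly score.
if_anomaly_score = -forest.score_samples(if_test)
print(if_anomaly_score)
```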

4. Anomaly detection using Spectral Residual

Detects anomalies within each window using the configured window size and score window size.
The score window size must be set larger than the window size.

python spectral.py --window=24 \
                   --score_window=100
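
For orientation, the spectral residual computation can be sketched with NumPy alone. This is a hedged illustration rather than the code in spectral.py: it assumes that window is the averaging window applied to the log-amplitude spectrum and that score_window is the window used to normalize the saliency map. The function names and the toy series are hypothetical.

```python
import numpy as np


def moving_average(values, window):
    """Trailing mean over up to `window` samples (fewer at the start)."""
    csum = np.cumsum(values, dtype=float)
    out = np.empty_like(csum)
    head = min(window, len(csum))
    out[:head] = csum[:head] / np.arange(1, head + 1)
    out[window:] = (csum[window:] - csum[:-window]) / window
    return out


def spectral_residual_scores(series, window=24, score_window=100, eps=1e-8):
    if score_window <= window:
        raise ValueError("score_window must be larger than window")
    series = np.asarray(series, dtype=float)
    spectrum = np.fft.fft(series)
    amplitude = np.abs(spectrum)
    log_amplitude = np.log(amplitude + eps)
    # Spectral residual: log-amplitude minus its smoothed version.
    residual = log_amplitude - moving_average(log_amplitude, window)
    # Keep the original phase and transform back to get the saliency map.
    unit_phase = spectrum / (amplitude + eps)
    saliency = np.abs(np.fft.ifft(np.exp(residual) * unit_phase))
    # Score: relative deviation of the saliency from its local mean.
    local_mean = moving_average(saliency, score_window)
    return (saliency - local_mean) / (local_mean + eps)


# Toy series (assumption): a clean sine wave with one injected spike.
t = np.arange(300)
series = np.sin(2 * np.pi * t / 25)
series[150] += 5.0
scores = spectral_residual_scores(series, window=24, score_window=100)
print("most anomalous index:", int(np.argmax(scores)))
```

The check at the top of spectral_residual_scores reflects the constraint stated above that the score window must exceed the window size.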

Owner

CLUST-consortium
