Telemanom (v2.0)

v2.0 updates:

Vectorized operations via numpy
Object-oriented restructure, improved organization
Merge branches into single branch for both processing modes (with/without labels)
Update requirements.txt and Dockerfile
Updated result output for both modes
PEP8 cleanup

Anomaly Detection in Time Series Data Using LSTMs and Automatic Thresholding

Telemanom employs vanilla LSTMs using Keras/Tensorflow to identify anomalies in multivariate sensor data. LSTMs are trained to learn normal system behaviors using encoded command information and prior telemetry values. Predictions are generated at each time step and the errors in predictions represent deviations from expected behavior. Telemanom then uses a novel nonparametric, unsupervised approach for thresholding these errors and identifying anomalous sequences of errors.

This repo along with the linked data can be used to re-create the experiments in our 2018 KDD paper, "Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding", which describes the background, methodologies, and experiments in more detail. While the system was originally deployed to monitor spacecraft telemetry, it can be easily adapted to similar problems.

Getting Started

Clone the repo (only available from source currently):

git clone https://github.com/khundman/telemanom.git && cd telemanom

Configure system/modeling parameters in config.yaml file (to recreate experiment from paper, leave as is). For example:

train: True if True, a new model will be trained for each input stream. If False (default) existing trained model will be loaded and used to generate predictions
predict: True Generate new predictions using models. If False (default), use existing saved predictions in evaluation (useful for tuning error thresholding and skipping prior processing steps)
l_s: 250 Determines the number of previous timesteps input to the model at each timestep t (used to generate predictions)

To run via Docker:

docker build -t telemanom .

# rerun experiment detailed in paper or run with your own set of labeled anomlies in 'labeled_anomalies.csv'
docker run telemanom -l labeled_anomalies.csv

# run without labeled anomalies
docker run telemanom

To run with local or virtual environment

From root of repo, curl and unzip data:

curl -O https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip

Install dependencies using python 3.6+ (recommend using a virtualenv):

pip install -r requirements.txt

Begin processing (from root of repo):

# rerun experiment detailed in paper or run with your own set of labeled anomlies
python example.py -l labeled_anomalies.csv

# run without labeled anomalies
python example.py

A jupyter notebook for evaluating results for a run is at telemanom/result_viewer.ipynb. To launch notebook:

jupyter notebook telemanom/result-viewer.ipynb

Plotly is used to generate interactive inline plots, e.g.:

Data

Using your own data

Pre-split training and test sets must be placed in directories named data/train/ and data/test. One .npy file should be generated for each channel or stream (for both train and test) with shape (n_timesteps, n_inputs). The filename should be a unique channel name or ID. The telemetry values being predicted in the test data must be the first feature in the input.

For example, a channel T-1 should have train/test sets named T-1.npy with shapes akin to (4900,61) and (3925, 61), where the number of input dimensions are matching (61). The actual telemetry values should be along the first dimension (4900,1) and (3925,1).

Raw experiment data

The raw data available for download represents real spacecraft telemetry data and anomalies from the Soil Moisture Active Passive satellite (SMAP) and the Curiosity Rover on Mars (MSL). All data has been anonymized with regard to time and all telemetry values are pre-scaled between (-1,1) according to the min/max in the test set. Channel IDs are also anonymized, but the first letter gives indicates the type of channel (P = power, R = radiation, etc.). Model input data also includes one-hot encoded information about commands that were sent or received by specific spacecraft modules in a given time window. No identifying information related to the timing or nature of commands is included in the data. For example:

This data also includes pre-split test and training data, pre-trained models, predictions, and smoothed errors generated using the default settings in config.yaml. When getting familiar with the repo, running the result-viewer.ipynb notebook to visualize results is useful for developing intuition. The included data also is useful for isolating portions of the system. For example, if you wish to see the effects of changes to the thresholding parameters without having to train new models, you can set Train and Predict to False in config.yaml to use previously generated predictions from prior models.

Anomaly labels and metadata

The anomaly labels and metadata are available in labeled_anomalies.csv, which includes:

channel id: anonymized channel id - first letter represents nature of channel (P = power, R = radiation, etc.)
spacecraft: spacecraft that generated telemetry stream
anomaly_sequences: start and end indices of true anomalies in stream
class: the class of anomaly (see paper for discussion)
num values: number of telemetry values in each stream

To provide your own labels, use the labeled_anomalies.csv file as a template. The only required fields/columns are channel_id and anomaly_sequences. anomaly_sequences is a list of lists that contain start and end indices of anomalous regions in the test dataset for a channel.

Dataset and performance statistics:

Data

	SMAP	MSL	Total
Total anomaly sequences	69	36	105
Point anomalies (% tot.)	43 (62%)	19 (53%)	62 (59%)
Contextual anomalies (% tot.)	26 (38%)	17 (47%)	43 (41%)
Unique telemetry channels	55	27	82
Unique ISAs	28	19	47
Telemetry values evaluated	429,735	66,709	496,444

Performance (with default params specified in paper)

Spacecraft	Precision	Recall	F_0.5 Score
SMAP	85.5%	85.5%	0.71
Curiosity (MSL)	92.6%	69.4%	0.69
Total	87.5%	80.0%	0.71

Processing

Each time the system is started a unique datetime ID (ex. 2018-05-17_16.28.00) will be used to create the following

a results file (in results/) that extends labeled_anomalies.csv to include identified anomalous sequences and related info
a data subdirectory containing data files for created models, predictions, and smoothed errors for each channel. A file called params.log is also created that contains parameter settings and logging output during processing.

As mentioned, the jupyter notebook telemanom/result-viewer.ipynb can be used to visualize results for each stream.

Citation

If you use this work, please cite:

  title={Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding},
  author={Hundman, Kyle and Constantinou, Valentino and Laporte, Christopher and Colwell, Ian and Soderstrom, Tom},
  journal={arXiv preprint arXiv:1802.04431},
  year={2018}
}

License

Telemanom is distributed under Apache 2.0 license.

Contact: Kyle Hundman ([email protected])

Contributors

Kyle Hundman (NASA JPL)
Valentinos Constantinou (NASA JPL)
Chris Laporte (NASA JPL)
Ian Colwell (NASA JPL)

A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.

Related tags

Overview

Telemanom (v2.0)

Anomaly Detection in Time Series Data Using LSTMs and Automatic Thresholding

Getting Started

To run via Docker:

To run with local or virtual environment

Data

Using your own data

Raw experiment data

Anomaly labels and metadata

Dataset and performance statistics:

Data

Performance (with default params specified in paper)

Processing

Citation

License

Contributors

Owner

Kyle Hundman

Fastquant - Backtest and optimize your trading strategies with only 3 lines of code!

Jetson Nano-based smart camera system that measures crowd face mask usage in real-time.

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice （『飞桨』核心框架，深度学习&机器学习高性能单机、分布式训练和跨平台部署）

Deep motion generator collections

Python Rapid Artificial Intelligence Ab Initio Molecular Dynamics

PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

A universal memory dumper using Frida

Permute Me Softly: Learning Soft Permutations for Graph Representations

Code Release for ICCV 2021 (oral), "AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds"

[CVPR 2022] Deep Equilibrium Optical Flow Estimation

PyTorch implementation of our Adam-NSCL algorithm from our CVPR2021 (oral) paper "Training Networks in Null Space for Continual Learning"

The Official Implementation of the ICCV-2021 Paper: Semantically Coherent Out-of-Distribution Detection.

meProp: Sparsified Back Propagation for Accelerated Deep Learning

An SMPC companion library for Syft

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

Code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition"

⚓ Eurybia monitor model drift over time and securize model deployment with data validation

A very impractical 3D rendering engine that runs in the python terminal.

Official implementation of NeurIPS 2021 paper "Contextual Similarity Aggregation with Self-attention for Visual Re-ranking"

An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020