Using knowledge-informed machine learning on the PRONOSTIA (FEMTO) and IMS bearing data sets. Predict remaining-useful-life (RUL).

Overview

Knowledge Informed Machine Learning using a Weibull-based Loss Function

Exploring the concept of knowledge-informed machine learning with the use of a Weibull-based loss function. Used to predict remaining useful life (RUL) on the IMS and PRONOSTIA (also called FEMTO) bearing data sets.

Open In Colab Source code arXiv

Knowledge-informed machine learning is used on the IMS and PRONOSTIA bearing data sets for remaining useful life (RUL) prediction. The knowledge is integrated into a neural network through a novel Weibull-based loss function. A thorough statistical analysis of the Weibull-based loss function is conducted, demonstrating the effectiveness of the method on the PRONOSTIA data set. However, the Weibull-based loss function is less effective on the IMS data set.

The experiment will be detailed in the Journal of Prognostics and Health Management (accepted and pending publication -- preprint here), with an extensive discussion on the results, shortcomings, and benefits analysis. The paper also gives an overview of knowledge informed machine learning as it applies to prognostics and health management (PHM).

You can replicate the work, and all figures, by following the instructions in the Setup section. Even easier: run the Colab notebook!

If you have any questions, leave a comment in the discussion, or email me ([email protected]).

Summary

In this work, we use the definition of knowledge informed machine learning from von Rueden et al. (their excellent paper is here). Here's the general taxonomy of our knowledge informed machine learning experiment:

source_rep_int

Bearing vibration data (from the frequency domain) was used as input to feed-forward neural networks. The below figure demonstrates the data as a spectrogram (a) and the spectrogram after "binning" (b). The binned data was used as input.

spectrogram

A large hyper-parameter search was conducted on neural networks. Nine different Weibull-based loss functions were tested on each unique network.

The below chart is a qualitative method of showing the effectiveness of the Weibull-based loss functions on the two data sets.

loss function percentage

We also conducted a statistical analysis of the results, as shown below.

correlation of the weibull-based loss function to results

The top performing models' RUL trends are shown below, for both the IMS and PRONOSTIA data sets.

IMS RUL  trend
PRONOSTIA RUL  trend

Setup

Tested in linux (MacOS should also work). If you run windows you'll have to do much of the environment setup and data download/preprocessing manually.

To reproduce results:

  1. Clone this repo - clone https://github.com/tvhahn/weibull-knowledge-informed-ml.git

  2. Create virtual environment. Assumes that Conda is installed.

    • Linux/MacOS: use command from the Makefile in the root directory - make create_environment
    • Windows: from root directory - conda env create -f envweibull.yml
    • HPC: make create_environment will detect HPC environment and automatically create environment from make_hpc_venv.sh. Tested on Compute Canada. Modify make_hpc_venv.sh for your own HPC cluster.
  3. Download raw data.

    • Linux/MacOS: use make download. Will automatically download to appropriate data/raw directory.
    • Windows: Manually download the the IMS and PRONOSTIA (FEMTO) data sets from NASA prognostics data repository. Put in data/raw folder.
    • HPC: use make download. Will automatically detect HPC environment.
  4. Extract raw data.

    • Linux/MacOS: use make extract. Will automatically extract to appropriate data/raw directory.
    • Windows: Manually extract data. See the Project Organization section for folder structure.
    • HPC: use make download. Will automatically detect HPC environment. Again, modify for your HPC cluster.
  5. Ensure virtual environment is activated. conda activate weibull or source ~/weibull/bin/activate

  6. From root directory of weibull-knowledge-informed-ml, run pip install -e . -- this will give the python scripts access to the src folders.

  7. Train!

    • Linux/MacOS: use make train_ims or make train_femto. Note: set constants in the makefile for changing random search parameters. Currently set as default.

    • Windows: run manually by calling the script - python train_ims or python train_femto with the appropriate arguments. For example: src/models/train_models.py --data_set femto --path_data your_data_path --proj_dir your_project_directory_path

    • HPC: use make train_ims or make train_femto. The HPC environment should be automatically detected. A SLURM script will be run for a batch job.

      • Modify the train_modify_ims_hpc.sh or train_model_femto_hpc.sh in the src/models directory to meet the needs of your HPC cluster. This should work on Compute Canada out of the box.
  8. Filter out the poorly performing models and collate the results. This will create several results files in the models/final folder.

    • Linux/MacOS: use make summarize_ims_models or make summarize_femto_models. (note: set filter boundaries in summarize_model_results.py. Will eventually modify for use with Argparse...)
    • Windows: run manually by calling the script.
    • HPC: use make summarize_ims_models or make summarize_femto_models. Again, change filter requirements in the summarize_model_results.py script.
  9. Make the figures of the data and results.

    • Linux/MacOS: use make figures_data and make figures_results. Figures will be generated and placed in the reports/figures folder.
    • Windows: run manually by calling the script.
    • HPC: use make figures_data and make figures_results

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands to reproduce work, lik `make data` or `make train_ims`
├── README.md          <- The top-level README.
├── data
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump. Downloaded from the NASA Prognostic repository.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details (nothing in here yet)
│
├── models             <- Trained models, model predictions, and model summaries
│   ├── interim        <- Intermediate models that have not analyzed. Output from the random search.
│   ├── final          <- Final models that have been filtered and summarized. Several outpu csv files as well.
│
├── notebooks          <- Jupyter notebooks used for data exploration and analysis. Of varying quality.
│   ├── scratch        <- Scratch notebooks for quick experimentation.     
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials (empty).
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── envweibull.yml    <- The Conda environment file for reproducing the analysis environment
│                        recommend using Conda).
│
├── make_hpc_venv.sh  <- Bash script to create the HPC venv. Setup for my Compute Canada cluster.
│                        Modify to suit your own HPC cluster.
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models               
│   │   └── predict_model.py
│   │
│   └── visualization  <- Scripts to create figures of the data, results, and training progress
│       ├── visualize_data.py       
│       ├── visualize_results.py     
│       └── visualize_training.py    

Future List

As noted in the paper, the best thing would be to test out Weibull-based loss functions on large, and real-world, industrial datasets. Suitable applications may include large fleets of pumps or gas turbines.

Owner
Tim
Data science. Innovation. ML practitioner.
Tim
Reviving Iterative Training with Mask Guidance for Interactive Segmentation

This repository provides the source code for training and testing state-of-the-art click-based interactive segmentation models with the official PyTorch implementation

Visual Understanding Lab @ Samsung AI Center Moscow 406 Jan 01, 2023
Block Sparse movement pruning

Movement Pruning: Adaptive Sparsity by Fine-Tuning Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; ho

Hugging Face 54 Dec 20, 2022
YOLOX_AUDIO is an audio event detection model based on YOLOX

YOLOX_AUDIO is an audio event detection model based on YOLOX, an anchor-free version of YOLO. This repo is an implementated by PyTorch. Main goal of YOLOX_AUDIO is to detect and classify pre-defined

intflow Inc. 77 Dec 19, 2022
TDmatch is a Python library developed to perform matching tasks in three categories:

TDmatch TDmatch is a Python library developed to perform matching tasks in three categories: Text to Data which matches tuples of a table to text docu

Naser Ahmadi 5 Aug 11, 2022
Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)

transformer-slt This repository gathers data and code supporting the experiments in the paper Better Sign Language Translation with STMC-Transformer.

Kayo Yin 107 Dec 27, 2022
ROMP: Monocular, One-stage, Regression of Multiple 3D People, ICCV21

Monocular, One-stage, Regression of Multiple 3D People ROMP, accepted by ICCV 2021, is a concise one-stage network for multi-person 3D mesh recovery f

Yu Sun 937 Jan 04, 2023
People movement type classifier with YOLOv4 detection and SORT tracking.

Movement classification The goal of this project would be movement classification of people, in other words, walking (normal and fast) and running. Yo

4 Sep 21, 2021
[CVPR'21] MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation

MonoRUn MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation. CVPR 2021. [paper] Hansheng Chen, Yuyao Huang, Wei Tian*

同济大学智能汽车研究所综合感知研究组 ( Comprehensive Perception Research Group under Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University) 96 Dec 10, 2022
An efficient implementation of GPNN

Efficient-GPNN An efficient implementation of GPNN as depicted in "Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Mo

7 Apr 16, 2022
This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".

BiCAT This is our TensorFlow implementation for the paper: "BiCAT: Sequential Recommendation with Bidirectional Chronological Augmentation of Transfor

John 15 Dec 06, 2022
MVGCN: a novel multi-view graph convolutional network (MVGCN) framework for link prediction in biomedical bipartite networks.

MVGCN MVGCN: a novel multi-view graph convolutional network (MVGCN) framework for link prediction in biomedical bipartite networks. Developer: Fu Hait

13 Dec 01, 2022
LightNet++: Boosted Light-weighted Networks for Real-time Semantic Segmentation

LightNet++ !!!New Repo.!!! ⇒ EfficientNet.PyTorch: Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights !!

linksense 237 Jan 05, 2023
NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

NCVX NCVX: A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning. Please check https://ncvx.org for detailed instruction

SUN Group @ UMN 28 Aug 03, 2022
Plaything for Autistic Children (demo for PaddlePaddle/Wechaty/Mixlab project)

星星的孩子 - 一款为孤独症孩子设计的聊天机器人游戏 孤独症儿童是目前常常被忽视的一类群体。他们有着类似性格内向的特征,实际却受着广泛性发育障碍的折磨。 项目背景 这类儿童在与人交往时存在着沟通障碍,其特点表现在: 社交交流差,互动障碍明显 认知能力有限,被动认知 兴趣狭窄,重复刻板,缺乏变化和想象

Tianyi Pan 35 Nov 24, 2022
Official repository for the paper F, B, Alpha Matting

FBA Matting Official repository for the paper F, B, Alpha Matting. This paper and project is under heavy revision for peer reviewed publication, and s

Marco Forte 404 Jan 05, 2023
Malware Bypass Research using Reinforcement Learning

Malware Bypass Research using Reinforcement Learning

Bobby Filar 76 Dec 26, 2022
An example showing how to use jax to train resnet50 on multi-node multi-GPU

jax-multi-gpu-resnet50-example This repo shows how to use jax for multi-node multi-GPU training. The example is adapted from the resnet50 example in d

Yangzihao Wang 20 Jul 04, 2022
natural image generation using ConvNets

The Eyescream Project Generating Natural Images using Neural Networks. For our research summary on this work, please read the Arxiv paper: http://arxi

Meta Archive 601 Nov 23, 2022
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs ArXiv Abstract Convolutional Neural Networks (CNNs) have become the de f

Philipp Benz 12 Oct 24, 2022
SARS-Cov-2 Recombinant Finder for fasta sequences

Sc2rf - SARS-Cov-2 Recombinant Finder Pronounced: Scarf What's this? Sc2rf can search genome sequences of SARS-CoV-2 for potential recombinants - new

Lena Schimmel 41 Oct 03, 2022