Wafer Fault Detection using MlOps Integration

Overview

Wafer Fault Detection using MlOps Integration

This is an end to end machine learning project with MlOps integration for predicting the quality of wafer sensors.

Demo

  • Link

Table of Contents

  • Problem Statement
  • How to run the application
  • Technologies used
  • Proposed Solution and Architecture
  • WorkFlow of project
  • Technologies used

Problem Statement

Improper maintenance on a machine or system impacts to worsen mean time between failure (MTBF). Manual diagnostic procedures tend to extended downtime at the system breakdown. Machine learning techniques based on the internet of things (IoT) sensor data were used to make predictive maintenance to determine whether the sensor needs to be replaced or not.

How to implement the project

  • Create a conda environment
conda create -n waferops python=3.6.9
  • Activate the environment
conda activate wafer-ops
  • Install the requirements.txt file
pip install -r requirements.txt

Before running the project atleast in local environment (personal pc or laptop) run this command in new terminal, basically run the mlflow server.

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root artifacts --host 0.0.0.0 -p 5000

After running the mlflow server in new terminal, open another terminal and run the following command, since we are using fastapi. The command to run the application will change a bit

uvicorn main:app --reload

WorkFlow of the Project

To solve the problem statement we have proposed a customized machine learning approach.

WorkFlow of Project

In the first place, whenever we start a machine learning project, we need to sign a data sharing agreement with the client, where sign off some of the parameters like,

  • Format of data - like csv format or json format,etc
  • Number of Columns
  • Length of date stamp in the file
  • Length of time stamp in the file
  • DataType of each sensor - like float,int,string

The client will send multiple set of files in batches at a given location. In our case, the data which will be given to us, will consist of wafer names and 590 columns of different sensor values for each wafer. The last column will have Good/Bad value for each wafer as per the data sharing agreement

  • +1 indicates bad wafer
  • -1 indicates good wafer

These data can be found in the schema training json file.More details are present in LLD documentation of project.

Technical Aspects of the Project

As discussed, the client will send multiple set of files in batches at a given location. After signing the data sharing agreement, we create the master data management which is nothing but the schema training json file and schema prediction json (this is be used for prediction data). We have divided the project into multiple modules, for high level understanding some of them are

Training Validation

In this module,we will trigger the training validation pipeline,which will be responsible for training validation. In the training validation pipeline,we are internally triggering some of the pipelines, some of the internal function are

  • Training raw data validation - This function is responsible for validating the raw data based on schema training json file, and we have manually created a regex pattern for validating the filename of the data. We are even validating length of date time stamp, length of time stamp of the data. If some of the data does not match the criteria of the master data management, if move that files to bad folder and will not be used for training or prediction purposes.

  • Data Transformation - Previously, we have created both good and bad directory for storing the data based on the master data management. Now for the data transformation we are only performing the data transformation on good data folder. In the data transformation, we replace the missing values with the nan values.

  • DataBase Operation - Now that we have validated the data and transformed the data which is suitable for the further training purposes. In database operation we are using SQL-Lite. From the good folder we are inserting the data into a database. After the insertion of the data is done we are deleting the good data folder and move the bad folder to archived folder. Next inserting the good database, we are extracting the data from the database and converting into csv format.

Training Model

In the previous pipeline,after the database operation, we have exported the good data from database to csv format. In the training model pipeline, we are first fetching the data from the exported csv file.

Next comes the preprocessing of the data, where we are performing some of the preprocessing functions such as remove columns, separate label feature, imputing the missing the values if present. Dropping the columns with zero standard deviation.

As mentioned we are trying to solve the problem by using customized machine learning approach.We need to create clusters of data which represents the variation of data. Clustering of the data is based on K-Means clustering algorithm.

For every cluster which has been created two machine learning models are being trained which are RandomForest and XGBoost models with GridSearchCV as the hyperparameter tuning technique. The metrics which are monitoring are accuracy and roc auc score as the metric.

After training all the models, we are saving them to trained models folders.

Now that the models are saved into the trained models folder, here the mlops part comes into picture, where in for every cluster we are logging the parameters, metrics and models to mlflow server. On successful completion of training of all the models and logging them to mlflow, next pipeline will be triggered which is load production model pipeline.

Since all the trained models, will have different metrics and parameters, which can productionize them based on metrics. For this project we have trained 6 models and we will productionize 3 models along with KMeans model for the prediction service.

Here is glimpse of the mlflow server showing stages of the models (Staging or Production based on metrics)

mlflow server image

Prediction pipeline

The prediction pipeline will be triggered following prediction validation and prediction from the model. In this prediction pipeline, the same validation steps like validating file name and so on. The prediction pipeline, and the preprocessing of prediction data. For the prediction, we will load the trained kmeans model and then predict the number of clusters, and for every cluster, model will be loaded and the prediction will be done. The predictions will saved to predictions.csv file and then prediction is completed.

Technologies Used

  • Python
  • Sklearn
  • FastAPI
  • Machine Learning
  • Numpy
  • Pandas
  • MlFlow
  • SQL-Lite

Algorithms Used

  • Random Forest
  • XGBoost

Metrics

  • Accuracy
  • ROC AUC score

Cloud Deployment

  • AWS
Owner
Sethu Sai Medamallela
Aspiring Machine Learning Engineer
Sethu Sai Medamallela
A U-Net combined with a variational auto-encoder that is able to learn conditional distributions over semantic segmentations.

Probabilistic U-Net + **Update** + An improved Model (the Hierarchical Probabilistic U-Net) + LIDC crops is now available. See below. Re-implementatio

Simon Kohl 498 Dec 26, 2022
YoloAll is a collection of yolo all versions. you you use YoloAll to test yolov3/yolov5/yolox/yolo_fastest

官方讨论群 QQ群:552703875 微信群:15158106211(先加作者微信,再邀请入群) YoloAll项目简介 YoloAll是一个将当前主流Yolo版本集成到同一个UI界面下的推理预测工具。可以迅速切换不同的yolo版本,并且可以针对图片,视频,摄像头码流进行实时推理,可以很方便,直观

DL-Practise 244 Jan 01, 2023
Pytorch version of SfmLearner from Tinghui Zhou et al.

SfMLearner Pytorch version This codebase implements the system described in the paper: Unsupervised Learning of Depth and Ego-Motion from Video Tinghu

Clément Pinard 909 Dec 22, 2022
[WACV 2022] Contextual Gradient Scaling for Few-Shot Learning

CxGrad - Official PyTorch Implementation Contextual Gradient Scaling for Few-Shot Learning Sanghyuk Lee, Seunghyun Lee, and Byung Cheol Song In WACV 2

Sanghyuk Lee 4 Dec 05, 2022
Official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

MidiBERT-Piano Authors: Yi-Hui (Sophia) Chou, I-Chun (Bronwin) Chen Introduction This is the official repository for the paper, MidiBERT-Piano: Large-

137 Dec 15, 2022
"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOLQ: Segmenting Objects by Learning Queries This repository is an official implementation of the paper SOLQ: Segmenting Objects by Learning Queries.

MEGVII Research 179 Jan 02, 2023
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation This repository contains the source code of our paper, ESPNet (acc

Sachin Mehta 515 Dec 13, 2022
Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation Requirements This repository needs mmsegmentation Training To train

Adelaide Intelligent Machines (AIM) Group 7 Sep 12, 2022
QAT(quantize aware training) for classification with MQBench

MQBench Quantization Aware Training with PyTorch I am using MQBench(Model Quantization Benchmark)(http://mqbench.tech/) to quantize the model for depl

Ling Zhang 29 Nov 18, 2022
Mmdetection3d Noted - MMDetection3D is an open source object detection toolbox based on PyTorch

MMDetection3D is an open source object detection toolbox based on PyTorch

Jiangjingwen 13 Jan 06, 2023
Only works with the dashboard version / branch of jesse

Jesse optuna Only works with the dashboard version / branch of jesse. The config.yml should be self-explainatory. Installation # install from git pip

Markus K. 8 Dec 04, 2022
Self-Supervised CNN-GCN Autoencoder

GCNDepth Self-Supervised CNN-GCN Autoencoder GCNDepth: Self-supervised monocular depth estimation based on graph convolutional network To be published

53 Dec 14, 2022
An official source code for "Augmentation-Free Self-Supervised Learning on Graphs"

Augmentation-Free Self-Supervised Learning on Graphs An official source code for Augmentation-Free Self-Supervised Learning on Graphs paper, accepted

Namkyeong Lee 59 Dec 01, 2022
PaddleRobotics is an open-source algorithm library for robots based on Paddle, including open-source parts such as human-robot interaction, complex motion control, environment perception, SLAM positioning, and navigation.

简体中文 | English PaddleRobotics paddleRobotics是基于paddle的机器人开源算法库集,包括人机交互、复杂运动控制、环境感知、slam定位导航等开源算法部分。 人机交互 主动多模交互技术TFVT-HRI 主动多模交互技术是通过视觉、语音、触摸传感器等输入机器人

185 Dec 26, 2022
La source de mon module 'pyfade' disponible sur Pypi.

Version: 1.2 Introduction Pyfade est un module permettant de créer des dégradés colorés. Il vous permettra de changer chaque ligne de votre texte par

Billy 20 Sep 12, 2021
Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai

Coursera-deep-learning-specialization - Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai: (i) Neural Networks an

Aman Chadha 1.7k Jan 08, 2023
Replication Code for "Self-Supervised Bug Detection and Repair" NeurIPS 2021

Self-Supervised Bug Detection and Repair This is the reference code to replicate the research in Self-Supervised Bug Detection and Repair in NeurIPS 2

Microsoft 85 Dec 24, 2022
HGCN: Harmonic Gated Compensation Network For Speech Enhancement

HGCN The official repo of "HGCN: Harmonic Gated Compensation Network For Speech Enhancement", which was accepted at ICASSP2022. How to use step1: Calc

ScorpioMiku 33 Nov 14, 2022
Do Neural Networks for Segmentation Understand Insideness?

This is part of the code to reproduce the results of the paper Do Neural Networks for Segmentation Understand Insideness? [pdf] by K. Villalobos (*),

biolins 0 Mar 20, 2021
Reference models and tools for Cloud TPUs.

Cloud TPUs This repository is a collection of reference models and tools used with Cloud TPUs. The fastest way to get started training a model on a Cl

5k Jan 05, 2023