To effectively detect the faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced). In this regard, a training dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values, then it is considered a good file on which we can train or predict our model, if not then the files are considered as bad and moved to the bad folder. Moreover, we also identify the columns with null values. If the whole column data is missing then we also consider the file as bad, on the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and consider it as good data.

Data Insertion in Database:

First, we create a database with the given name passed. If the database is already created, open the connection to the database. A table with the name- "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, based on training or prediction, for inserting the good data files obtained from the data validation step. If the table is already present, then the new table is not created, and new files are inserted in the already present table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file to be used for model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The column with zero standard deviation was also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, "Random Forest" “K Neighbours”, “Logistic Regression” and "XGBoost". For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for both models and select the model with the best score. Similarly, the best model is selected for each cluster. All the models for every cluster are saved for use in prediction. In the end, the confusion matrix of the model associated with every cluster is also saved to give a glance at the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Converts a base copy of Pokemon BDSP's masterdatas into a more readable and editable Pokemon Showdown Format.

Showdown-BDSP-Converter Converts a base copy of Pokemon BDSP's masterdatas into a more readable and editable Pokemon Showdown Format. Download the lat

Alden Mo 2 Jan 02, 2022
Blender Add-on That Provides Quick Access to Render Controls

Blender Render Buttons Blender Add-on That Provides Quick Access to Render Controls A Blender 3.0 compatablity update of Blender2.8x-RenderButton v0.0

Don Schnitzius 3 Oct 18, 2022
Hspice-Wave-Generator is a tool used to quickly generate stimuli souces of hspice format

Hspice-Wave-Generator is a tool used to quickly generate stimuli souces of hspice format. All the stimuli sources are based on `pwl` function of HSPICE and the specific complex operations of writing

3 Aug 02, 2022
A collection of existing KGQA datasets in the form of the huggingface datasets library, aiming to provide an easy-to-use access to them.

KGQA Datasets Brief Introduction This repository is a collection of existing KGQA datasets in the form of the huggingface datasets library, aiming to

Semantic Systems research group 21 Jan 06, 2023
Exercise to teach a newcomer to the CLSP grid to set up their environment and run jobs

Exercise to teach a newcomer to the CLSP grid to set up their environment and run jobs

Alexandra 2 May 18, 2022
Replit theme sync; Github theme sync but in Replit.

This is a Replit theme sync, basically meaning that it keeps track of the current time (which may need to be edited later on), and if the time passes morning, afternoon, etc, the theme switches. The

Glitch 8 Jun 25, 2022
Port of the OpenCascade library to JavaScript / WebAssembly using Emscripten

OpenCascade.js A port of the OpenCascade CAD library to JavaScript and WebAssembly via Emscripten. Explore the docs » Examples · Issues · Discuss Proj

Sebastian Alff 347 Jan 08, 2023
This is a simple analogue clock made with turtle in python...

Analogue-Clock This is a simple analogue clock made with turtle in python... Requirements None, only you need to have windows 😉 ...Enjoy! Installatio

Abhyush 3 Jan 14, 2022
Learn the basics of Python. These tutorials are for Python beginners. so even if you have no prior knowledge of Python, you won’t face any difficulty understanding these tutorials.

01_Python_Introduction Introduction 👋 Python is a modern, robust, high level programming language. It is very easy to pick up even if you are complet

Milaan Parmar / Милан пармар / _米兰 帕尔马 245 Dec 30, 2022
Artificial intelligence based on 5-dimensional quantum selection

Deep Thought An artificial intelligence based on 5-dimensional quantum selection. Algorithm The payload Make an random bit array (e.g. 1101...) Conver

Larry Holst 3 Dec 14, 2022
Karte der Allgemeinverfügungen zu Schulschließungen oder eingeschränktem Regelbetrieb in Sachsen

SNSZ Karte Datenquelle: Allgemeinverfügungen zu Schulschließungen oder eingeschränktem Regelbetrieb in Sachsen Sächsisches Staatsministerium für Kultu

Jannis Leidel 3 Sep 26, 2022
LinuxHelper - A collection of utilities for non-technical Linux users accessible via a GUI

Linux Helper A collection of utilities for non-technical Linux users accessible via a GUI This app is still in very early development, expect bugs and

Seth 7 Oct 03, 2022
Team10 backend - A service which accepts a VRM (Vehicle Registration Mark)

GreenShip - API A service which accepts a VRM (Vehicle Registration Mark) and re

3D Hack 1 Jan 21, 2022
A tool that bootstraps your dotfiles ⚡️

Dotbot Dotbot makes installing your dotfiles as easy as git clone $url && cd dotfiles && ./install, even on a freshly installed system! Rationale Gett

Anish Athalye 5.9k Jan 07, 2023
Async Python Circuit Breaker implementation

aiocircuitbreaker This is an async Python implementation of the circuitbreaker library. Installation The project is available on PyPI. Simply run: $ p

5 Sep 05, 2022
An example module hooking system, will be used in PySAMP.

An example module hooking system, will be used in PySAMP.

2 May 01, 2022
An upgraded version of extractJS

extractJS_2.0 An enhanced version of extractJS with even more functionality Features Discover JavaScript files directly from the webpage Customizable

Ali 4 Dec 21, 2022
Python library to natively send files to Trash (or Recycle bin) on all platforms.

Send2Trash -- Send files to trash on all platforms Send2Trash is a small package that sends files to the Trash (or Recycle Bin) natively and on all pl

Andrew Senetar 224 Jan 04, 2023
A simple and usefull python calculator.

simplepy-calculator Your simple and fresh calculator. Getting Started Install python3 from the oficial python website or via terminal. Clone this repo

Felix Sanchez 1 Jan 18, 2022
Svg-turtle - Use the Python turtle to write SVG files

SaVaGe Turtle Use the Python turtle to write SVG files If you're using the Pytho

Don Kirkby 7 Dec 21, 2022