Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Overview

Retrainable-Faulty-Wafer-Detector

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced) and then train the model on this data so that it can continuously update itself with the environment and become more generalized with time. In this regard, a training and prediction dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values then they are considered good files on which we can train or predict our model, however, if it didn't match then they are moved to the bad folder. Moreover, we also identify the columns with null values. If all the data in a column is missing then the file is also moved to the bad folder. On the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and considered it as good data.

Data Insertion in Database:

First, open a connection to the database if it exists otherwise create a new one. A table with the name train_good_raw_dt or pred_good_raw_dt is created in the database, based on the training or prediction process, for inserting the good data files obtained from the data validation step. If the table is already present then new files are inserted in that table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file, to be used for the model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The columns with zero standard deviation were also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, Random Forest, K-Neighbours, Logistic Regression and XGBoost. For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for all the models and select the one with the best score. Similarly, the best model is selected for each cluster. For every cluster, the models are saved so that they can be used in future predictions. In the end, the confusion matrix of the model associated with every cluster is also saved to give the a glance over the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Retraining:

After the prediction, the prediction data is merged with the previous training dataset and then the models were retrained on this data using the hyperparameter values obtained from the GridSearch. The cycle repeats with every prediction it does and learns from the newly acquired data, making it more robust.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Statically typed BNF with semantic actions; A frontend of frontend frameworks; Use your grammar everywhere.

Statically typed BNF with semantic actions; A frontend of frontend frameworks; Use your grammar everywhere.

Taine Zhao 56 Dec 14, 2022
A simple python script to convert Rubber Ducky payloads into AutoHotKey scripts

AHKDuckyReplacer A simple python script to convert Rubber Ducky payloads into AutoHotKey scripts. I have also added a sample payload for testing. I wi

Krizsan0596 5 Sep 28, 2022
Desafio Final do Mod1 do Bootcamp EDC - v2 usando a RAIS

IGTI - Bootcamp Engenheiro de Dados Cloud Módulo 1 - Desafio Final - RAIS 2020 Código do Desafio Final V2 do Bootcamp Engenheiro de Dados Cloud do IGT

Neylson Crepalde 17 Nov 02, 2022
This program generates automatically new folders containing old version of program

Automated Folder Versions Generator by Sergiy Grimoldi - V.0.0.2 This program generates automatically new folders containing old version of something

Sergiy Grimoldi 1 Dec 23, 2021
Spinning waffle from waffle shaped python code

waffle Spinning waffle from waffle shaped python code Based on a parametric curve: r(t) = 2 - 2*sin(t) + (sin(t)*abs(cos(t)))/(sin(t) + 1.4) projected

Viljar Femoen 5 Feb 14, 2022
Python dictionaries with advanced dot notation access

from box import Box movie_box = Box({ "Robin Hood: Men in Tights": { "imdb stars": 6.7, "length": 104 } }) movie_box.Robin_Hood_Men_in_Tights.imdb_s

Chris Griffith 2.1k Dec 28, 2022
Wagtail + Lottie is a Wagtail package for playing Adobe After Effects animations exported as json with Bodymovin.

Wagtail Lottie Wagtail + Lottie is a Wagtail package for playing Adobe After Effects animations exported as json with Bodymovin. Usage Export your ani

Alexis Le Baron 7 Aug 18, 2022
LiteX-Acorn-Baseboard is a baseboard developed around the SQRL's Acorn board (or Nite/LiteFury) expanding their possibilities

LiteX-Acorn-Baseboard is a baseboard developed around the SQRL's Acorn board (or Nite/LiteFury) expanding their possibilities

33 Nov 26, 2022
Generates Windows 95 and 95 OEM keys using the modulus 7 check algorithm

w95keygen-python windowskeygen.py - Generates Windows 95 and 95 OEM keys using the modulus 7 check algorithm Just download and drop in the directory y

Joshua Alto 1 Dec 06, 2021
SimplePyBLE - Python bindings for SimpleBLE

The ultimate fully-fledged cross-platform Python BLE library, designed for simplicity and ease of use.

Open Bluetooth Toolbox 27 Aug 28, 2022
Buggy script to play with GPOs

GPOwned /!\ This is a buggy PoC I made just to play with GPOs in my lab. Don't use it in production! /!\ The script uses impacket and ldap3 to update

45 Dec 15, 2022
:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

pre-commit-dbt List of pre-commit hooks to ensure the quality of your dbt projects. BETA NOTICE: This tool is still BETA and may have some bugs, so pl

Offbi 262 Nov 25, 2022
a url shortener with fastapi and tortoise-orm

fastapi-tortoise-orm-url-shortener a url shortener with fastapi and tortoise-orm

19 Aug 12, 2022
Script to use SysWhispers2 direct system calls from Cobalt Strike BOFs

SysWhispers2BOF Script to use SysWhispers2 direct system calls from Cobalt Strike BOFs. Introduction This script was initially created to fix specific

FalconForce 101 Dec 20, 2022
Data Science Course at Dept. of Computer Engineering, Chula 2022

2110446 Data Science Course at Chula 2022 Short links for exercises: Week1: Intro to Numpy, Pandas Numpy: https://colab.research.google.com/github/kao

Kao Panboonyuen 17 Nov 27, 2022
Make pack up python files easier.

python-easy-pack make pack up python files easier. 目前只提供了中文环境 如何使用? 将index.py复制到你的项目文件夹,或者把.py文件拷贝到这个文件夹。 打开你的cmd或者powershell 切换到程序所在目录,输入python index

2 Dec 15, 2021
A Notifier Program that Notifies you to relax your eyes Every 15 Minutes👀

Every 15 Minutes is an application that is used to Notify you to Relax your eyes Every 15 Minutes, This is fully made with Python and also with the us

FSP Gang s' Admin 1 Nov 03, 2021
MiniJVM is simple java virtual machine written by python language, it can load class file from file system and run it.

MiniJVM MiniJVM是一款使用python编写的简易JVM,能够从本地加载class文件并且执行绝大多数指令。 支持的功能 1.从本地磁盘加载class并解析 2.支持绝大多数指令集的执行 3.支持虚拟机内存分区以及对象的创建 4.支持方法的调用和参数传递 5.支持静态代码块的初始化 不支

keguoyu 60 Apr 01, 2022
Zotero references script (and app)

A little script (and PyInstaller build) for a very specific, somewhat hack-ish purpose: managing and exporting project references with Zotero and its API.

Marius Rödder 0 Dec 05, 2021
A simple python script where the user inputs the current ingredients they have in their kitchen into ingredients.txt

A simple python script where the user inputs the current ingredients they have in their kitchen into ingredients.txt and then runs the main.py script, and it will output what recipes can be created b

Jordan Leich 3 Nov 02, 2022