To build a regression model to predict the concrete compressive strength based on the different features in the training data.

Overview

Cement-Strength-Prediction

Problem Statement

To build a regression model to predict the concrete compressive strength based on the different features in the training data. Architecture

Data Description

Given is the variable name, variable type, the measurement unit and a brief description. The concrete compressive strength is the regression problem. The order of this listing corresponds to the order of numerals along the rows of the database.

Architecture

architecture

Name Data Type Measurement Description

Cement (component 1) quantitative kg in a m3 mixture Input Variable

Blast Furnace Slag (component 2) quantitative kg in a m3 mixture Input Variable-- Blast furnace slag is a nonmetallic coproduct produced in the process. It consists primarily of silicates, aluminosilicates, and calcium-alumina-silicates

Fly Ash (component 3) quantitative kg in a m3 mixture Input Variable- it is a coal combustion product that is composed of the particulates (fine particles of burned fuel) that are driven out of coal-fired boilers together with the flue gases.

Water (component 4) quantitative kg in a m3 mixture Input Variable

Superplasticizer (component 5) quantitative kg in a m3 mixture Input Variable--Superplasticizers (SP's), also known as high range water reducers, are additives used in making high strength concrete. Their addition to concrete or mortar allows the reduction of the water to cement ratio without negatively affecting the workability of the mixture, and enables the production of self-consolidating concrete and high performance concrete

Coarse Aggregate (component 6) quantitative kg in a m3 mixture Input Variable-- construction aggregate, or simply "aggregate", is a broad category of coarse to medium grained particulate material used in construction, including sand, gravel, crushed stone, slag, recycled concrete and geosynthetic aggregates

Fine Aggregate (component 7) quantitative kg in a m3 mixture Input Variable—Similar to coarse aggregate, the constitution is much finer. Age quantitative Day (1~365) Input Variable

Concrete compressive strength quantitative MPa Output Variable

Apart from training files, we also require a "schema" file from the client, which contains all the relevant information about the training files such as: Name of the files, Length of Date value in FileName, Length of Time value in FileName, Number of Columns, Name of the Columns, and their datatype.

Data Validation

In this step, we perform different sets of validation on the given set of training files.

  1. Name Validation- We validate the name of the files based on the given name in the schema file. We have created a regex pattern as per the name given in the schema file to use for validation. After validating the pattern in the name, we check for the length of date in the file name as well as the length of time in the file name. If all the values are as per requirement, we move such files to "Good_Data_Folder" else we move such files to "Bad_Data_Folder."

  2. Number of Columns - We validate the number of columns present in the files, and if it doesn't match with the value given in the schema file, then the file is moved to "Bad_Data_Folder."

  3. Name of Columns - The name of the columns is validated and should be the same as given in the schema file. If not, then the file is moved to "Bad_Data_Folder".

  4. The datatype of columns - The datatype of columns is given in the schema file. This is validated when we insert the files into Database. If the datatype is wrong, then the file is moved to "Bad_Data_Folder".

  5. Null values in columns - If any of the columns in a file have all the values as NULL or missing, we discard such a file and move it to "Bad_Data_Folder".

Data Insertion in Database

  1. Database Creation and connection - Create a database with the given name passed. If the database is already created, open the connection to the database.
  2. Table creation in the database - Table with name - "Good_Data", is created in the database for inserting the files in the "Good_Data_Folder" based on given column names and datatype in the schema file. If the table is already present, then the new table is not created and new files are inserted in the already present table as we want training to be done on new as well as old training files.
  3. Insertion of files in the table - All the files in the "Good_Data_Folder" are inserted in the above-created table. If any file has invalid data type in any of the columns, the file is not loaded in the table and is moved to "Bad_Data_Folder".

Model Training

  1. Data Export from Db - The data in a stored database is exported as a CSV file to be used for model training.
  2. Data Preprocessing
    a) Check for null values in the columns. If present, impute the null values using the KNN imputer b) transform the features using log transformation c) Scale the training and test data separately
  3. Clustering - KMeans algorithm is used to create clusters in the preprocessed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using "KneeLocator" function. The idea behind clustering is to implement different algorithms To train data in different clusters. The Kmeans model is trained over preprocessed data and the model is saved for further use in prediction.
  4. Model Selection - After clusters are created, we find the best model for each cluster. We are using two algorithms, "Random forest Regressor" and “Linear Regression”. For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the Rsquared scores for both models and select the model with the best score. Similarly, the model is selected for each cluster. All the models for every cluster are saved for use in prediction.

Prediction Data Description

Client will send the data in multiple set of files in batches at a given location. Data will contain climate indicators in 8 columns. Apart from prediction files, we also require a "schema" file from client which contains all the relevant information about the training files such as: Name of the files, Length of Date value in FileName, Length of Time value in FileName, Number of Columns, Name of the Columns and their datatype. Data Validation
In this step, we perform different sets of validation on the given set of training files.

  1. Name Validation- We validate the name of the files on the basis of given Name in the schema file. We have created a regex pattern as per the name given in schema file, to use for validation. After validating the pattern in the name, we check for length of date in the file name as well as length of time in the file name. If all the values are as per requirement, we move such files to "Good_Data_Folder" else we move such files to "Bad_Data_Folder".
  2. Number of Columns - We validate the number of columns present in the files, if it doesn't match with the value given in the schema file then the file is moved to "Bad_Data_Folder".
  3. Name of Columns - The name of the columns is validated and should be same as given in the schema file. If not, then the file is moved to "Bad_Data_Folder".
  4. Datatype of columns - The datatype of columns is given in the schema file. This is validated when we insert the files into Database. If dataype is wrong then the file is moved to "Bad_Data_Folder".
  5. Null values in columns - If any of the columns in a file has all the values as NULL or missing, we discard such file and move it to "Bad_Data_Folder".

Data Insertion in Database

  1. Database Creation and connection - Create database with the given name passed. If the database is already created, open the connection to the database.
  2. Table creation in the database - Table with name - "Good_Data", is created in the database for inserting the files in the "Good_Data_Folder" on the basis of given column names and datatype in the schema file. If table is already present then new table is not created, and new files are inserted the already present table as we want training to be done on new as well old training files.
  3. Insertion of files in the table - All the files in the "Good_Data_Folder" are inserted in the above-created table. If any file has invalid data type in any of the columns, the file is not loaded in the table and is moved to "Bad_Data_Folder".

Prediction

  1. Data Export from Db - The data in the stored database is exported as a CSV file to be used for prediction.
  2. Data Preprocessing
    a) Check for null values in the columns. If present, impute the null values using the KNN imputer b) transform the features using log transformation c) Scale the training and test data separately
  3. Clustering - KMeans model created during training is loaded, and clusters for the preprocessed prediction data is predicted.
  4. Prediction - Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster.
  5. Once the prediction is made for all the clusters, the predictions along with the original names before label encoder are saved in a CSV file at a given location and the location is returned to the client.
Owner
Ashish Kumar
Research Fellow | Data Analyst | Data Scientist
Ashish Kumar
Yolo ros - YOLO-ROS for HUAWEI ATLAS200

YOLO-ROS YOLO-ROS for NVIDIA YOLO-ROS for HUAWEI ATLAS200, please checkout for b

ChrisLiu 5 Oct 18, 2022
DyNet: The Dynamic Neural Network Toolkit

The Dynamic Neural Network Toolkit General Installation C++ Python Getting Started Citing Releases and Contributing General DyNet is a neural network

Chris Dyer's lab @ LTI/CMU 3.3k Jan 06, 2023
Tensors and neural networks in Haskell

Hasktorch Hasktorch is a library for tensors and neural networks in Haskell. It is an independent open source community project which leverages the co

hasktorch 920 Jan 04, 2023
Unet network with mean teacher for altrasound image segmentation

Unet network with mean teacher for altrasound image segmentation

5 Nov 21, 2022
Visual Adversarial Imitation Learning using Variational Models (VMAIL)

Visual Adversarial Imitation Learning using Variational Models (VMAIL) This is the official implementation of the NeurIPS 2021 paper. Project website

14 Nov 18, 2022
Keras implementation of "One pixel attack for fooling deep neural networks" using differential evolution on Cifar10 and ImageNet

One Pixel Attack How simple is it to cause a deep neural network to misclassify an image if an attacker is only allowed to modify the color of one pix

Dan Kondratyuk 1.2k Dec 26, 2022
curl-impersonate: A special compilation of curl that makes it impersonate Chrome & Firefox

curl-impersonate A special compilation of curl that makes it impersonate real browsers. It can impersonate the four major browsers: Chrome, Edge, Safa

lwthiker 1.9k Jan 03, 2023
Machine Learning with JAX Tutorials

The purpose of this repo is to make it easy to get started with JAX. It contains my "Machine Learning with JAX" series of tutorials (YouTube videos and Jupyter Notebooks) as well as the content I fou

Aleksa Gordić 372 Dec 28, 2022
OREO: Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning (NeurIPS 2021)

OREO: Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning (NeurIPS 2021) Video demo We here provide a video demo from co

20 Nov 25, 2022
Codes and pretrained weights for winning submission of 2021 Brain Tumor Segmentation (BraTS) Challenge

Winning submission to the 2021 Brain Tumor Segmentation Challenge This repo contains the codes and pretrained weights for the winning submission to th

94 Dec 28, 2022
RDA: Robust Domain Adaptation via Fourier Adversarial Attacking

RDA: Robust Domain Adaptation via Fourier Adversarial Attacking Updates 08/2021: check out our domain adaptation for video segmentation paper Domain A

17 Nov 30, 2022
ML-based medical imaging using Azure

Disclaimer This code is provided for research and development use only. This code is not intended for use in clinical decision-making or for any other

Microsoft Azure 68 Dec 23, 2022
A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms

Denoising Diffusion Probabilistic Model for Proteins Implementation of Denoising Diffusion Probabilistic Model in Pytorch. It is a new approach to gen

Phil Wang 108 Nov 23, 2022
Crawl & visualize ICLR papers and reviews

Crawl and Visualize ICLR 2022 OpenReview Data Descriptions This Jupyter Notebook contains the data crawled from ICLR 2022 OpenReview webpages and thei

Federico Berto 75 Dec 05, 2022
Code for NeurIPS 2021 paper: Invariant Causal Imitation Learning for Generalizable Policies

Invariant Causal Imitation Learning for Generalizable Policies Ioana Bica, Daniel Jarrett, Mihaela van der Schaar Neural Information Processing System

Ioana Bica 17 Dec 01, 2022
This is a collection of our NAS and Vision Transformer work.

AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi

Microsoft 828 Dec 28, 2022
Rule-based Customer Segmentation

Rule-based Customer Segmentation Business Problem A game company wants to create level-based new customer definitions (personas) by using some feature

Cem Çaluk 2 Jan 03, 2022
CM building dataset Timisoara

CM_building_dataset_Timisoara Date created: Febr-2020 The Timi\c{s}oara Building Dataset - TMBuD - is composed of 160 images with the resolution of 76

Orhei Ciprian 5 Sep 07, 2022
Multi-task yolov5 with detection and segmentation based on yolov5

YOLOv5DS Multi-task yolov5 with detection and segmentation based on yolov5(branch v6.0) decoupled head anchor free segmentation head README中文 Ablation

150 Dec 30, 2022
This project deploys a yolo fastest model in the form of tflite on raspberry 3b+. The model is from another repository of mine called -Trash-Classification-Car

Deploy-yolo-fastest-tflite-on-raspberry 觉得有用的话可以顺手点个star嗷 这个项目将垃圾分类小车中的tflite模型移植到了树莓派3b+上面。 该项目主要是为了记录在树莓派部署yolo fastest tflite的流程 (之后有时间会尝试用C++部署来提升

7 Aug 16, 2022