Machine Learning Algorithms

Overview

In this project, the dataset was collected through a survey published on Google Forms. The purpose of the form is to predict a person's favorite shopping type from the information they provide. To this end, respondents were asked 13 questions. Predicting the shopping type from these answers is a classification problem, and it is carried out with 5 different algorithms.

These algorithms are:

  • Logistic Regression
  • Random Forest Classifier
  • Support Vector Machine
  • K Neighbors
  • Decision Tree

Each algorithm takes a total of 12 parameters (input features).

A total of 219 people took part in the survey, and their answers to the form were used to train the algorithms.

The target classes to be predicted are:

  • Clothing
  • Technology
  • Home/Life
  • Book/Magazine

The questions asked to make the estimation are as follows:

  • Gender
  • Age
  • Which store would you prefer to go to?
  • Which store would you prefer to go to?
  • Which store would you prefer to go to?
  • What is your favorite season?
  • What is the importance of the dollar exchange rate for your shopping?
  • What is your satisfaction level with your budget for shopping?
  • How would you rate your social life?
  • Which of the online shopping sites do you prefer?
  • How often do you go shopping?
  • What is your average sleep time per day?
  • What is your favorite type of shopping? // target

The dataset, stored as a CSV file, is read into the system as a dataframe. The column recording the hour and minute at which the user filled out the form is removed, since it carries no meaning for the algorithms.
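The loading step could look roughly like the sketch below. The column names (including "Timestamp") and the inline CSV text are hypothetical stand-ins, since the real file and its headers are not given here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the survey export; the real file name and
# column names are not stated in this README.
csv_text = """Timestamp,Gender,Age,Favorite shopping type
2021/05/01 10:15:00,Female,23,Clothing
2021/05/01 11:42:00,Male,31,Technology
"""

# Read the CSV into a dataframe (pass a file path instead of StringIO for a real file).
df = pd.read_csv(io.StringIO(csv_text))

# Drop the submission-time column, which is not useful for the prediction.
df = df.drop(columns=["Timestamp"])

print(df.columns.tolist())
```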

Because the values in some columns are on a much larger scale than those in others, the columns are standardized before PCA is performed, so that the larger-scale columns do not dominate the result of the PCA operation.
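A minimal sketch of standardization followed by PCA, using sklearn's StandardScaler and PCA. The data is synthetic and the choice of 2 components is an assumption made only for illustration; the project's actual number of components is not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic features on very different scales (for example an age
# column next to 1-5 rating columns). Not the real survey data.
X = np.column_stack([
    rng.integers(18, 65, size=50),
    rng.integers(1, 6, size=50),
    rng.integers(4, 11, size=50),
]).astype(float)

# Standardize every column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Project the standardized data onto 2 principal components (assumed value).
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)
print(pca.explained_variance_ratio_)
```

Without the scaling step, the age-like column would dominate the first component simply because its variance is numerically larger.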

The feature columns and the target column to be passed to the algorithms are then determined.

To fit the algorithms, the initial state of the dataset, its normalized state, and its PCA-applied state are kept separately. Each version is split into train (0.8) and test (0.2) parts. Cross-validation is applied to the 0.8 train part.
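One way to keep the three versions side by side and split each of them 80/20 is sketched below. The data is synthetic (219 rows, 12 features, 4 classes, matching the sizes mentioned in this README), and the number of PCA components and the random_state value are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the encoded survey: 219 rows, 12 features, 4 classes.
X_raw = rng.integers(1, 6, size=(219, 12)).astype(float)
y = rng.integers(0, 4, size=219)

X_scaled = StandardScaler().fit_transform(X_raw)
X_pca = PCA(n_components=5).fit_transform(X_scaled)  # 5 is an assumed value

variants = {"raw": X_raw, "scaled": X_scaled, "pca": X_pca}
splits = {}
for name, X in variants.items():
    # Using the same random_state keeps the row assignment identical across variants.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    splits[name] = (X_train, X_test, y_train, y_test)
    print(name, X_train.shape, X_test.shape)
```

This follows the order described above (transform first, then split). Fitting the scaler and PCA on the train part only is a common alternative that avoids leaking test information into the transforms.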

Before the dataset is given to the 5 algorithms, the answers stored as text are encoded, so that the whole dataset is converted into numbers.
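The encoding step might be sketched as follows. The README does not say which encoder was used, so OrdinalEncoder for the features and LabelEncoder for the target are assumptions, and the column names and rows are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical rows; the real survey has 12 feature columns.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Favorite season": ["Summer", "Winter", "Spring", "Summer"],
    "Favorite shopping type": ["Clothing", "Technology", "Book/Magazine", "Home/Life"],
})

feature_cols = ["Gender", "Favorite season"]
target_col = "Favorite shopping type"

# Map each text category in the feature columns to a number.
X = OrdinalEncoder().fit_transform(df[feature_cols])

# Map each target class to an integer label (classes are sorted alphabetically).
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(df[target_col])

print(X)
print(dict(zip(target_encoder.classes_, range(len(target_encoder.classes_)))))
```

Keeping the fitted target_encoder around makes it possible to turn predicted integers back into class names with inverse_transform().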

The 5 algorithms all come from the sklearn library. Cross-validation was performed with the GridSearchCV() function for every algorithm except Logistic Regression. For Logistic Regression, cross-validation can be done with the logistic regression function itself, so GridSearchCV() is not needed.
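sklearn offers LogisticRegressionCV, a logistic regression class with built-in cross-validation over the regularization strength. Whether this exact class was used in the project is an assumption; the sketch below only shows how such built-in cross-validation works, on synthetic data and with assumed settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic 4-class problem with 12 features (not the real survey data).
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=6, n_classes=4, random_state=0
)

# Cs=5 tries 5 regularization strengths; cv=10 uses 10 folds.
# Both values, and max_iter, are assumptions for this sketch.
clf = LogisticRegressionCV(Cs=5, cv=10, max_iter=1000)
clf.fit(X, y)

predictions = clf.predict(X)
train_accuracy = clf.score(X, y)
print(f"train accuracy: {train_accuracy:.3f}")
```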

GridSearchCV() applies K-Fold cross-validation while trying every combination of the parameter values given to it; in this project K is 10. By evaluating each parameter combination across the folds of the train data, it determines which values give the best result.
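A minimal sketch of this search for one of the algorithms (K Neighbors) with cv=10. The parameter grid is hypothetical, since the values actually tried in the project are not listed here, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 219 rows, 12 features, 4 classes.
X, y = make_classification(
    n_samples=219, n_features=12, n_informative=6, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical grid: 3 x 2 = 6 parameter combinations.
param_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}

# cv=10 runs 10-fold cross-validation on the train data for each combination.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

The test part is never seen during the search, so it stays available for an unbiased final check.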

A model is then created with the selected parameters, fitted on the train data, and evaluated on the test data.

Detailed information about the dataset can be found in the report.
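The final step can be sketched as below for a Decision Tree. The values in best_params are hypothetical placeholders for whatever the search selected, and the data is synthetic, so the printed accuracy says nothing about the project's real results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 219 rows, 12 features, 4 classes.
X, y = make_classification(
    n_samples=219, n_features=12, n_informative=6, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical values standing in for the parameters chosen by cross-validation.
best_params = {"max_depth": 5, "criterion": "gini"}

# Build the model with the chosen parameters and fit it on the train data only.
model = DecisionTreeClassifier(random_state=0, **best_params)
model.fit(X_train, y_train)

# Evaluate on the held-out test data.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {test_accuracy:.3f}")
```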

Owner
Göktuğ Ayar
Computer Engineering student at Yildiz Technical University