Machine Learning Algorithms

Overview

Machine-Learning-Algorithms

In this project, the dataset was created through a survey opened on Google forms. The purpose of the form is to find the person's favorite shopping type based on the information provided. In this context, 13 questions were asked to the user. As a result of these questions, the estimation of the shopping type, which is a classification problem, will be carried out with 5 different algorithms.

These algorithms;

  • Logistic Regression
  • Random Forest Classifier
  • Support Vector Machine
  • K Neighbors
  • Decision Tree

algorithms will have a total of 12 parameters

A total of 219 people participated in the survey and the answers given to this form were used in the training of the algorithm.

Target variables to be estimated;

  • Clothing
  • Technology
  • Home/Life
  • Book/Magazine

The questions asked to make the estimation are as follows:

  • Gender
  • Age
  • Which store would you prefer to go to?
  • Which store would you prefer to go to?
  • Which store would you prefer to go to?
  • What is your favorite season?
  • What is the importance of the dollar exchange rate for your shopping?
  • What is your satisfaction level with your budget for shopping?
  • How would you rate your social life?
  • Which of the online shopping sites do you prefer?
  • How often do you go shopping?
  • What is your average sleep time per day?
  • What is your favorite type of shopping? // target

The dataset, which is in the form of a csv file, is read to the system as a dataframe. And the column of information in which hour and minute the user filled out the form, which does not make sense for our algorithm, is removed.

Since the numbers in some columns is way more different than the others before the PCA operation is performed, the standardization process is applied to the columns so that they do not have a greater effect than the combination of these columns during the PCA operation.

The features and target columns to be used during the export of the dataset to the algorithms are determined.

In order to fit the resulting algorithms, the initial state of the dataset, its normalized state and the pca applied states are kept separately. The generated data is divided into parts as train = 0.8 and test = 0.2. Cross Validation process will be applied on 0.8 train data.

Before giving the dataset to the 5 algorithms, the answers written in the text in the dataset and the text in the other questions are encoded and the dataset is converted into numbers.

The 5 algorithms are functions from the sklearn library. The Cross Validation process was performed using the GridSearchCV() function, excluding the Logistic Regression algorithm. In the Logistic regression algorithm, since it is possible to do Cross Validation with the logistic regression function it is not necessary to use GridSearchCV().

GridSearchCV() applies K-Fold Cross Validation by trying the parameters I gave for the function, the number of K for my project is 10. By dividing the cross validation process parameters and the train data we provide, it is determined at which values we can get the best result.

An algorithm is created using the determined parameters and the algorithm is tested with the test data to be fitted with the train data.

Detailed information about dataset can be found in the report.

Owner
Göktuğ Ayar
Computer Engineering student at Yildiz Technical University
Göktuğ Ayar
Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

Oracle 95 Dec 28, 2022
Machine-care - A simple python script to take care of simple maintenance tasks

Machine care An simple python script to take care of simple maintenance tasks fo

2 Jul 10, 2022
Land Cover Classification Random Forest

You can perform Land Cover Classification on Satellite Images using Random Forest and visualize the result using Earthpy package. Make sure to install the required packages and such as

Dr. Sander Ali Khowaja 1 Jan 21, 2022
Made in collaboration with Chris George for Art + ML Spring 2019.

Deepdream Eyes Made in collaboration with Chris George for Art + ML Spring 2019.

Francisco Cabrera 1 Jan 12, 2022
Combines Bayesian analyses from many datasets.

PosteriorStacker Combines Bayesian analyses from many datasets. Introduction Method Tutorial Output plot and files Introduction Fitting a model to a d

Johannes Buchner 19 Feb 13, 2022
Open source time series library for Python

PyFlux PyFlux is an open source time series library for Python. The library has a good array of modern time series models, as well as a flexible array

Ross Taylor 2k Jan 02, 2023
Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models

Highly interpretable, sklearn-compatible classifier based on decision rules This is a scikit-learn compatible wrapper for the Bayesian Rule List class

Tamas Madl 482 Nov 19, 2022
List of Data Science Cheatsheets to rule the world

Data Science Cheatsheets List of Data Science Cheatsheets to rule the world. Table of Contents Business Science Business Science Problem Framework Dat

Favio André Vázquez 11.7k Dec 30, 2022
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

alkaline-ml 1.3k Dec 22, 2022
Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

Intel(R) Extension for Scikit-learn* Installation | Documentation | Examples | Support | FAQ With Intel(R) Extension for Scikit-learn you can accelera

Intel Corporation 858 Dec 25, 2022
Cohort Intelligence used to solve various mathematical functions

Cohort-Intelligence-for-Mathematical-Functions About Cohort Intelligence : Cohort Intelligence ( CI ) is an optimization technique. It attempts to mod

Aayush Khandekar 2 Oct 25, 2021
We have a dataset of user performances. The project is to develop a machine learning model that will predict the salaries of baseball players.

Salary-Prediction-with-Machine-Learning 1. Business Problem Can a machine learning project be implemented to estimate the salaries of baseball players

Ayşe Nur Türkaslan 9 Oct 14, 2022
Solve automatic numerical differentiation problems in one or more variables.

numdifftools The numdifftools library is a suite of tools written in _Python to solve automatic numerical differentiation problems in one or more vari

Per A. Brodtkorb 181 Dec 16, 2022
Machine Learning University: Accelerated Natural Language Processing Class

Machine Learning University: Accelerated Natural Language Processing Class This repository contains slides, notebooks and datasets for the Machine Lea

AWS Samples 2k Jan 01, 2023
PyNNDescent is a Python nearest neighbor descent for approximate nearest neighbors.

PyNNDescent PyNNDescent is a Python nearest neighbor descent for approximate nearest neighbors. It provides a python implementation of Nearest Neighbo

Leland McInnes 699 Jan 09, 2023
MLFlow in a Dockercontainer based on Azurite and Postgres

mlflow-azurite-postgres docker This is a MLFLow image which works with a postgres DB and a local Azure Blob Storage Instance (Azurite). This image is

2 May 29, 2022
30 Days Of Machine Learning Using Pytorch

Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch

Mayur 119 Nov 24, 2022
Predict profitability of trades based on indicator buy / sell signals

Predict profitability of trades based on indicator buy / sell signals Trade profitability analysis for trades based on various indicators signals: MAC

Tomasz Porzycki 1 Dec 15, 2021
A single Python file with some tools for visualizing machine learning in the terminal.

Machine Learning Visualization Tools A single Python file with some tools for visualizing machine learning in the terminal. This demo is composed of t

Bram Wasti 35 Dec 29, 2022
Machine-Learning with python (jupyter)

Machine-Learning with python (jupyter) 머신러닝 야학 작심 10일과 쥬피터 노트북 기반 데이터 사이언스 시작 들어가기전 https://nbviewer.org/ 페이지를 통해서 쥬피터 노트북 내용을 볼 수 있다. 위 페이지에서 현재 레포 기

HyeonWoo Jeong 1 Jan 23, 2022