Gender Classification Machine Learning Model using Sk-learn in Python with 97%+ accuracy and deployment

Last update: Oct 16, 2022

Overview

Gender-classification

This is a ML model to classify Male and Females using some physical characterstics Data. Python Libraries like Pandas,Numpy and Sklearn are used In this.

Data set credits: Kaggle.com

1. Importing Libraries

import pandas as pd
import sklearn
import numpy as np
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.ensemble import VotingClassifier,RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

2. Loading Data and exploring data

data = pd.read_csv("L:\Gender classification\gender_classification_v7.csv")
data.head(20)
#checking for null values
data.isnull().sum()
data.describe()

Visualizing physical characters & diffrences using Graphs and plots

#visualising forehead length data
sns.lineplot(data['forehead_width_cm'],data['forehead_height_cm'], hue=data["gender"])

#visualising nose length data
sns.lineplot(data['nose_long'],data['nose_wide'], hue=data["gender"])

3. Encoding data and splitting data

twogender = {'Female':0, 'Male':1}
data['gender'] = data['gender'].map(twogender)

X = data.drop('gender', axis=1)
y = data['gender']

#splitting data for testing and traing process
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
print(X_train.shape)
print(X_val.shape)
print(y_train.shape)
print(y_val.shape)

Now we will test diffrent Sklearn Models to find best accuracy

4. Importing All required prerequisites

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, plot_confusion_matrix
from sklearn.ensemble import RandomForestClassifier

5. Decision tree-classifier

dt = DecisionTreeClassifier(random_state=0)

dt.fit(X_train, y_train)
dt_pred = dt.predict(X_val)
dt_acc = accuracy_score(y_val, dt_pred)
print('Accuracy of Decision Tree is: {:.2f}%'.format(dt_acc*100))

6. RandomforestClassifier

rf = RandomForestClassifier(random_state=0)

rf.fit(X_train, y_train)
rf_pred = rf.predict(X_val)
rf_acc = accuracy_score(y_val, rf_pred)
print('Accuracy of Random Forest is: {:.2f}%'.format(rf_acc*100))

7. Logistic regression

lr = LogisticRegression(random_state=0)

lr.fit(X_train, y_train)
lr_pred = lr.predict(X_val)
lr_acc = accuracy_score(y_val, lr_pred)
print('Accuracy of Logistic Regression is: {:.2f}%'.format(lr_acc*100))

8. K-nearest neighbour

knn = KNeighborsClassifier()
params = {'n_neighbors':[2,3,4,5,6,7,8,9]}

model = GridSearchCV(knn, params, cv=5)
model.fit(X_train, y_train)
model.best_params_

kn = KNeighborsClassifier(n_neighbors=8)

kn.fit(X_train, y_train)
kn_pred = kn.predict(X_val)
kn_acc = accuracy_score(y_val, kn_pred)
print('Accuracy of KNeighbors is: {:.2f}%'.format(kn_acc*100))

RESULTS

1. Accuracy of Decision Tree is: 96.87%

It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome. It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.

2. Accuracy of Random Forest is: 97.53%

Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset.

3. Accuracy of Logistic Regression is: 97.27%

Logistic regression is one of the most popular Machine Learning algorithms, which comes under the Supervised Learning technique. It is used for predicting the categorical dependent variable using a given set of independent variables

4. Accuracy of KNeighbors is: 97.20%

K-NN algorithm assumes the similarity between the new case/data and available cases and put the new case into the category that is most similar to the available categorie K-NN algorithm stores all the available data and classifies a new data point based on the similarity. This means when new data appears then it can be easily classified into a well suite category by using K- NN algorithm.

Deployment process(in-complete)

File index.html(interface for deployment of webapp)

import pandas as pd
import sklearn
import numpy as np
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.ensemble import VotingClassifier,RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix

data = pd.read_csv("L:\Gender classification\gender_classification_v7.csv")

#encoding data
twogender = {'Female':0, 'Male':1}
data['gender'] = data['gender'].map(twogender)

X = data.drop('gender', axis=1)
y = data['gender']

#splitting data for testing and traing process
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, plot_confusion_matrix


X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)

from sklearn.ensemble import RandomForestClassifier
#randomforestClassifier 
rf = RandomForestClassifier(random_state=0)

rf.fit(X_train, y_train)
rf_pred = rf.predict(X_val)
rf_acc = accuracy_score(y_val, rf_pred)
print('Accuracy of Random Forest is: {:.2f}%'.format(rf_acc*100))

#pickeling model
import pickle
pickle.dump(rf,open("model.pkl","wb"))

model =pickle.load(open("model.pkl","rb"))

this creates a model.pkl file and stores model

Contribution(s)

Contributions are always welcome! You can contribute to this project in the following way:

Deployment of model
Accuracy improvement
Bug fixes

Author

Aryan Raj

by Aryan Raj

Gender Classification Machine Learning Model using Sk-learn in Python with 97%+ accuracy and deployment

Related tags

Overview

Gender-classification

Visualizing physical characters & diffrences using Graphs and plots

RESULTS

Deployment process(in-complete)

Contribution(s)

Author

Owner

Aryan raj

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral)

Pytorch implementation of DeePSiM

Detectron2-FC a fast construction platform of neural network algorithm based on detectron2

Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation

Code for our paper Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

Torch implementation of various types of GAN (e.g. DCGAN, ALI, Context-encoder, DiscoGAN, CycleGAN, EBGAN, LSGAN)

A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or simply to separate onnx files to any size you want.

Heat transfer problemas solved using python

A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking

NuPIC Studio is an all-in-one tool that allows users create a HTM neural network from scratch

🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

🧠 A PyTorch implementation of 'Deep CORAL: Correlation Alignment for Deep Domain Adaptation.', ECCV 2016

Official PyTorch Implementation for InfoSwap: Information Bottleneck Disentanglement for Identity Swapping

Pytorch implementation of MalConv

Contains source code for the winning solution of the xView3 challenge

Dynamic Attentive Graph Learning for Image Restoration, ICCV2021 [PyTorch Code]

Official PyTorch implementation for paper "Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer"

An open-source Deep Learning Engine for Healthcare that aims to treat & prevent major diseases

Recurrent Variational Autoencoder that generates sequential data implemented with pytorch

NAS Benchmark in "Prioritized Architecture Sampling with Monto-Carlo Tree Search", CVPR2021

Gender Classification Machine Learning Model using Sk-learn in Python with 97%+ accuracy and deployment

Related tags

Overview

Gender-classification

Visualizing physical characters & diffrences using Graphs and plots

RESULTS

Deployment process(in-complete)

Contribution(s)

Author

Owner

Aryan raj

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral)

Pytorch implementation of DeePSiM

Detectron2-FC a fast construction platform of neural network algorithm based on detectron2

Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation

Code for our paper Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

Torch implementation of various types of GAN (e.g. DCGAN, ALI, Context-encoder, DiscoGAN, CycleGAN, EBGAN, LSGAN)

A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or simply to separate onnx files to any size you want.

Heat transfer problemas solved using python

A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking

NuPIC Studio is an all­-in-­one tool that allows users create a HTM neural network from scratch

🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

🧠 A PyTorch implementation of 'Deep CORAL: Correlation Alignment for Deep Domain Adaptation.', ECCV 2016

Official PyTorch Implementation for InfoSwap: Information Bottleneck Disentanglement for Identity Swapping

Pytorch implementation of MalConv

Contains source code for the winning solution of the xView3 challenge

Dynamic Attentive Graph Learning for Image Restoration, ICCV2021 [PyTorch Code]

Official PyTorch implementation for paper "Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer"

An open-source Deep Learning Engine for Healthcare that aims to treat & prevent major diseases

Recurrent Variational Autoencoder that generates sequential data implemented with pytorch

NAS Benchmark in "Prioritized Architecture Sampling with Monto-Carlo Tree Search", CVPR2021

NuPIC Studio is an all-in-one tool that allows users create a HTM neural network from scratch