We turn our attention to classification. Classification tries to predict which of a small set of classes an observation belongs to. Mathematically, the aim is to find
Logistic regression is a supervised learning algorithm for classification that predicts the probability of a target variable. The target (dependent) variable is dichotomous, meaning there are only two possible classes. Mathematically, a logistic regression model predicts P(Y=1) as a function of X. It is one of the simplest machine learning algorithms and can be used for classification problems such as spam detection, diabetes prediction, and cancer detection. Here, we use a logistic regression model to predict the gender (Male/Female) of a person from their weight and height. The data set contains three columns:
- Height in inches
- Weight in pounds
- Gender (Male/Female) of the person
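As a minimal sketch of the setup described above, the following fits a logistic regression to height and weight and reads off P(Y=1) for each test row. The data file is not named here, so the example generates a small synthetic data set instead. The column names Height, Weight, and Gender, the group means and spreads, and the helper make_synthetic_data are all assumptions made for illustration, not the original data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def make_synthetic_data(n_per_class=500, seed=0):
    """Generate illustrative height/weight data; NOT the original data set."""
    rng = np.random.default_rng(seed)
    male_height = rng.normal(69.0, 2.9, n_per_class)
    female_height = rng.normal(63.7, 2.7, n_per_class)
    # Weight rises with height within each group, plus random noise.
    male_weight = 187.0 + 6.0 * (male_height - 69.0) + rng.normal(0.0, 10.0, n_per_class)
    female_weight = 136.0 + 6.0 * (female_height - 63.7) + rng.normal(0.0, 10.0, n_per_class)
    return pd.DataFrame({
        "Height": np.concatenate([male_height, female_height]),
        "Weight": np.concatenate([male_weight, female_weight]),
        "Gender": ["Male"] * n_per_class + ["Female"] * n_per_class,
    })


df = make_synthetic_data()
X = df[["Height", "Weight"]].to_numpy()
y = (df["Gender"] == "Male").to_numpy().astype(int)  # encode Male as 1, Female as 0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(Y=1), i.e. P(Male).
p_male = clf.predict_proba(X_test)[:, 1]
test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {test_accuracy:.3f}")
```

Encoding Male as 1 is an arbitrary choice; swapping the encoding only flips which class P(Y=1) refers to.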
Let's explore the correlations and see which features separate the Male/Female populations.
From the visualizations above, we can fairly say that there is a clear correlation between Weight and Height.
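A visual impression like this can be checked numerically. The sketch below computes the Pearson correlation between height and weight and the per-gender means. It runs on synthetic stand-in data, since the original data set is not reproduced here, so the numbers it prints are illustrative only and the column names are assumed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500  # rows per gender; an arbitrary choice for this illustration

# Synthetic stand-in data (assumed distributions, NOT the original data set).
gender = np.array(["Male"] * n + ["Female"] * n)
mean_height = np.where(gender == "Male", 69.0, 63.7)
mean_weight = np.where(gender == "Male", 187.0, 136.0)
height = mean_height + rng.normal(0.0, 2.8, 2 * n)
weight = mean_weight + 6.0 * (height - mean_height) + rng.normal(0.0, 10.0, 2 * n)

df = pd.DataFrame({"Height": height, "Weight": weight, "Gender": gender})

# Pearson correlation between the two numeric features.
corr = df["Height"].corr(df["Weight"])

# Per-gender means show how far apart the two populations sit on each feature.
group_means = df.groupby("Gender")[["Height", "Weight"]].mean()

print(f"Correlation between Height and Weight: {corr:.3f}")
print(group_means)
```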
The model has some hyperparameters we can tune for hopefully better performance. To tune them, we use a mix of cross-validation and grid search. In logistic regression, the most important parameter to tune is the regularization parameter C.
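The regularization parameter C, as implemented in scikit-learn, is the inverse of the regularization strength: smaller values of C mean stronger regularization.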
We use two methods to tune the model and select the regularization parameter:
- Writing our own loops to iterate over the model parameters
- Using GridSearchCV to find the best model
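The second of the two approaches listed above can be sketched as follows. This is a minimal example under stated assumptions: the data come from scikit-learn's make_classification rather than the height/weight data set, and the candidate values of C and the choice of 5 folds are illustrative, not the values used in the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic two-feature binary classification problem (stand-in data).
X, y = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# Candidate regularization values (assumed grid, for illustration only).
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,                # 5-fold cross-validation for every candidate C
    scoring="accuracy",
)
search.fit(X, y)

print("Tuned parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.4f}")
```

GridSearchCV refits the best model on the full training data by default, so search can be used directly for prediction afterwards.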
We use the following cv_score function to perform K-fold cross-validation and apply a scoring function to each test fold. In this version, accuracy is the default scoring function.
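One way such a function might look is sketched below, together with a hand-written loop over candidate values of C. The signature, the n_folds parameter, the grid of C values, and the synthetic data are assumptions for illustration and are not necessarily the original implementation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold


def cv_score(clf, X, y, score_func=accuracy_score, n_folds=5):
    """Average score_func over the test folds of a K-fold split."""
    total = 0.0
    for train_idx, test_idx in KFold(n_splits=n_folds).split(X):
        clf.fit(X[train_idx], y[train_idx])            # fit on the training folds
        predictions = clf.predict(X[test_idx])         # predict on the held-out fold
        total += score_func(y[test_idx], predictions)  # score_func(y_true, y_pred)
    return total / n_folds


# Synthetic stand-in data; make_classification shuffles samples by default.
X, y = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# Manual search: score each candidate C and keep the best one.
Cs = [0.001, 0.1, 1, 10, 100]
scores = {C: cv_score(LogisticRegression(C=C, max_iter=1000), X, y) for C in Cs}
best_C = max(scores, key=scores.get)

print(f"Best C: {best_C}, validation score: {scores[best_C]:.4f}")
```

Passing a different score_func, such as another metric from sklearn.metrics that takes (y_true, y_pred), changes the selection criterion without touching the loop.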
After completing the steps above, we conclude that the best regularization parameter, C = 1, corresponds to the maximum validation score of 0.9172.
- Basic Logistic Regression (Unregularized): 0.9172
- Tuned Logistic Regression Parameters: {'C': 1}; best score: 0.9168
- Logistic Regression Accuracy Score (Regularized): 0.9252