Orbivator AI - To determine which features of the data (measurements) are most important for diagnosing breast cancer, and to predict whether breast cancer occurs or not.

Last update: Jan 02, 2022

Overview

Orbivator_AI

Breast Cancer Wisconsin (Diagnostic)

GOAL

To determine which features of the data (measurements) are most important for diagnosing breast cancer, and to predict whether breast cancer occurs or not.

DATASET

https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

DESCRIPTION

Breast cancer is the most common cancer among women worldwide. It accounts for 25% of all cancer cases and affected over 2.1 million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area.
The goal of this project is to analyse the data, build a model accordingly, and classify whether or not a person has breast cancer.

WHAT I HAVE DONE

-> Imported the libraries

-> Loaded the dataset

Preprocessing of the dataset:

-> Computed summary statistics for the data

-> Visualized the data

-> Computed correlations between features

-> Split the dataset into training and test sets

-> Trained the models

-> Models used: Random forest regressor, Logistic regression, Decision Trees

-> Evaluated the models

-> Predicted the output for new data using the model with the highest accuracy
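
The loading, statistics, correlation and splitting steps listed above can be sketched as follows. This is a minimal sketch, not the project's actual notebook: it assumes scikit-learn's bundled copy of the Wisconsin Diagnostic data in place of the Kaggle CSV, and the 80/20 split ratio and random seed are illustrative choices. In the bundled copy the target is encoded as 0 = malignant, 1 = benign.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: the bundled dataset stands in for the Kaggle CSV,
# so the sketch runs without downloading anything.
data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 numeric features plus a "target" column

# Statistical measures for each column (count, mean, std, quartiles, ...)
summary = df.describe()

# Absolute correlation of every feature with the target, strongest first
corr_with_target = (
    df.corr()["target"].drop("target").abs().sort_values(ascending=False)
)
print(corr_with_target.head())

# Hold out 20% of the rows for evaluation (ratio and seed are assumptions)
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

Passing `stratify=y` keeps the malignant/benign ratio the same in both subsets, which matters when the classes are not evenly balanced.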

MODELS USED

Random forest regressor:

A random forest regressor is a meta estimator that fits a number of decision trees on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting.
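
To illustrate the averaging described above, here is a hedged sketch that fits scikit-learn's `RandomForestRegressor` on the 0/1 labels and then thresholds the averaged output at 0.5 to obtain a class. The bundled dataset, the split, the number of trees and the 0.5 threshold are all assumptions made for illustration. Note that `score()` on a regressor reports R², which is not the same quantity as classification accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: bundled dataset, 80/20 split, fixed seed
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

forest = RandomForestRegressor(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Each prediction is the mean of the trees' outputs, so it lies in [0, 1]
scores = forest.predict(X_test)

# For a regressor, score() returns R^2 rather than accuracy
r2 = forest.score(X_test, y_test)

# Thresholding at 0.5 turns the averaged output into a class label
labels = (scores >= 0.5).astype(int)
acc = accuracy_score(y_test, labels)
print(f"R^2: {r2:.3f}, accuracy after thresholding: {acc:.3f}")
```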

Logistic regression:

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
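
A minimal sketch of logistic regression on a binary target is shown below. It assumes the bundled scikit-learn copy of the dataset and adds a `StandardScaler` step, which is an illustrative choice rather than something the project is known to use; scaling typically helps the solver converge on features with very different ranges.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset, 80/20 split, fixed seed
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize the features, then fit the logistic model
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1
proba = clf.predict_proba(X_test[:5])
print(proba)

# For a classifier, score() returns the fraction of correct predictions
log_acc = clf.score(X_test, y_test)
print(f"accuracy: {log_acc:.3f}")
```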

Decision Trees:

Decision trees are a type of supervised machine learning (that is, the training data specifies both the input and the corresponding output) in which the data is repeatedly split according to a threshold on a chosen feature.
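
The repeated splitting can be made visible by printing the learned rules of a shallow tree. The sketch below assumes the bundled scikit-learn dataset and caps the depth at 2 purely to keep the printout short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()

# max_depth=2 is an illustrative limit so the rule listing stays small
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(data.data, data.target)

# Each indented line is one split of the form "feature <= threshold"
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)
```

Each leaf line in the output names the predicted class (0 = malignant, 1 = benign in the bundled copy).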

LIBRARIES NEEDED

pandas
matplotlib
seaborn
scikit-learn (imported as sklearn)

ACCURACIES

Random forest regressor: 79.45%
Logistic regression: 63.30%
Decision Trees: 89.47%
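
For reference, an accuracy percentage of this kind is the fraction of correct predictions multiplied by 100. The toy labels below are made up for illustration and are not taken from the project.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: 4 of the 5 predictions match the truth
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

# accuracy_score returns a fraction in [0, 1]; scale it to a percentage
accuracy_pct = accuracy_score(y_true, y_pred) * 100
print(accuracy_pct)
```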

CONCLUSION

I downloaded the dataset from Kaggle, loaded the required libraries, pre-processed the data, split it, built the models, tested their accuracies, and finalized the model based on accuracy.
I used three models to train the data: first Random forest regressor, then Logistic regression, and then Decision Trees. I finalized Decision Trees, which had the highest accuracy.
The Decision Trees model is used to determine which features of the data (measurements) are most important for diagnosing breast cancer and to find out whether breast cancer occurs or not, with an accuracy of over 89%.
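
One way a fitted decision tree can rank features is through its `feature_importances_` attribute. The sketch below is an assumption about how such a ranking could be produced, not the project's confirmed method; it uses the bundled scikit-learn dataset and a fixed seed.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Importances are normalized to sum to 1; higher means the feature
# contributed more to reducing impurity across the tree's splits
importances = pd.Series(
    tree.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances.head(5))
```

Importances from a single tree can shift noticeably with the seed or the training split, so the ranking is best read as indicative.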

Anurag Kumar Singh, Jeesica Pearson, Eric Edward, Nitin Kumar, Aditi Singh

GitHub: https://github.com/anurag-bit/Orbivator
