A naive Bayes model for cancer classification using a set of documents

Last update: Nov 24, 2021

Related tags

Machine Learning naivebayes

Overview

Naive Bayes text classification model for cancer and non-cancer documents

Author: Alex King

Purpose
Requirements/files used
How to use

1. Purpose

The purpose of this program is to read CSV files containing two columns:

                    Document | classification
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
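As a rough illustration of the two-column layout above, the sketch below loads (document, label) pairs from a CSV file object. The project itself reads CSV files with pandas; this example swaps in the standard-library `csv` module to stay dependency-free. The function name `load_documents`, the header handling, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

def load_documents(handle):
    """Return (text, label) pairs from a two-column CSV file object."""
    rows = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        text, label = row[0], row[1].strip().lower()
        if label not in ("cancer", "nocancer"):
            continue  # assumption: any other value is a header row
        rows.append((text, label))
    return rows

# Hypothetical sample data standing in for a real training file.
sample = io.StringIO(
    "Document,classification\n"
    '"tumor biopsy confirmed malignant cells",cancer\n'
    '"routine checkup, no abnormal findings",nocancer\n'
)
documents = load_documents(sample)
```

With a real file, the same function could be called on `open(path, newline="")`.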

The program reads this data into classes that hold the documents. One file is used as the training set and the other as the testing set. Both sets go through the same tokenization; the model is then trained on the training set and evaluated on the testing set.
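The train-then-test flow described above can be sketched as a small multinomial naive Bayes classifier. This is not the code in naivesbayes.py: the function names, the whitespace tokenizer, the add-one (Laplace) smoothing, and the toy training set are all assumptions, and `math.log` stands in for the numpy log the project uses.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and split on whitespace.
    return text.lower().split()

def train(documents):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = {}
    for text, label in documents:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log posterior score."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior plus the sum of smoothed log likelihoods.
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set, for illustration only.
training_set = [
    ("tumor malignant biopsy", "cancer"),
    ("malignant tumor growth", "cancer"),
    ("healthy routine checkup", "nocancer"),
    ("routine visit healthy", "nocancer"),
]
model = train(training_set)
prediction = classify("malignant tumor found", model)
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents, which is the usual reason a log function is needed here.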

2. Requirements/files used

* python3
* numpy library - for calculating log
* pandas library - for reading in csv files
* main.py and naivesbayes.py
* stopwords.txt - list of stop words
* Scoring.docx - list of scoring for precision, recall, F-score
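For reference, the three scores named in Scoring.docx can be computed from predicted and actual labels as in the sketch below. It uses the standard definitions and assumes that "cancer" is the positive class and that F-score means the balanced F1 score; the function name `score` and the example labels are hypothetical, not taken from the project.

```python
def score(predicted, actual, positive="cancer"):
    """Return (precision, recall, f_score) for the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score

# Hypothetical labels, for illustration only.
predicted = ["cancer", "cancer", "nocancer", "nocancer"]
actual = ["cancer", "nocancer", "cancer", "nocancer"]
precision, recall, f_score = score(predicted, actual)
```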

3. How to use

This program has 3 modes of operation for tokenizing your sets: standard, binary, and stopwords. Standard tokenization is the default:

                $python3 main.py -train 1 -test 1

This first command runs standard tokenization on training set 1 and test set 1. To use a different training set, change the 1 after -train to a 2:

                $python3 main.py -train 2 -test 1

#NOTE: do not change the testing set number; leave it as 1. The option was intended for multiple testing sets.

For binary (replace # with the training set number):

                $python3 main.py -train # -test 1 -b

For stopwords:

                $python3 main.py -train # -test 1 -s

For both stopwords and binary:

                $python3 main.py -train # -test 1 -b -s
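The flags above could be handled along the lines of the sketch below. The actual argument handling in main.py is not shown in this README, so everything here is an assumption: the use of `argparse`, the helper names, the reading of binary mode as "keep each word at most once per document", and the three-word stop list standing in for stopwords.txt.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the flags shown in the commands above.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-train", type=int, default=1, help="training set number")
    parser.add_argument("-test", type=int, default=1, help="testing set number")
    parser.add_argument("-b", action="store_true", help="binary tokenization")
    parser.add_argument("-s", action="store_true", help="remove stop words")
    return parser

def tokenize(text, stopwords=frozenset(), binary=False):
    """Lowercase, split on whitespace, then apply the optional modes."""
    tokens = [t for t in text.lower().split() if t not in stopwords]
    if binary:
        # Assumption: binary mode keeps only the first occurrence of each word.
        tokens = list(dict.fromkeys(tokens))
    return tokens

# Equivalent to: python3 main.py -train 2 -test 1 -b -s
args = build_parser().parse_args(["-train", "2", "-test", "1", "-b", "-s"])

# Hypothetical stop list; the project reads its list from stopwords.txt.
stopwords = {"the", "a", "of"} if args.s else set()
tokens = tokenize("The tumor of the tumor", stopwords, binary=args.b)
```

Because both modes are independent switches, combining `-b` and `-s` needs no special case: stop words are filtered first and duplicates are dropped afterwards.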

Owner

Alex W King
