A naive Bayes model for cancer classification using a set of documents

Last update: Nov 24, 2021

Related tags

Machine Learning naivebayes

Overview

Naive Bayes text classification model for cancer and non-cancer documents

Author: Alex King

Purpose
Requirements/files used
How to use

1. Purpose

The purpose of this program is to read CSV files containing two columns:

                    Document | Classification
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer

The program reads the documents from each file into classes. One file is used as the training set and the other as the testing set. Both sets go through the same tokenization. The model is then trained on the training set and evaluated on the testing set.
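The train-then-test flow described above can be sketched as follows. This is a minimal illustration, not the code in naivesbayes.py: the function names, the whitespace tokenizer, and the add-one (Laplace) smoothing are all assumptions, and `math.log` stands in for the numpy log the project uses so the sketch has no dependencies.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(documents):
    """Fit a multinomial naive Bayes model.

    documents: list of (text, label) pairs, e.g. label "cancer" or "nocancer".
    Returns log priors, per-class word counts, and the vocabulary.
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in documents:
        tokens = tokenize(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_docs.values())
    log_priors = {c: math.log(n / total_docs) for c, n in class_docs.items()}
    return log_priors, word_counts, vocab


def predict(text, log_priors, word_counts, vocab):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, log_prior in log_priors.items():
        total_words = sum(word_counts[label].values())
        score = log_prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption) avoids log(0).
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

Summing log probabilities instead of multiplying raw probabilities keeps long documents from underflowing to zero.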

2. Requirements/files used

* python3
* numpy library - for calculating logs
* pandas library - for reading in csv files
* main.py and naivesbayes.py
* stopwords.txt - list of stop words
* Scoring.docx - list of scores for precision, recall, and F-score
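For reference, the three metrics reported in Scoring.docx are conventionally computed as in the sketch below. The function name and the choice of "cancer" as the positive class are assumptions for illustration; the project's own scoring code may differ.

```python
def precision_recall_f1(gold, predicted, positive="cancer"):
    """Compute precision, recall, and F-score for one positive class.

    gold and predicted are equal-length lists of labels.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```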

3. How to use

This program has 3 modes of operation for tokenizing your sets: standard, binary, and stopwords. Binary and stopwords can also be combined.

                $ python3 main.py -train 1 -test 1

This first command runs standard tokenization on training set 1 and test set 1. To use the other training set, change the 1 after -train to a 2:

                $ python3 main.py -train 2 -test 1

NOTE: Do not change the testing set number; leave it as 1. The -test option was intended to support multiple testing sets.

In the commands below, replace # with the training set number (1 or 2).

For binary:

                $ python3 main.py -train # -test 1 -b

For stopwords:

                $ python3 main.py -train # -test 1 -s

For both stopwords and binary:

                $ python3 main.py -train # -test 1 -b -s
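As a rough sketch of how these flags could map onto tokenization, the snippet below parses the same options with argparse and applies them. It is not taken from main.py: `build_parser` and `tokenize_with_mode` are hypothetical names, and the meanings given to -b (count each word at most once per document) and -s (drop words listed in stopwords.txt) are assumptions based on common usage of those terms.

```python
import argparse


def build_parser():
    # Mirrors the flags shown in the commands above.
    parser = argparse.ArgumentParser(description="Naive Bayes cancer classifier")
    parser.add_argument("-train", type=int, default=1, help="training set number")
    parser.add_argument("-test", type=int, default=1, help="testing set number")
    parser.add_argument("-b", action="store_true", help="binary tokenization")
    parser.add_argument("-s", action="store_true", help="remove stop words")
    return parser


def tokenize_with_mode(text, stopwords=frozenset(), binary=False):
    """Tokenize text under the assumed meaning of each mode."""
    # Stopwords mode: drop any token found in the stop word set.
    tokens = [t for t in text.lower().split() if t not in stopwords]
    if binary:
        # Binary mode: keep only the first occurrence of each token.
        tokens = list(dict.fromkeys(tokens))
    return tokens
```

For example, `build_parser().parse_args(["-train", "2", "-test", "1", "-b", "-s"])` yields a namespace with `train=2`, `test=1`, `b=True`, and `s=True`, which can then be passed on to the tokenizer.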


Owner

Alex W King
