A framework for feature exploration in Data Science

Related tags

ScienceBeehive
Overview

Beehive

A framework for feature exploration in Data Science

Background

What do we do when we finish one episode of feature exploration in a jupyter notebook?
We overwrite the notebook or clone the notebook for the next episode of feature exploration.

I say this is inefficient, thus this project.

Purpose

To help documentation for grouping all episode of feature exploration in a single notebook.

Project Elaboration

terms

episode: one set of feature trained on a model.
e.g.
episode_1 = (sepal_length, sepal_width) | (target_class)
episode_2 = (sepal_length, sepal_width, petal_length, petal_width) | (target_class)
episode_3 = (sepal_length_normalized, sepal_width_normalized, petal_length_normalized, petal_width_normalized) | (target_class)

get to know the classes

Combee

One Combee should represent one model and its features includes: (metrics, confusion matrix, feature importances)
(sklearn-based)

CombeKeras

One Combee should represent one model and its features includes: (metrics, confusion matrix)
(keras-based)
needed because Keras has different features and functions compared to sklearn

Vespiqueen

One Vespiqueen should represent one episode.
Vespiqueen memorizes the tranformation of data.
Vespiqueen governs all the Combees.
Can be used to pick which model is best in a Vespiqueen.

Beehive

Beehive governs the whole Vespiqueen.
Beehive is used to print all the metrics across Vespiqueen (or episodes)
Can also be used to pick new features from the last or best vespiqueen for the next episode.

How to Use

Owner
Steven IJ
Steven IJ
ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.

ReproZip ReproZip is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used comm

267 Jan 01, 2023
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.8k Dec 31, 2022
Zipline, a Pythonic Algorithmic Trading Library

Zipline is a Pythonic algorithmic trading library. It is an event-driven system for backtesting. Zipline is currently used in production as the backte

Quantopian, Inc. 15.7k Jan 07, 2023
🍊 :bar_chart: :bulb: Orange: Interactive data analysis

Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program

Bioinformatics Laboratory 3.9k Jan 05, 2023
Discontinuous Galerkin finite element method (DGFEM) for Maxwell Equations

DGFEM Maxwell Equations Discontinuous Galerkin finite element method (DGFEM) for Maxwell Equations. Work in progress. Currently, the 1D Maxwell equati

Rafael de la Fuente 9 Aug 16, 2022
Open Delmic Microscope Software

Odemis Odemis (Open Delmic Microscope Software) is the open-source microscopy software of Delmic B.V. Odemis is used for controlling microscopes of De

Delmic 32 Dec 14, 2022
Statsmodels: statistical modeling and econometrics in Python

About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an

statsmodels 8.1k Dec 30, 2022
AnuGA for the simulation of the shallow water equation

ANUGA Contents ANUGA What is ANUGA? Installation Documentation and Help Mailing Lists Web sites Latest source code Bug reports Developer information L

Geoscience Australia 147 Dec 14, 2022
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Aesara

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an

PyMC 7.2k Dec 30, 2022
Book on Julia for Data Science

Book on Julia for Data Science

Julia Data Science 349 Dec 25, 2022
PsychoPy is an open-source package for creating experiments in behavioral science.

PsychoPy is an open-source package for creating experiments in behavioral science. It aims to provide a single package that is: precise enoug

PsychoPy 1.3k Dec 31, 2022
CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation

CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation. The code should run on any Linux system, from massively parallel computer

Jeppe Dakin 62 Dec 08, 2022
Float2Binary - A simple python class which finds the binary representation of a floating-point number.

Float2Binary A simple python class which finds the binary representation of a floating-point number. You can find a class in IEEE754.py file with the

Bora Canbula 3 Dec 14, 2021
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis. You write a high level configuration file specifying your in

Blue Collar Bioinformatics 915 Dec 29, 2022
artisan: visual scope for coffee roasters

Artisan Visual scope for coffee roasters WARNING: pre-release builds may not work. Use at your own risk. Summary Artisan is a software that helps coff

Artisan – Visual Scope for Coffee Roasters 705 Jan 05, 2023
SCICO is a Python package for solving the inverse problems that arise in scientific imaging applications.

Scientific Computational Imaging COde (SCICO) SCICO is a Python package for solving the inverse problems that arise in scientific imaging applications

Los Alamos National Laboratory 37 Dec 21, 2022
collection of interesting Computer Science resources

collection of interesting Computer Science resources

Kirill Bobyrev 137 Dec 22, 2022
An interactive explorer for single-cell transcriptomics data

an interactive explorer for single-cell transcriptomics data cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell tra

Chan Zuckerberg Initiative 424 Dec 15, 2022
A simple computer program made with Python on the brachistochrone curve.

Brachistochrone-curve This is a simple computer program made with Python on the brachistochrone curve. I decided to write it after a physics lesson on

Diego Romeo 1 Dec 16, 2021
PennyLane is a cross-platform Python library for differentiable programming of quantum computers.

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.

PennyLaneAI 1.6k Jan 04, 2023