Python-based implementations of algorithms for learning on imbalanced data.

Last update: Dec 13, 2022

Related tags

Machine Learning python data-science machine-learning imbalanced-data notre-dame

Overview

ND DIAL: Imbalanced Algorithms

Minimalist Python-based implementations of algorithms for imbalanced learning. Includes deep and representational learning algorithms (implemented via TensorFlow). Below is a list of the methods currently implemented.

Undersampling
1. Random Majority Undersampling with/without Replacement
Oversampling
1. SMOTE - Synthetic Minority Over-sampling Technique [1]
2. DAE - Denoising Autoencoder [2] (TensorFlow)
3. GAN - Generative Adversarial Network [3] (TensorFlow)
4. VAE - Variational Autoencoder [4] (TensorFlow)
Ensemble Sampling
1. RAMOBoost [5]
2. RUSBoost [6]
3. SMOTEBoost [7]

References:

[1]	: N. V. Chawla, K. W. Bowyer, L. O. Hall, and P. Kegelmeyer. "SMOTE: Synthetic Minority Over-Sampling Technique." Journal of Artificial Intelligence Research (JAIR), 2002.

[2]	: P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion". Journal of Machine Learning Research (JMLR), 2010.

[3]	: I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. "Generative Adversarial Nets". Advances in Neural Information Processing Systems 27 (NIPS), 2014.

[4]	: D. P. Kingma and M. Welling. "Auto-Encoding Variational Bayes". arXiv preprint arXiv:1312.6114, 2013.

[5]	: S. Chen, H. He, and E. A. Garcia. "RAMOBoost: Ranked Minority Oversampling in Boosting". IEEE Transactions on Neural Networks, 2010.

[6]	: C. Seiffert, T. M. Khoshgoftaar, J. V. Hulse, and A. Napolitano. "RUSBoost: Improving Classification Performance when Training Data is Skewed". International Conference on Pattern Recognition (ICPR), 2008.

[7]	: N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer. "SMOTEBoost: Improving Prediction of the Minority Class in Boosting." European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), 2003.

Owner

DIAL | Notre Dame

DIAL | Notre Dame

GitHub Repository

🔬 A curated list of awesome machine learning strategies & tools in financial market.

🔬 A curated list of awesome machine learning strategies & tools in financial market.

1.6k Dec 30, 2022

Send rockets to Mars with artificial intelligence(Genetic algorithm) in python.

Send Rockets To Mars With AI Send rockets to Mars with artificial intelligence(Genetic algorithm) in python. Tools Python 3 EasyDraw How to Play Insta

3 Jul 15, 2022

UpliftML: A Python Package for Scalable Uplift Modeling

UpliftML is a Python package for scalable unconstrained and constrained uplift modeling from experimental data. To accommodate working with big data, the package uses PySpark and H2O models as base l

254 Dec 31, 2022

Covid-polygraph - a set of Machine Learning-driven fact-checking tools

Covid-polygraph, a set of Machine Learning-driven fact-checking tools that aim to address the issue of misleading information related to COVID-19.

1 Apr 22, 2022

Pandas Machine Learning and Quant Finance Library Collection

Pandas Machine Learning and Quant Finance Library Collection

148 Dec 07, 2022

Python package for concise, transparent, and accurate predictive modeling

Python package for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use. 📚 docs • 📖 demo notebooks Modern

983 Jan 01, 2023

A library to generate synthetic time series data by easy-to-use factors and generator

timeseries-generator This repository consists of a python packages that generates synthetic time series dataset in a generic way (under /timeseries_ge

87 Dec 20, 2022

Extreme Learning Machine implementation in Python

Python-ELM v0.3 --- ARCHIVED March 2021 --- This is an implementation of the Extreme Learning Machine [1][2] in Python, based on scikit-learn. From

511 Dec 20, 2022

This is a public repo where code samples are stored for the book Practical MLOps.

[Book-2021] Practical MLOps O'Reilly Book

421 Dec 31, 2022

CobraML: Completely Customizable A python ML library designed to give the end user full control

CobraML: Completely Customizable What is it? CobraML is a python library built on both numpy and numba. Unlike other ML libraries CobraML gives the us

14 Dec 19, 2021

Predicting Keystrokes using an Audio Side-Channel Attack and Machine Learning

Predicting Keystrokes using an Audio Side-Channel Attack and Machine Learning My

3 Apr 10, 2022

Turning images into '9-pan' palettes using KMeans clustering from sklearn.

img2palette Turning images into '9-pan' palettes using KMeans clustering from sklearn. Requirements We require: Pillow, for opening and processing ima

2 Jan 01, 2022

Forecast dynamically at scale with this unique package. pip install scalecast

🌄 Scalecast: Dynamic Forecasting at Scale About This package uses a scaleable forecasting approach in Python with common scikit-learn and statsmodels

158 Jan 03, 2023

Distributed Deep learning with Keras & Spark

Elephas: Distributed Deep Learning with Keras & Spark Elephas is an extension of Keras, which allows you to run distributed deep learning models at sc

1.6k Dec 29, 2022

Library for machine learning stacking generalization.

stacked_generalization Implemented machine learning *stacking technic[1]* as handy library in Python. Feature weighted linear stacking is also availab

114 Jul 19, 2022

Reggy - Regressions with arbitrarily complex regularization terms

reggy Regressions with arbitrarily complex regularization terms. Currently suppo

1 Jan 20, 2022

Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm.

Naive-Bayes Spam Classificator Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm. Main goal is to code a

1 Jun 27, 2022

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

14.5k Jan 07, 2023

An AutoML survey focusing on practical systems.

This project is a community effort in constructing and maintaining an up-to-date beginner-friendly introduction to AutoML, focusing on practical systems. AutoML is a big field, and continues to grow

16 Aug 14, 2022

Datetimes for Humans™

Maya: Datetimes for Humans™ Datetimes are very frustrating to work with in Python, especially when dealing with different locales on different systems

3.4k Dec 28, 2022