DuBE: Duple-balanced Ensemble Learning from Skewed Data

Overview

DuBE: Duple-balanced Ensemble Learning from Skewed Data

"Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning"
(IEEE ICDE 2022 Submission) [Documentation] [Examples]

DuBE is an ensemble learning framework for (multi)class-imbalanced classification. It is an easy-to-use solution to imbalanced learning problems, features good performance, computing efficiency, and wide compatibility with different learning models. Documentation and examples are available at https://duplebalance.readthedocs.io.

Table of Contents

Background

Imbalanced Learning (IL) is an important problem that widely exists in data mining applications. Typical IL methods utilize intuitive class-wise resampling or reweighting to directly balance the training set. However, some recent research efforts in specific domains show that class-imbalanced learning can be achieved without class-wise manipulation. This prompts us to think about the relationship between the two different IL strategies and the nature of the class imbalance. Fundamentally, they correspond to two essential imbalances that exist in IL: the difference in quantity between examples from different classes as well as between easy and hard examples within a single class, i.e., inter-class and intra-class imbalance.

image

Existing works fail to explicitly take both imbalances into account and thus suffer from suboptimal performance. In light of this, we present Duple-Balanced Ensemble, namely DUBE, a versatile ensemble learning framework. Unlike prevailing methods, DUBE directly performs inter-class and intra-class balancing without relying on heavy distance-based computation, which allows it to achieve competitive performance while being computationally efficient.

image

Install

Our DuBE implementation requires following dependencies:

You can install DuBE by clone this repository:

git clone https://github.com/ICDE2022Sub/duplebalance.git
cd duplebalance
pip install .

Usage

For more detailed usage example, please see Examples.

A minimal working example:

# load dataset & prepare environment
from duplebalance import DupleBalanceClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=4, weights=[0.2, 0.3, 0.5],
                           random_state=0)

# ensemble training
clf = DupleBalanceClassifier(
    n_estimators=10,
    random_state=42,
    ).fit(X_train, y_train)

# predict
y_pred_test = clf.predict_proba(X_test)

Documentation

For more detailed API references, please see API reference.

Our DupleBalance implementation can be used much in the same way as the ensemble classifiers in sklearn.ensemble. The DupleBalanceClassifier class inherits from the sklearn.ensemble.BaseEnsemble base class.

Main parameters are listed below:

Parameters Description
base_estimator object, optional (default=sklearn.tree.DecisionTreeClassifier())
The base estimator to fit on self-paced under-sampled subsets of the dataset. NO need to support sample weighting. Built-in fit(), predict(), predict_proba() methods are required.
n_estimators int, optional (default=10)
The number of base estimators in the ensemble.
resampling_target {'hybrid', 'under', 'over', 'raw'}, default="hybrid"
Determine the number of instances to be sampled from each class (inter-class balancing).
- If under, perform under-sampling. The class containing the fewest samples is considered the minority class :math:c_{min}. All other classes are then under-sampled until they are of the same size as :math:c_{min}.
- If over, perform over-sampling. The class containing the argest number of samples is considered the majority class :math:c_{maj}. All other classes are then over-sampled until they are of the same size as :math:c_{maj}.
- If hybrid, perform hybrid-sampling. All classes are under/over-sampled to the average number of instances from each class.
- If raw, keep the original size of all classes when resampling.
resampling_strategy {'hem', 'shem', 'uniform'}, default="shem")
Decide how to assign resampling probabilities to instances during ensemble training (intra-class balancing).
- If hem, perform hard-example mining. Assign probability with respect to instance's latest prediction error.
- If shem, perform soft hard-example mining. Assign probability by inversing the classification error density.
- If uniform, assign uniform probability, i.e., random resampling.
perturb_alpha float or str, optional (default='auto')
The multiplier of the calibrated Gaussian noise that was add on the sampled data. It determines the intensity of the perturbation-based augmentation. If 'auto', perturb_alpha will be automatically tuned using a subset of the given training data.
k_bins int, optional (default=5)
The number of error bins that were used to approximate error distribution. It is recommended to set it to 5. One can try a larger value when the smallest class in the data set has a sufficient number (say, > 1000) of samples.
estimator_params list of str, optional (default=tuple())
The list of attributes to use as parameters when instantiating a new base estimator. If none are given, default parameters are used.
n_jobs int, optional (default=None)
The number of jobs to run in parallel for :meth:predict. None means 1 unless in a :obj:joblib.parallel_backend context. -1 means using all processors. See :term:Glossary <n_jobs> for more details.
random_state int / RandomState instance / None, optional (default=None)
If integer, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by numpy.random.
verbose int, optional (default=0)
Controls the verbosity when fitting and predicting.
Video-face-extractor - Video face extractor with Python

Python face extractor Setup Create the srcvideos and faces directories Put your

2 Feb 03, 2022
Ros2-voiceroid2 - ROS2 wrapper package of VOICEROID2

ros2_voiceroid2 ROS2 wrapper package of VOICEROID2 Windows Only Installation Ins

Nkyoku 1 Jan 23, 2022
Model of an AI powered sign language interpreter.

TEXT AND SPEECH TO SIGN LANGUAGE. A web application which takes in text or live audio speech recording as input, converts and displays the relevant Si

Mark Gatere 4 Mar 30, 2022
PiRank: Learning to Rank via Differentiable Sorting

PiRank: Learning to Rank via Differentiable Sorting This repository provides a reference implementation for learning PiRank-based models as described

54 Dec 17, 2022
Social Fabric: Tubelet Compositions for Video Relation Detection

Social-Fabric Social Fabric: Tubelet Compositions for Video Relation Detection This repository contains the code and results for the following paper:

Shuo Chen 7 Aug 09, 2022
利用Tensorflow实现基于CNN的中文短文本分类

Text Classification with CNN 使用卷积神经网络进行中文文本分类 CNN做句子分类的论文可以参看: Convolutional Neural Networks for Sentence Classification 还可以去读dennybritz大牛的博客:Implemen

Jeremiah 4 Nov 08, 2022
The codes reproduce the figures and statistics in the paper, "Controlling for multiple covariates," by Mark Tygert.

The accompanying codes reproduce all figures and statistics presented in "Controlling for multiple covariates" by Mark Tygert. This repository also pr

Meta Research 1 Dec 02, 2021
BookMyShowPC - Movie Ticket Reservation App made with Tkinter

Book My Show PC What is this? Movie Ticket Reservation App made with Tkinter. Tk

The Nithin Balaji 3 Dec 09, 2022
Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

DingDing 143 Jan 01, 2023
Locally Constrained Self-Attentive Sequential Recommendation

LOCKER This is the pytorch implementation of this paper: Locally Constrained Self-Attentive Sequential Recommendation. Zhankui He, Handong Zhao, Zhe L

Zhankui (Aaron) He 8 Jul 30, 2022
Code for layerwise detection of linguistic anomaly paper (ACL 2021)

Layerwise Anomaly This repository contains the source code and data for our ACL 2021 paper: "How is BERT surprised? Layerwise detection of linguistic

6 Dec 07, 2022
Hierarchical User Intent Graph Network for Multimedia Recommendation

Hierarchical User Intent Graph Network for Multimedia Recommendation This is our Pytorch implementation for the paper: Hierarchical User Intent Graph

6 Jan 05, 2023
A Pytorch Implementation for Compact Bilinear Pooling.

CompactBilinearPooling-Pytorch A Pytorch Implementation for Compact Bilinear Pooling. Adapted from tensorflow_compact_bilinear_pooling Prerequisites I

169 Dec 23, 2022
Paper Code:A Self-adaptive Weighted Differential Evolution Approach for Large-scale Feature Selection

1. SaWDE.m is the main function 2. DataPartition.m is used to randomly partition the original data into training sets and test sets with a ratio of 7

wangxb 14 Dec 08, 2022
[CVPR 2021] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition (CVPR 2021) arXiv Prerequisite PyTorch = 1.2.0 Python3 torchvision PIL argpar

51 Nov 11, 2022
Recurrent Neural Network Tutorial, Part 2 - Implementing a RNN in Python and Theano

Please read the blog post that goes with this code! Jupyter Notebook Setup System Requirements: Python, pip (Optional) virtualenv To start the Jupyter

Denny Britz 863 Dec 15, 2022
This repo holds the code of TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation

TransFuse This repo holds the code of TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation Requirements Pytorch=1.6.0, 1.9.0 (=1.

Rayicer 93 Dec 19, 2022
Tensorflow implementation of our method: "Triangle Graph Interest Network for Click-through Rate Prediction".

TGIN Tensorflow implementation of our method: "Triangle Graph Interest Network for Click-through Rate Prediction". Files in the folder dataset/ electr

Alibaba 21 Dec 21, 2022
A Python package to process & model ChEMBL data.

insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper

Steven Newton 0 Dec 09, 2021
AgML is a comprehensive library for agricultural machine learning

AgML is a comprehensive library for agricultural machine learning. Currently, AgML provides access to a wealth of public agricultural datasets for common agricultural deep learning tasks.

Plant AI and Biophysics Lab 1 Jul 07, 2022