imbalanced-DL: Deep Imbalanced Learning in Python

Last update: Dec 28, 2022

Overview

imbalanced-DL (imported as imbalanceddl) is a Python package designed to make deep imbalanced learning easier for researchers and real-world users. In our experience, tackling deep imbalanced learning requires a strategy rather than a single model or approach. The package therefore provides several strategies for deep imbalanced learning: it implements several popular ones and reports benchmark results on several image classification tasks. Furthermore, the package provides an interface for adding more datasets and strategies.

Strategy

We provide some baseline strategies as well as some state-of-the-art strategies in this package, as follows:

Empirical Risk Minimization (baseline strategy)
Reweighting with Class Balance (CB) Loss
Deferred Re-Weighting (DRW)
Label Distribution Aware Margin (LDAM) Loss with DRW
Mixup with DRW
Remix with DRW
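
As a rough illustration of the re-weighting idea behind CB loss and DRW, the sketch below computes class-balanced weights from the effective number of samples, (1 - beta) / (1 - beta^n), and defers applying them until a chosen epoch. The function names, the default beta of 0.9999, and the switch epoch of 160 are assumptions made for illustration; they are not taken from this package.

```python
def class_balanced_weights(class_counts, beta=0.9999):
    """Per-class weights proportional to (1 - beta) / (1 - beta**n).

    The weights are rescaled so that they sum to the number of classes.
    `beta` is an assumed default, not a value read from imbalanced-DL.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]


def drw_weights(epoch, class_counts, defer_epoch=160, beta=0.9999):
    """Deferred re-weighting: uniform weights first, class-balanced later.

    `defer_epoch` is a hypothetical switch point chosen for this sketch.
    """
    if epoch < defer_epoch:
        return [1.0] * len(class_counts)
    return class_balanced_weights(class_counts, beta)


if __name__ == "__main__":
    counts = [5000, 500, 50]
    print(drw_weights(0, counts))    # uniform weights before the switch
    print(drw_weights(160, counts))  # rare classes get larger weights
```

The resulting list can be passed as per-class weights to a weighted loss; rarer classes receive larger weights once the deferred stage begins.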

Environments

This package is tested on Linux.
We suggest using a separate virtual environment to avoid package dependency issues.
Pyenv & Virtualenv users can follow the steps below to create a new virtual environment; this step is optional.

Pyenv & Virtualenv (Optional)

For dependency isolation, it is better to create a separate virtual environment.
The following demonstrates how to create and manage a virtual environment.
Install pyenv & virtualenv first.
pyenv virtualenv [version] [virtualenv_name]
- For example, if you'd like to use Python 3.6.8, you can run: pyenv virtualenv 3.6.8 TestEnv
mkdir [dir_name]
cd [dir_name]
pyenv local [virtualenv_name]
You will then have a new (clean) Python virtual environment for the package installation.

Installation

Basic Requirement

Python >= 3.6

git clone https://github.com/ntucllab/imbalanced-DL.git
cd imbalanced-DL
python -m pip install -r requirements.txt
python setup.py install

Usage

We highlight three key features of imbalanced-DL as follows:

(0) Imbalanced Dataset:

We support 5 benchmark image datasets for deep imbalanced learning.
To create an ImbalancedDataset object, you need to provide a config_file as well as the name of the dataset you would like to use.
Specifically, inside the config_file, you need to specify three key parameters for creating an imbalanced dataset.
- imb_type: choose either exp (long-tailed imbalance) or step imbalance.
- imb_ratio: the degree of imbalance in your data; researchers typically choose 0.1 or 0.01.
- dataset_name: one of the 5 benchmark image datasets we provide, or your own dataset.
For an example of the config_file, see example/config.
To construct your own dataset, inherit from BaseDataset; you can follow torchvision.datasets.ImageFolder to construct your dataset in PyTorch format.

from imbalanceddl.dataset.imbalance_dataset import ImbalancedDataset

# `config` holds the settings loaded from the config_file;
# config.dataset specifies the dataset name
imbalance_dataset = ImbalancedDataset(config, dataset_name=config.dataset)
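
To make the two imb_type settings concrete, here is a minimal sketch of how per-class sample counts are commonly derived in long-tailed benchmarks. It assumes imb_ratio is the ratio of the smallest class to the largest, and the function name is hypothetical; the package's own implementation may differ.

```python
def imbalanced_class_counts(max_count, num_classes, imb_type="exp", imb_ratio=0.01):
    """Return the number of training samples to keep for each class.

    Assumption: imb_ratio = (smallest class size) / (largest class size).
    This is an illustrative helper, not a function from imbalanced-DL.
    """
    if imb_type == "exp":
        # Counts decay exponentially from max_count to max_count * imb_ratio.
        return [
            int(max_count * imb_ratio ** (i / (num_classes - 1.0)))
            for i in range(num_classes)
        ]
    if imb_type == "step":
        # First half of the classes keep max_count; the rest are reduced.
        half = num_classes // 2
        minority = int(max_count * imb_ratio)
        return [max_count] * half + [minority] * (num_classes - half)
    raise ValueError(f"unknown imb_type: {imb_type!r}")


if __name__ == "__main__":
    print(imbalanced_class_counts(5000, 10, "exp", 0.01))
    print(imbalanced_class_counts(5000, 10, "step", 0.01))
```

With 10 classes, 5000 images in the largest class, and imb_ratio 0.01, the smallest class keeps 50 images, which corresponds to an imbalance factor of 100 between the largest and smallest classes.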

(1) Strategy Trainer:

We support 6 different strategies for deep imbalanced learning. You can either train from scratch or evaluate with the best model after training. Evaluating with the best model gives more in-depth metrics, such as per-class accuracy, for further assessing the selected strategy. We provide one trained model in example/checkpoint_cifar10.
Each strategy trainer is associated with a config_file, an ImbalancedDataset object, a model, and a strategy_name.
Specifically, the config_file provides the training parameters; the default settings for reproducing the benchmark results can be found in example/config. You can also set these training parameters according to your own needs.
For the model, we currently provide resnet32 and resnet18 for reproducing the benchmark results.
We provide a build_trainer() function that returns the specified trainer, as follows.

from imbalanceddl.strategy.build_trainer import build_trainer

# specify the strategy
trainer = build_trainer(config,
                        imbalance_dataset,
                        model=model,
                        strategy=config.strategy)
# train from scratch
trainer.do_train_val()

# Evaluate with best model
trainer.eval_best_model()

Alternatively, you can select the specific strategy trainer you would like to use:

from imbalanceddl.strategy import LDAMDRWTrainer

# pick the trainer
trainer = LDAMDRWTrainer(config,
                         imbalance_dataset,
                         model=model,
                         strategy=config.strategy)

# train from scratch
trainer.do_train_val()

# Evaluate with best model
trainer.eval_best_model()
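
The per-class accuracy mentioned above is useful on imbalanced data because overall accuracy can hide poor performance on rare classes. The helper below is a minimal, framework-free sketch of how such a metric can be computed from label lists; it is an illustration with a hypothetical function name, not the code that eval_best_model() runs.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {class_label: accuracy} computed over paired label sequences."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Only classes that appear in y_true are reported.
    return {label: correct[label] / total[label] for label in sorted(total)}


if __name__ == "__main__":
    y_true = [0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 1, 1, 0, 2]
    for label, acc in per_class_accuracy(y_true, y_pred).items():
        print(f"class {label}: {acc:.3f}")
```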

To construct your own strategy trainer, inherit from the Trainer class and implement the get_criterion() and train_one_epoch() methods in your strategy. You can then either add your strategy to the build_trainer() function or use it directly, as demonstrated above.
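
The subclassing pattern can be sketched as follows. To keep the example self-contained, it defines a simplified stand-in Trainer base class; the constructor arguments, attributes, and the InverseFrequencyTrainer strategy are all assumptions made for illustration, and the real base class in imbalanceddl has its own constructor and training loop.

```python
from abc import ABC, abstractmethod


class Trainer(ABC):
    """Simplified stand-in for the package's base class (not the real API)."""

    def __init__(self, class_counts, epochs=2):
        self.class_counts = class_counts
        self.epochs = epochs
        self.epoch = 0
        self.criterion = None
        self.log = []

    @abstractmethod
    def get_criterion(self):
        """Set self.criterion for the current epoch."""

    @abstractmethod
    def train_one_epoch(self):
        """Run one pass over the training data."""

    def do_train_val(self):
        for epoch in range(self.epochs):
            self.epoch = epoch
            self.get_criterion()
            self.train_one_epoch()


class InverseFrequencyTrainer(Trainer):
    """Hypothetical strategy: weight each class by its inverse frequency."""

    def get_criterion(self):
        total = sum(self.class_counts)
        num_classes = len(self.class_counts)
        # A real trainer would build a loss function here; this sketch
        # only stores the per-class weights such a loss would use.
        self.criterion = [total / (num_classes * n) for n in self.class_counts]

    def train_one_epoch(self):
        # Placeholder for the forward/backward loop over the training set.
        self.log.append((self.epoch, list(self.criterion)))


if __name__ == "__main__":
    trainer = InverseFrequencyTrainer(class_counts=[90, 10])
    trainer.do_train_val()
    print(trainer.log)
```

Keeping the loss selection in get_criterion() separate from the optimization loop in train_one_epoch() lets a strategy change its criterion from epoch to epoch, which is what deferred re-weighting relies on.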

(2) Benchmark research environment:

To conduct deep imbalanced learning research, we provide example code for training with different strategies, along with benchmark results on five image datasets. To quickly start training CIFAR-10 with the ERM strategy, run:

cd example
python main.py --gpu 0 --seed 1126 --c config/config_cifar10.yaml --strategy ERM

Following the example code, you can not only reproduce results from baseline training and from state-of-the-art strategies such as LDAM or Remix, but also use this environment to develop your own algorithm or strategy. Feel free to add your own strategy to this package.
For more information about the examples and usage, please see the Example README.

Benchmark Results

We provide benchmark results on 5 image datasets: CIFAR-10, CIFAR-100, CINIC-10, SVHN, and Tiny-ImageNet. We follow the standard procedure to generate imbalanced training sets for these 5 datasets and report their top-1 validation accuracy as a research benchmark. For example, the table below shows the results for Long-tailed Imbalanced CIFAR-10 trained with different strategies. For more detailed benchmark results, please see example/README.md.

Long-tailed Imbalanced CIFAR-10

`imb_type`	`imb_factor`	Model	Strategy	Validation Top 1
long-tailed	100	ResNet32	ERM	71.23
long-tailed	100	ResNet32	DRW	75.08
long-tailed	100	ResNet32	LDAM-DRW	77.75
long-tailed	100	ResNet32	Mixup-DRW	82.11
long-tailed	100	ResNet32	Remix-DRW	81.82

Test

python -m unittest -v

Contact

If you have any questions, please don't hesitate to email [email protected]. Thanks!

Acknowledgement

The authors thank members of the Computational Learning Lab at National Taiwan University for valuable discussions and various contributions to making this package better.

Owner

NTUCSIE CLLab
