imbalanced-DL: Deep Imbalanced Learning in Python

Overview

imbalanced-DL: Deep Imbalanced Learning in Python

Overview

imbalanced-DL (imported as imbalanceddl) is a Python package designed to make deep imbalanced learning easier for researchers and real-world users. From our experiences, we observe that to tackcle deep imbalanced learning, there is a need for a strategy. That is, we may not just address this problem with one single model or approach. Thus in this package, we seek to provide several strategies for deep imbalanced learning. The package not only implements several popular deep imbalanced learning strategies, but also provides benchmark results on several image classification tasks. Futhermore, this package provides an interface for implementing more datasets and strategies.

Strategy

We provide some baseline strategies as well as some state-of-the-are strategies in this package as the following:

Environments

  • This package is tested on Linux OS.
  • You are suggested to use a different virtual environment so as to avoid package dependency issue.
  • For Pyenv & Virtualenv users, you can follow the below steps to create a new virtual environment or you can also skip this step.
Pyenv & Virtualenv (Optinal)
  • For dependency isolation, it's better to create another virtual environment for usage.
  • The following will be the demo for creating and managing virtual environment.
  • Install pyenv & virtualenv first.
  • pyenv virtualenv [version] [virtualenv_name]
    • For example, if you'd like to use python 3.6.8, you can do: pyenv virtualenv 3.6.8 TestEnv
  • mkdir [dir_name]
  • cd [dir_name]
  • pyenv local [virtualenv_name]
  • Then, you will have a new (clean) python virtual environment for the package installation.

Installation

Basic Requirement

  • Python >= 3.6
git clone https://github.com/ntucllab/imbalanced-DL.git
cd imbalanceddl
python -m pip install -r requirements.txt
python setup.py install

Usage

We highlight three key features of imbalanced-DL as the following:

(0) Imbalanced Dataset:

  • We support 5 benchmark image datasets for deep imbalanced learing.
  • To create and ImbalancedDataset object, you will need to provide a config_file as well as the dataset name you would like to use.
  • Specifically, inside the config_file, you will need to specify three key parameters for creating imbalanced dataset.
    • imb_type: you can choose from exp (long-tailed imbalance) or step imbalanced type.
    • imb_ratio: you can specify the imbalanceness of your data, typically researchers choose 0.1 or 0.01.
    • dataset_name: you can specify 5 benchmark image datasets we provide, or you can implement your own dataset.
    • For an example of the config_file, you can see example/config.
  • To contruct your own dataset, you should inherit from BaseDataset, and you can follow torchvision.datasets.ImageFolder to construct your dataset in PyTorch format.
from imbalanceddl.dataset.imbalance_dataset import ImbalancedDataset

# specify the dataset name
imbalance_dataset = ImbalancedDataset(config, dataset_name=config.dataset)

(1) Strategy Trainer:

  • We support 6 different strategies for deep imbalance learning, and you can either choose to train from scratch, or evaluate with the best model after training. To evaluate with the best model, you can get more in-depth metrics such as per class accuracy for further evaluation on the performance of the selected strategy. We provide one trained model in example/checkpoint_cifar10.
  • For each strategy trainer, it is associated with a config_file, ImbalancedDataset object, model, and strategy_name.
  • Specifically, the config_file will provide some training parameters, where the default settings for reproducing benchmark result can be found in example/config. You can also set these training parameters based on your own need.
  • For model, we currently provide resnet32 and resnet18 for reproducing the benchmark results.
  • We provide a build_trainer() function to return the specified trainer as the following.
from imbalanceddl.strategy.build_trainer import build_trainer

# specify the strategy
trainer = build_trainer(config,
                        imbalance_dataset,
                        model=model,
                        strategy=config.strategy)
# train from scratch
trainer.do_train_val()

# Evaluate with best model
trainer.eval_best_model()
  • Or you can also just select the specific strategy you would like to use as:
from imbalanceddl.strategy import LDAMDRWTrainer

# pick the trainer
trainer = LDAMDRWTrainer(config,
                         imbalance_dataset,
                         model=model,
                         strategy=config.strategy)

# train from scratch
trainer.do_train_val()

# Evaluate with best model
trainer.eval_best_model()
  • To construct your own strategy trainer, you need to inherit from Trainer class, where in your own strategy you will have to implement get_criterion() and train_one_epoch() method. After this you can choose whether to add your strategy to build_trainer() function or you can just use it as the above demonstration.

(2) Benchmark research environment:

  • To conduct deep imbalanced learning research, we provide example codes for training with different strategies, and provide benchmark results on five image datasets. To quickly start training CIFAR-10 with ERM strategy, you can do:
cd example
python main.py --gpu 0 --seed 1126 --c config/config_cifar10.yaml --strategy ERM

  • Following the example code, you can not only get results from baseline training as well as state-of-the-art performance such as LDAM or Remix, but also use this environment to develop your own algorithm / strategy. Feel free to add your own strategy into this package.
  • For more information about example and usage, please see the Example README

Benchmark Results

We provide benchmark results on 5 image datasets, including CIFAR-10, CIFAR-100, CINIC-10, SVHN, and Tiny-ImageNet. We follow standard procedure to generate imbalanced training dataset for these 5 datasets, and provide their top 1 validation accuracy results for research benchmark. For example, below you can see the result table of Long-tailed Imbalanced CIFAR-10 trained on different strategies. For more detailed benchmark results, please see example/README.md.

  • Long-tailed Imbalanced CIFAR-10
imb_type imb_factor Model Strategy Validation Top 1
long-tailed 100 ResNet32 ERM 71.23
long-tailed 100 ResNet32 DRW 75.08
long-tailed 100 ResNet32 LDAM-DRW 77.75
long-tailed 100 ResNet32 Mixup-DRW 82.11
long-tailed 100 ResNet32 Remix-DRW 81.82

Test

  • python -m unittest -v

Contact

If you have any question, please don't hesitate to email [email protected]. Thanks !

Acknowledgement

The authors thank members of the Computational Learning Lab at National Taiwan University for valuable discussions and various contributions to making this package better.

Owner
NTUCSIE CLLab
Computational Learning Lab, Dept. of Computer Science and Information Engineering, National Taiwan University
NTUCSIE CLLab
[CVPR 2021] Forecasting the panoptic segmentation of future video frames

Panoptic Segmentation Forecasting Colin Graber, Grace Tsai, Michael Firman, Gabriel Brostow, Alexander Schwing - CVPR 2021 [Link to paper] We propose

Niantic Labs 44 Nov 29, 2022
Algorithmic trading with deep learning experiments

Deep-Trading Algorithmic trading with deep learning experiments. Now released part one - simple time series forecasting. I plan to implement more soph

Alex Honchar 1.4k Jan 02, 2023
Myia prototyping

Myia Myia is a new differentiable programming language. It aims to support large scale high performance computations (e.g. linear algebra) and their g

Mila 456 Nov 07, 2022
Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)

Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)

Yihui He 1k Jan 03, 2023
State-Relabeling Adversarial Active Learning

State-Relabeling Adversarial Active Learning Code for SRAAL [2020 CVPR Oral] Requirements torch = 1.6.0 numpy = 1.19.1 tqdm = 4.31.1 AL Results The

10 Jul 14, 2022
Flexible time series feature extraction & processing

tsflex is a toolkit for flexible time series processing & feature extraction, that is efficient and makes few assumptions about sequence data. Useful

PreDiCT.IDLab 206 Dec 28, 2022
Mscp jamf - Build compliance in jamf

mscp_jamf Build compliance in Jamf. This will build the following xml pieces to

Bob Gendler 3 Jul 25, 2022
PoseViz – Multi-person, multi-camera 3D human pose visualization tool built using Mayavi.

PoseViz – 3D Human Pose Visualizer Multi-person, multi-camera 3D human pose visualization tool built using Mayavi. As used in MeTRAbs visualizations.

István Sárándi 79 Dec 30, 2022
Neural network chess engine trained on Gary Kasparov's games.

Neural Chess It's not the best chess engine, but it is a chess engine. Proof of concept neural network chess engine (feed-forward multi-layer perceptr

3 Jun 22, 2022
Self-supervised spatio-spectro-temporal represenation learning for EEG analysis

EEG-Oriented Self-Supervised Learning and Cluster-Aware Adaptation This repository provides a tensorflow implementation of a submitted paper: EEG-Orie

Wonjun Ko 4 Jun 09, 2022
Semantically Contrastive Learning for Low-light Image Enhancement

Semantically Contrastive Learning for Low-light Image Enhancement Here, we propose an effective semantically contrastive learning paradigm for Low-lig

48 Dec 16, 2022
Automated detection of anomalous exoplanet transits in light curve data.

Automatically detecting anomalous exoplanet transits This repository contains the source code for the paper "Automatically detecting anomalous exoplan

1 Feb 01, 2022
Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"

When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer This repository contains code for our paper titled "When is BERT M

Princeton Natural Language Processing 9 Dec 23, 2022
Few-Shot Object Detection via Association and DIscrimination

Few-Shot Object Detection via Association and DIscrimination Code release of our NeurIPS 2021 paper: Few-Shot Object Detection via Association and DIs

Cao Yuhang 49 Dec 18, 2022
E2e music remastering system - End-to-end Music Remastering System Using Self-supervised and Adversarial Training

End-to-end Music Remastering System This repository includes source code and pre

Junghyun (Tony) Koo 37 Dec 15, 2022
This repository contains the source codes for the paper AtlasNet V2 - Learning Elementary Structures.

AtlasNet V2 - Learning Elementary Structures This work was build upon Thibault Groueix's AtlasNet and 3D-CODED projects. (you might want to have a loo

Théo Deprelle 123 Nov 11, 2022
Evaluating different engineering tricks that make RL work

Reinforcement Learning Tricks, Index This repository contains the code for the paper "Distilling Reinforcement Learning Tricks for Video Games". Short

Anssi 15 Dec 26, 2022
A fast and easy to use, moddable, Python based Minecraft server!

PyMine PyMine - The fastest, easiest to use, Python-based Minecraft Server! Features Note: This list is not always up to date, and doesn't contain all

PyMine 144 Dec 30, 2022
Code and real data for the paper "Counterfactual Temporal Point Processes", available at arXiv.

counterfactual-tpp This is a repository containing code and real data for the paper Counterfactual Temporal Point Processes. Pre-requisites This code

Networks Learning 11 Dec 09, 2022