Awesome Long-Tailed Learning

A curated list of awesome deep long-tailed learning resources. We recently released Deep Long-Tailed Learning: A Survey to the community. In this survey, we reviewed recent advances in long-tailed learning based on deep neural networks.

Specifically, existing long-tailed learning studies can be grouped into three main categories (i.e., class re-balancing, information augmentation and module improvement), which can be further divided into nine sub-categories (as shown in the figure below). We also empirically analyzed several state-of-the-art methods by evaluating to what extent they address class imbalance. We concluded the survey by highlighting important applications of deep long-tailed learning and identifying several promising directions for future research. After completing the survey, we decided to release the long-tailed learning resources we had collected, in the hope of supporting the community's work. If you have any questions or suggestions, please feel free to contact us.

1. Type of Long-tailed Learning

Symbol	`Sampling`	`CSL`	`LA`	`TL`	`Aug`
Type	Re-sampling	Cost-sensitive Learning	Logit Adjustment	Transfer Learning	Data Augmentation

Symbol	`RL`	`CD`	`DT`	`Ensemble`	`Other`
Type	Representation Learning	Classifier Design	Decoupled Training	Ensemble Learning	Other Types
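
The symbols in the two tables above can also be expanded programmatically, which is handy when filtering the paper lists by type. The sketch below is not part of this repository: the `TYPE_NAMES` mapping copies the two tables, the grouping in `CATEGORIES` follows the three main categories of the survey, and the helper name `expand_types` is hypothetical.

```python
# Symbol -> full type name, copied from the two tables above.
TYPE_NAMES = {
    "Sampling": "Re-sampling",
    "CSL": "Cost-sensitive Learning",
    "LA": "Logit Adjustment",
    "TL": "Transfer Learning",
    "Aug": "Data Augmentation",
    "RL": "Representation Learning",
    "CD": "Classifier Design",
    "DT": "Decoupled Training",
    "Ensemble": "Ensemble Learning",
    "Other": "Other Types",
}

# Three main categories -> nine sub-categories (grouping as in the survey).
CATEGORIES = {
    "Class Re-balancing": ["Sampling", "CSL", "LA"],
    "Information Augmentation": ["TL", "Aug"],
    "Module Improvement": ["RL", "CD", "DT", "Ensemble"],
}


def expand_types(cell):
    """Turn a Type cell such as "`Sampling`,`CSL`" into full type names.

    Symbols that are not in the legend are returned unchanged,
    which makes them easy to spot.
    """
    names = []
    for raw in cell.split(","):
        symbol = raw.strip().strip("`")
        if not symbol:
            continue
        # Case-insensitive lookup, so differently capitalized symbols resolve.
        key = next((k for k in TYPE_NAMES if k.lower() == symbol.lower()), None)
        names.append(TYPE_NAMES[key] if key else symbol)
    return names


print(expand_types("`Sampling`,`CSL`,`TL`, `Aug`"))
```

Returning unknown symbols unchanged, rather than raising an error, is a deliberate choice here so that a whole table can be processed in one pass and any symbol missing from the legend shows up in the output.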

2. Top-tier Conference Papers

2021

Title	Venue	Year	Type	Code
Improving contrastive learning on imbalanced seed data via open-world sampling	NeurIPS	2021	`Sampling`,`TL`, `CD`	Official
Semi-supervised semantic segmentation via adaptive equalization learning	NeurIPS	2021	`Sampling`,`CSL`,`TL`, `Aug`	Official
On model calibration for long-tailed object detection and instance segmentation	NeurIPS	2021	`LA`	Official
Label-imbalanced and group-sensitive classification under overparameterization	NeurIPS	2021	`LA`
Towards calibrated model for long-tailed visual recognition from prior perspective	NeurIPS	2021	`Aug`, `RL`	Official
Supercharging imbalanced data learning with energy-based contrastive representation transfer	NeurIPS	2021	`Aug`, `TL`, `RL`	Official
VideoLT: Large-scale long-tailed video recognition	ICCV	2021	`Sampling`	Official
Exploring classification equilibrium in long-tailed object detection	ICCV	2021	`Sampling`,`CSL`	Official
GistNet: a geometric structure transfer network for long-tailed recognition	ICCV	2021	`Sampling`,`TL`, `CD`
FASA: Feature augmentation and sampling adaptation for long-tailed instance segmentation	ICCV	2021	`Sampling`,`CSL`
ACE: Ally complementary experts for solving long-tailed recognition in one-shot	ICCV	2021	`Sampling`,`Ensemble`	Official
Influence-Balanced Loss for Imbalanced Visual Classification	ICCV	2021	`CSL`	Official
Re-distributing biased pseudo labels for semi-supervised semantic segmentation: A baseline investigation	ICCV	2021	`TL`	Official
Self supervision to distillation for long-tailed visual recognition	ICCV	2021	`TL`	Official
Distilling virtual examples for long-tailed recognition	ICCV	2021	`TL`
MosaicOS: A simple and effective use of object-centric images for long-tailed object detection	ICCV	2021	`TL`	Official
Parametric contrastive learning	ICCV	2021	`RL`	Official
Distributional robustness loss for long-tail learning	ICCV	2021	`RL`	Official
Learning of visual relations: The devil is in the tails	ICCV	2021	`DT`
Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection	ICML	2021	`Sampling`	Official
Delving into deep imbalanced regression	ICML	2021	`Other`	Official
Long-tailed multi-label visual recognition by collaborative training on uniform and re-balanced samplings	CVPR	2021	`Sampling`,`Ensemble`
Equalization loss v2: A new gradient balance approach for long-tailed object detection	CVPR	2021	`CSL`	Official
Seesaw loss for long-tailed instance segmentation	CVPR	2021	`CSL`	Official
Adaptive class suppression loss for long-tail object detection	CVPR	2021	`CSL`	Official
PML: Progressive margin loss for long-tailed age classification	CVPR	2021	`CSL`
Disentangling label distribution for long-tailed visual recognition	CVPR	2021	`CSL`,`LA`	Official
Adversarial robustness under long-tailed distribution	CVPR	2021	`CSL`,`LA`,`CD`	Official
Distribution alignment: A unified framework for long-tail visual recognition	CVPR	2021	`CSL`,`LA`,`DT`	Official
Improving calibration for long-tailed recognition	CVPR	2021	`CSL`,`Aug`,`DT`	Official
CReST: A class-rebalancing self-training framework for imbalanced semi-supervised learning	CVPR	2021	`TL`	Official
Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts	CVPR	2021	`TL`	Official
RSG: A simple but effective module for learning imbalanced datasets	CVPR	2021	`TL`,`Aug`	Official
MetaSAug: Meta semantic augmentation for long-tailed visual recognition	CVPR	2021	`Aug`	Official
Contrastive learning based hybrid networks for long-tailed image classification	CVPR	2021	`RL`
Unsupervised discovery of the long-tail in instance segmentation using hierarchical self-supervision	CVPR	2021	`RL`
Long-tail learning via logit adjustment	ICLR	2021	`LA`	Official
Long-tailed recognition by routing diverse distribution-aware experts	ICLR	2021	`TL`,`Ensemble`	Official
Exploring balanced feature spaces for representation learning	ICLR	2021	`RL`,`DT`

2020

Title	Venue	Year	Type	Code
Balanced meta-softmax for long-tailed visual recognition	NeurIPS	2020	`Sampling`,`CSL`	Official
Posterior recalibration for imbalanced datasets	NeurIPS	2020	`LA`	Official
Long-tailed classification by keeping the good and removing the bad momentum causal effect	NeurIPS	2020	`LA`,`CD`	Official
Rethinking the value of labels for improving class-imbalanced learning	NeurIPS	2020	`TL`,`RA`	Official
The devil is in classification: A simple framework for long-tail instance segmentation	ECCV	2020	`Sampling`,`DT`,`Ensemble`	Official
Imbalanced continual learning with partitioning reservoir sampling	ECCV	2020	`Sampling`	Official
Distribution-balanced loss for multi-label classification in long-tailed datasets	ECCV	2020	`CSL`	Official
Feature space augmentation for long-tailed data	ECCV	2020	`TL`,`Aug`,`DT`
Learning from multiple experts: Self-paced knowledge distillation for long-tailed classification	ECCV	2020	`TL`,`Ensemble`	Official
Solving long-tailed recognition with deep realistic taxonomic classifier	ECCV	2020	`CD`	Official
Learning to segment the tail	CVPR	2020	`Sampling`,`TL`	Official
BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition	CVPR	2020	`Sampling`,`Ensemble`	Official
Overcoming classifier imbalance for long-tail object detection with balanced group softmax	CVPR	2020	`Sampling`,`Ensemble`	Official
Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective	CVPR	2020	`CSL`	Official
Equalization loss for long-tailed object recognition	CVPR	2020	`CSL`	Official
Domain balancing: Face recognition on long-tailed domains	CVPR	2020	`CSL`
M2m: Imbalanced classification via major-to-minor translation	CVPR	2020	`TL`,`Aug`	Official
Deep representation learning on long-tailed data: A learnable embedding augmentation perspective	CVPR	2020	`TL`,`Aug`,`RL`
Inflated episodic memory with region self-attention for long-tailed visual recognition	CVPR	2020	`RL`
Decoupling representation and classifier for long-tailed recognition	ICLR	2020	`Sampling`,`CSL`,`RL`,`CD`,`DT`	Official

2019

Title	Venue	Year	Type	Code
Meta-weight-net: Learning an explicit mapping for sample weighting	NeurIPS	2019	`CSL`	Official
Learning imbalanced datasets with label-distribution-aware margin loss	NeurIPS	2019	`CSL`	Official
Dynamic curriculum learning for imbalanced data classification	ICCV	2019	`Sampling`
Class-balanced loss based on effective number of samples	CVPR	2019	`CSL`	Official
Striking the right balance with uncertainty	CVPR	2019	`CSL`
Feature transfer learning for face recognition with under-represented data	CVPR	2019	`TL`,`Aug`
Unequal-training for deep face recognition with long-tailed noisy data	CVPR	2019	`RL`	Official
Large-scale long-tailed recognition in an open world	CVPR	2019	`RL`	Official

2018

Title	Venue	Year	Type	Code
Large scale fine-grained categorization and domain-specific transfer learning	CVPR	2018	`TL`	Official

2017

Title	Venue	Year	Type
Learning to model the tail	NeurIPS	2017	`CSL`
Focal loss for dense object detection	ICCV	2017	`CSL`
Range loss for deep face recognition with long-tailed training data	ICCV	2017	`RL`
Class rectification hard mining for imbalanced deep learning	ICCV	2017	`RL`

2016

Title	Venue	Year	Type	Code
Learning deep representation for imbalanced classification	CVPR	2016	`Sampling`,`RL`
Factors in finetuning deep model for object detection with long-tail distribution	CVPR	2016	`CSL`,`RL`

3. Benchmark Datasets

Dataset	Long-tailed Task	# Class	# Training data	# Test data
ImageNet-LT	Classification	1,000	115,846	50,000
CIFAR100-LT	Classification	100	50,000	10,000
Places-LT	Classification	365	62,500	36,500
iNaturalist 2018	Classification	8,142	437,513	24,426
LVIS v0.5	Detection and Segmentation	1,230	57,000	20,000
LVIS v1	Detection and Segmentation	1,203	100,000	19,800
VOC-LT	Multi-label Classification	20	1,142	4,952
COCO-LT	Multi-label Classification	80	1,909	5,000
VideoLT	Video Classification	1,004	179,352	25,622

4. Empirical Studies

(1) Long-tailed benchmarking performance

We evaluate several state-of-the-art methods on ImageNet-LT using two new evaluation metrics, UA (upper bound accuracy) and RA (relative accuracy), to see to what extent they handle class imbalance. We group these methods into class re-balancing (CR), information augmentation (IA) and module improvement (MI).

Almost all long-tailed methods perform better than the Softmax baseline in terms of accuracy, which demonstrates the effectiveness of long-tailed learning.
Training for 200 epochs leads to better performance for most long-tailed methods, since sufficient training enables deep models to fit the data better and learn better image representations.
In addition to accuracy, we also evaluate long-tailed methods based on UA and RA. For methods with higher UA, the performance gain comes not only from alleviating class imbalance but also from other factors, such as data augmentation or better network architectures. Accuracy alone is therefore not a sufficient measure, and our proposed RA metric provides a good complement, since it reduces the influence of factors other than class imbalance.
For example, MiSLAS, based on data mixup, has higher accuracy than Balanced Softmax under 90 training epochs, but it also has higher UA. As a result, the relative accuracy of MiSLAS is lower than that of Balanced Softmax, which means that Balanced Softmax alleviates class imbalance better than MiSLAS under 90 training epochs.
Although some recent high-accuracy methods have lower RA, the overall development trend of long-tailed learning is still positive, as shown in the figure below.

The current state-of-the-art long-tailed method in terms of both accuracy and RA is TADE, an ensemble-based method.
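
To make the accuracy-versus-RA reasoning above concrete, here is a minimal sketch. It assumes that RA is the ratio of a method's accuracy on the long-tailed benchmark to its UA; please check the survey for the exact definition. The function name and all numbers are made up for illustration and are not results from the survey.

```python
def relative_accuracy(acc, upper_acc):
    """Relative accuracy (RA): accuracy divided by upper bound accuracy (UA).

    Assumption: RA = acc / UA, with both values given as fractions in (0, 1].
    """
    if not 0.0 < upper_acc <= 1.0:
        raise ValueError("upper_acc must be in (0, 1]")
    return acc / upper_acc


# Hypothetical numbers, chosen only to illustrate the argument.
method_a = {"acc": 0.52, "ua": 0.60}  # higher accuracy, but also higher UA
method_b = {"acc": 0.50, "ua": 0.55}  # lower accuracy, lower UA

ra_a = relative_accuracy(method_a["acc"], method_a["ua"])
ra_b = relative_accuracy(method_b["acc"], method_b["ua"])

# Method A wins on accuracy, yet method B has the higher RA.
print(f"A: acc={method_a['acc']:.2f}, RA={ra_a:.3f}")
print(f"B: acc={method_b['acc']:.2f}, RA={ra_b:.3f}")
```

In this toy setting, method A gains part of its accuracy from factors that also raise its UA, so method B is the one that handles class imbalance better according to RA.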

(2) More discussions on cost-sensitive losses

We further evaluate the performance of different cost-sensitive learning losses based on the decoupled training scheme.
Decoupled training, compared to joint training, can further improve the overall performance of most cost-sensitive learning methods apart from balanced softmax (BS).
Although BS outperforms other cost-sensitive losses under one-stage training, they perform comparably under decoupled training. This implies that although these cost-sensitive losses perform differently under joint training, they essentially learn feature representations of similar quality.
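
For readers unfamiliar with balanced softmax (BS), the sketch below shows the loss for a single sample in plain Python: each logit is shifted by the log of its class's training count before the usual softmax cross-entropy is applied. This is an illustrative re-implementation, not the code used in the experiments above; the function names and the toy class counts are assumptions.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax_cross_entropy(logits, label):
    """Standard softmax cross-entropy for one sample."""
    return log_sum_exp(logits) - logits[label]


def balanced_softmax_loss(logits, label, class_counts):
    """Balanced softmax: add log(class count) to each logit during training.

    The shift accounts for the label prior of the training set. With equal
    class counts it reduces to the standard softmax cross-entropy.
    """
    adjusted = [z + math.log(n) for z, n in zip(logits, class_counts)]
    return softmax_cross_entropy(adjusted, label)


# Toy example: class 0 is the head class, class 2 the tail class.
counts = [1000, 100, 10]
logits = [2.0, 1.0, 0.5]

# For a sample of the rarest class, the balanced loss is larger than the
# plain loss; for a sample of the most frequent class, it is smaller.
print(softmax_cross_entropy(logits, 2), balanced_softmax_loss(logits, 2, counts))
print(softmax_cross_entropy(logits, 0), balanced_softmax_loss(logits, 0, counts))
```

The shift is applied only in the training loss; at test time, predictions are taken from the unadjusted logits.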

5. Citation

If this repository is helpful to you, please cite our survey.

@article{zhang2021deep,
  title={Deep long-tailed learning: A survey},
  author={Zhang, Yifan and Kang, Bingyi and Hooi, Bryan and Yan, Shuicheng and Feng, Jiashi},
  journal={arXiv preprint arXiv:2110.04596},
  year={2021}
}
