RIDE: Long-tailed Recognition by Routing Diverse Distribution-Aware Experts.

by Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu and Stella X. Yu at UC Berkeley/ICSI and NTU

International Conference on Learning Representations (ICLR), 2021. Spotlight Presentation

This repository contains an official re-implementation of RIDE from the authors, while also has plans to support other works on long-tailed recognition. Further information please contact Xudong Wang and Long Lian.

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@inproceedings{wang2021longtailed,
  title={Long-tailed Recognition by Routing Diverse Distribution-Aware Experts},
  author={Xudong Wang and Long Lian and Zhongqi Miao and Ziwei Liu and Stella Yu},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=D9I3drBz4UC}
}

Supported Methods for Long-tailed Recognition:

Updates

[04/2021] Pre-trained models are avaliable in model zoo.

[12/2020] We added an approximate GFLops counter. See usages below. We also refactored the code and fixed a few errors.

[12/2020] We have limited support on cRT and tau-norm in load_stage1 option and t-normalization.py, please look at the code comments for instructions while we are still working on it.

[12/2020] Initial Commit. We re-implemented RIDE in this repo. LDAM/Focal/Cross-Entropy loss is also re-implemented (instruction below).

RIDE: Long-tailed Recognition by Routing Diverse Distribution-Aware Experts

Requirements

Packages

Python >= 3.7, < 3.9
PyTorch >= 1.6
tqdm (Used in test.py)
tensorboard >= 1.14 (for visualization)
pandas
numpy

Hardware requirements

8 GPUs with >= 11G GPU RAM are recommended. Otherwise the model with more experts may not fit in, especially on datasets with more classes (the FC layers will be large). We do not support CPU training, but CPU inference could be supported by slight modification.

Dataset Preparation

CIFAR code will download data automatically with the dataloader. We use data the same way as classifier-balancing. For ImageNet-LT and iNaturalist, please prepare data in the data directory. ImageNet-LT can be found at this link. iNaturalist data should be the 2018 version from this repo (Note that it requires you to pay to download now). The annotation can be found at here. Please put them in the same location as below:

data
├── cifar-100-python
│   ├── file.txt~
│   ├── meta
│   ├── test
│   └── train
├── cifar-100-python.tar.gz
├── ImageNet_LT
│   ├── ImageNet_LT_open.txt
│   ├── ImageNet_LT_test.txt
│   ├── ImageNet_LT_train.txt
│   ├── ImageNet_LT_val.txt
│   ├── test
│   ├── train
│   └── val
└── iNaturalist18
    ├── iNaturalist18_train.txt
    ├── iNaturalist18_val.txt
    └── train_val2018

How to get pretrained checkpoints

We have a model zoo available.

Training and Evaluation Instructions

Imbalanced CIFAR 100/CIFAR100-LT

RIDE Without Distill (Stage 1)

python train.py -c "configs/config_imbalance_cifar100_ride.json" --reduce_dimension 1 --num_experts 3

Note: --reduce_dimension 1 means set reduce dimension to True. The template has an issue with bool arguments so int argument is used here. However, any non-zero value will be equivalent to bool True.

RIDE With Distill (Stage 1)

python train.py -c "configs/config_imbalance_cifar100_distill_ride.json" --reduce_dimension 1 --num_experts 3 --distill_checkpoint path_to_checkpoint

Distillation is not required but could be performed if you'd like further improvements.

RIDE Expert Assignment Module Training (Stage 2)

python train.py -c "configs/config_imbalance_cifar100_ride_ea.json" -r path_to_stage1_checkpoint --reduce_dimension 1 --num_experts 3

Note: different runs will result in different EA modules with different trade-off. Some modules give higher accuracy but require higher FLOps. Although the only difference is not underlying ability to classify but the "easiness to satisfy and stop". You can tune the pos_weight if you think the EA module consumes too much compute power or is using too few expert.

ImageNet-LT

RIDE Without Distill (Stage 1)

ResNet 10

python train.py -c "configs/config_imagenet_lt_resnet10_ride.json" --reduce_dimension 1 --num_experts 3

ResNet 50

python train.py -c "configs/config_imagenet_lt_resnet50_ride.json" --reduce_dimension 1 --num_experts 3

ResNeXt 50

python train.py -c "configs/config_imagenet_lt_resnext50_ride.json" --reduce_dimension 1 --num_experts 3

RIDE With Distill (Stage 1)

ResNet 10

python train.py -c "configs/config_imagenet_lt_resnet10_distill_ride.json" --reduce_dimension 1 --num_experts 3 --distill_checkpoint path_to_checkpoint

ResNet 50

python train.py -c "configs/config_imagenet_lt_resnet50_distill_ride.json" --reduce_dimension 1 --num_experts 3 --distill_checkpoint path_to_checkpoint

ResNeXt 50

python train.py -c "configs/config_imagenet_lt_resnext50_distill_ride.json" --reduce_dimension 1 --num_experts 3 --distill_checkpoint path_to_checkpoint

RIDE Expert Assignment Module Training (Stage 2)

ResNet 10

python train.py -c "configs/config_imagenet_lt_resnet10_ride_ea.json" -r path_to_stage1_checkpoint --reduce_dimension 1 --num_experts 3

ResNet 50

python train.py -c "configs/config_imagenet_lt_resnet50_ride_ea.json" -r path_to_stage1_checkpoint --reduce_dimension 1 --num_experts 3

ResNeXt 50

python train.py -c "configs/config_imagenet_lt_resnext50_ride_ea.json" -r path_to_stage1_checkpoint --reduce_dimension 1 --num_experts 3

iNaturalist

RIDE Without Distill (Stage 1)

python train.py -c "configs/config_iNaturalist_resnet50_ride.json" --reduce_dimension 1 --num_experts 3

RIDE With Distill (Stage 1)

python train.py -c "configs/config_iNaturalist_resnet50_distill_ride.json" --reduce_dimension 1 --num_experts 3 --distill_checkpoint path_to_checkpoint

RIDE Expert Assignment Module Training (Stage 2)

python train.py -c "configs/config_iNaturalist_resnet50_ride_ea.json" -r path_to_stage1_checkpoint --reduce_dimension 1 --num_experts 3

Using Other Methods with RIDE

Focal Loss: switch the loss to Focal Loss
Cross Entropy: switch the loss to Cross Entropy Loss

Test

To test a checkpoint, please put it with the corresponding config file.

python test.py -r path_to_checkpoint

Please see the pytorch template that we use for additional more general usages of this project (e.g. loading from a checkpoint, etc.).

GFLops calculation

We provide an experimental support for approximate GFLops calculation. Please open an issue if you encounter any problem or meet inconsistency in GFLops.

You need to install thop package first. Then, according to your model, run python -m utils.gflops (args) in the project directory.

Examples and explanations

Use python -m utils.gflops to see the documents as well as explanations for this calculator.

ImageNet-LT

python -m utils.gflops ResNeXt50Model 0 --num_experts 3 --reduce_dim True --use_norm False

To change model, switch ResNeXt50Model to the ones used in your config. use_norm comes with LDAM-based methods (including RIDE). reduce_dim is used in default RIDE models. The 0 in the command line indicates the dataset.

All supported datasets:

0: ImageNet-LT
1: iNaturalist
2: Imbalance CIFAR 100

iNaturalist

python -m utils.gflops ResNet50Model 1 --num_experts 3 --reduce_dim True --use_norm True

Imbalance CIFAR 100

python -m utils.gflops ResNet32Model 2 --num_experts 3 --reduce_dim True --use_norm True

Special circumstances: calculate the approximate GFLops in models with expert assignment module

We provide a ea_percentage for specifying the percentage of data that pass each expert. Note that you need to switch to the EA model as well since you actually use EA model instead of the original model in training and inference.

An example:

python -m utils.gflops ResNet32EAModel 2 --num_experts 3 --reduce_dim True --use_norm True --ea_percentage 40.99,9.47,49.54

FAQ

See FAQ.

How to get support from us?

If you have any general questions, feel free to email us at longlian at berkeley.edu and xdwang at eecs.berkeley.edu. If you have code or implementation-related questions, please feel free to send emails to us or open an issue in this codebase (We recommend that you open an issue in this codebase, because your questions may help others).

Pytorch template

This is a project based on this pytorch template. The readme of the template explains its functionality, although we try to list most frequently used ones in this readme.

License

This project is licensed under the MIT License. See LICENSE for more details. The parts described below follow their original license.

Acknowledgements

This is a project based on this pytorch template. The pytorch template is inspired by the project Tensorflow-Project-Template by Mahmoud Gemy

The ResNet and ResNeXt in fb_resnets are based on from Classifier-Balancing/Decouple. The ResNet in ldam_drw_resnets/LDAM loss/CIFAR-LT are based on LDAM-DRW. KD implementation takes references from CRD/RepDistiller.

[ICLR 2021 Spotlight] Pytorch implementation for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

Related tags

Overview

RIDE: Long-tailed Recognition by Routing Diverse Distribution-Aware Experts.

Citation

Supported Methods for Long-tailed Recognition:

Updates

Table of contents

Requirements

Packages

Hardware requirements

Dataset Preparation

How to get pretrained checkpoints

Training and Evaluation Instructions

Imbalanced CIFAR 100/CIFAR100-LT

RIDE Without Distill (Stage 1)

RIDE With Distill (Stage 1)

RIDE Expert Assignment Module Training (Stage 2)

ImageNet-LT

RIDE Without Distill (Stage 1)

ResNet 10

ResNet 50

ResNeXt 50

RIDE With Distill (Stage 1)

ResNet 10

ResNet 50

ResNeXt 50

RIDE Expert Assignment Module Training (Stage 2)

ResNet 10

ResNet 50

ResNeXt 50

iNaturalist

RIDE Without Distill (Stage 1)

RIDE With Distill (Stage 1)

RIDE Expert Assignment Module Training (Stage 2)

Using Other Methods with RIDE

Test

GFLops calculation

Examples and explanations

ImageNet-LT

iNaturalist

Imbalance CIFAR 100

Special circumstances: calculate the approximate GFLops in models with expert assignment module

FAQ

How to get support from us?

Pytorch template

License

Acknowledgements

Owner

Xudong (Frank) Wang

Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Extract Keywords from sentence or Replace keywords in sentences.

An assignment on creating a minimalist neural network toolkit for CS11-747

A list of NLP(Natural Language Processing) tutorials

Words_And_Phrases - Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours

Lumped-element impedance calculator and frequency-domain plotter.

Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

Code examples for my Write Better Python Code series on YouTube.

NLP-Project - Used an API to scrape 2000 reddit posts, then used NLP analysis and created a classification model to mixed succcess

Implementation of Multistream Transformers in Pytorch

Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.

A very simple framework for state-of-the-art Natural Language Processing (NLP)

This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.

Turkish Stop Words Türkçe Dolgu Sözcükleri

A python project made to generate code using either OpenAI's codex or GPT-J (Although not as good as codex)

A python script to prefab your scripts/text files, and re create them with ease and not have to open your browser to copy code or write code yourself

A fast and lightweight python-based CTC beam search decoder for speech recognition.

🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

使用Mask LM预训练任务来预训练Bert模型。训练垂直领域语料的模型表征，提升下游任务的表现。