Localization Distillation for Object Detection

This repo is based on MMDetection.

This is the code for our paper:

Localization Distillation for Object Detection

LD extends knowledge distillation to the localization task: it uses the learned bbox distributions to transfer localization dark knowledge from the teacher to the student.

LD consistently improves GFocalV1 by about 0.8 AP and 1 AR100 without adding any computational cost!

Introduction

Knowledge distillation (KD) has demonstrated a powerful ability to learn compact models in deep learning, but it is still limited in distilling localization information for object detection. Existing KD methods for object detection mainly focus on mimicking deep features between the teacher model and the student model, which is not only restricted by specific model architectures but also cannot distill localization ambiguity. In this paper, we first propose localization distillation (LD) for object detection. In particular, our LD can be formulated as standard KD by adopting the general localization representation of the bounding box. Our LD is very flexible and can distill localization ambiguity for arbitrary teacher and student architectures. Moreover, it is interesting to find that Self-LD, i.e., distilling the teacher model itself, can further boost state-of-the-art performance. Second, we suggest a teacher assistant (TA) strategy to fill the possible gap between the teacher model and the student model, by which the distillation effectiveness can be guaranteed even when the selected teacher model is not optimal. On the benchmark datasets PASCAL VOC and MS COCO, our LD consistently improves the performance of student detectors and also boosts state-of-the-art detectors notably.
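
To make "LD formulated as standard KD" concrete, here is a minimal sketch for a single box edge. It assumes the usual temperature-scaled KD recipe (softened softmax, KL divergence, T^2 scaling). The function names, the temperature value, the number of bins and the logits are all illustrative, not taken from this repo's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the bin logits of one box edge."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ld_loss(student_logits, teacher_logits, temperature=10.0):
    """KL(teacher || student) between softened bin distributions.

    The T^2 factor is the common KD convention for keeping the gradient
    scale comparable across temperatures; it is an assumption here.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


# One edge discretized into 5 bins (hypothetical logits).
teacher_edge = [0.1, 2.0, 3.5, 1.0, -1.0]
student_edge = [0.0, 1.0, 2.0, 1.5, 0.0]
loss_value = ld_loss(student_edge, teacher_edge)
```

In a detector this would be computed for each of the four edges of every selected box and averaged. The loss is zero when the student reproduces the teacher's distribution exactly.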

Installation

Please refer to INSTALL.md for installation and dataset preparation.

Get Started

Please see GETTING_STARTED.md for the basic usage of MMDetection.

Train

# Assumes you are in the root directory of this project,
# have activated your virtual environment if needed,
# and have the COCO dataset in 'data/coco/'.

./tools/dist_train.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py 8

Learning rate setting

lr=(samples_per_gpu * num_gpu) / 16 * 0.01

For 2 GPUs and mini-batch size 6, the relevant portion of the config file would be:

optimizer = dict(type='SGD', lr=0.00375, momentum=0.9, weight_decay=0.0001)
data = dict(
    samples_per_gpu=3,

For 8 GPUs and mini-batch size 16, the relevant portion of the config file would be:

optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
data = dict(
    samples_per_gpu=2,
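
The rule above can be wrapped in a small helper to compute the learning rate for other GPU counts. This is only a sketch: the name `scaled_lr` and its keyword arguments are illustrative and not part of the repo.

```python
def scaled_lr(samples_per_gpu, num_gpus, base_lr=0.01, base_batch=16):
    """Linear scaling rule: lr is proportional to the total mini-batch size."""
    total_batch = samples_per_gpu * num_gpus
    return total_batch / base_batch * base_lr


lr_2gpu = scaled_lr(samples_per_gpu=3, num_gpus=2)  # total mini-batch size 6
lr_8gpu = scaled_lr(samples_per_gpu=2, num_gpus=8)  # total mini-batch size 16
```

These two calls reproduce the values used in the config fragments above (0.00375 and 0.01, up to floating-point rounding).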

Convert model

After training with LD, the .pth weight file will be large, so it is best to convert the model and save a new, smaller file. Set the paths at convert_model.py#L38-L40 to your .pth file and config file, then run

python convert_model.py
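
convert_model.py is the supported route. As a rough illustration of why conversion can shrink the file, the sketch below assumes that the checkpoint stores optimizer state and teacher weights alongside the student weights. This README does not confirm that layout. The key names (`state_dict`, `optimizer`, the `teacher_model.` prefix) and the function `slim_checkpoint` are hypothetical.

```python
def slim_checkpoint(checkpoint, teacher_prefix="teacher_model."):
    """Return a new checkpoint dict that keeps only the student weights.

    Assumed (hypothetical) layout:
        {"meta": ..., "state_dict": {name: tensor}, "optimizer": ...}
    """
    student_weights = {
        name: value
        for name, value in checkpoint["state_dict"].items()
        if not name.startswith(teacher_prefix)  # drop teacher parameters
    }
    # Optimizer state and other bookkeeping are not needed for inference.
    return {"state_dict": student_weights}


# Toy stand-in for a loaded checkpoint (real values would be tensors).
toy_checkpoint = {
    "meta": {"epoch": 12},
    "state_dict": {
        "backbone.conv1.weight": [0.1, 0.2],
        "teacher_model.backbone.conv1.weight": [0.3, 0.4],
    },
    "optimizer": {"momentum_buffer": [0.0, 0.0]},
}
slimmed = slim_checkpoint(toy_checkpoint)
```

With a real checkpoint, the dict would typically be loaded with `torch.load` and the result written back with `torch.save`.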

Speed Test (FPS)

CUDA_VISIBLE_DEVICES=0 python3 ./tools/benchmark.py configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py work_dirs/ld_gflv1_r101_r50_fpn_coco_1x/epoch_24.pth

COCO Evaluation

./tools/dist_test.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py work_dirs/ld_gflv1_r101_r50_fpn_coco_1x/epoch_24.pth 8 --eval bbox

GFocalV1 with LD

Teacher	Student	Training schedule	Mini-batch size	AP (val)	AP50 (val)	AP75 (val)	AP (test-dev)	AP50 (test-dev)	AP75 (test-dev)	AR100 (test-dev)
--	R-18	1x	6	35.8	53.1	38.2	36.0	53.4	38.7	55.3
R-101	R-18	1x	6	36.5	52.9	39.3	36.8	53.5	39.9	56.6
--	R-34	1x	6	38.9	56.6	42.2	39.2	56.9	42.3	58.0
R-101	R-34	1x	6	39.8	56.6	43.1	40.0	57.1	43.5	59.3
--	R-50	1x	6	40.1	58.2	43.1	40.5	58.8	43.9	59.0
R-101	R-50	1x	6	41.1	58.7	44.9	41.2	58.8	44.7	59.8
--	R-101	2x	6	44.6	62.9	48.4	45.0	63.6	48.9	62.3
R-101-DCN	R-101	2x	6	45.4	63.1	49.5	45.6	63.7	49.8	63.3

GFocalV1 with Self-LD

Teacher	Student	Training schedule	Mini-batch size	AP (val)	AP50 (val)	AP75 (val)
--	R-18	1x	6	35.8	53.1	38.2
R-18	R-18	1x	6	36.1	52.9	38.5
--	R-50	1x	6	40.1	58.2	43.1
R-50	R-50	1x	6	40.6	58.2	43.8
--	X-101-32x4d-DCN	1x	4	46.9	65.4	51.1
X-101-32x4d-DCN	X-101-32x4d-DCN	1x	4	47.5	65.8	51.8

GFocalV2 with LD

Teacher	Student	Training schedule	Mini-batch size	AP (test-dev)	AP50 (test-dev)	AP75 (test-dev)	AR100 (test-dev)
--	R-50	2x	16	44.4	62.3	48.5	62.4
R-101	R-50	2x	16	44.8	62.4	49.0	63.1
--	R-101	2x	16	46.0	64.1	50.2	63.5
R-101-DCN	R-101	2x	16	46.8	64.5	51.1	64.3
--	R-101-DCN	2x	16	48.2	66.6	52.6	64.4
R2-101-DCN	R-101-DCN	2x	16	49.1	67.1	53.7	65.6
--	X-101-32x4d-DCN	2x	16	49.0	67.6	53.4	64.7
R2-101-DCN	X-101-32x4d-DCN	2x	16	50.2	68.3	54.9	66.3
--	R2-101-DCN	2x	16	50.5	68.9	55.1	66.2
R2-101-DCN	R2-101-DCN	2x	16	51.0	69.1	55.9	66.8

VOC Evaluation

./tools/dist_test.sh configs/ld/ld_gflv1_r101_r18_fpn_voc.py work_dirs/ld_gflv1_r101_r18_fpn_voc/epoch_4.pth 8 --eval mAP

GFocalV1 with LD

Teacher	Student	Training Epochs	Mini-batch size	AP	AP50	AP75
--	R-18	4	6	51.8	75.8	56.3
R-101	R-18	4	6	53.0	75.9	57.6
--	R-50	4	6	55.8	79.0	60.7
R-101	R-50	4	6	56.1	78.5	61.2
--	R-34	4	6	55.7	78.9	60.6
R-101-DCN	R-34	4	6	56.7	78.4	62.1
--	R-101	4	6	57.6	80.4	62.7
R-101-DCN	R-101	4	6	58.4	80.2	63.7

This is an example of evaluation results (R-101→R-18).

+-------------+------+-------+--------+-------+
| class       | gts  | dets  | recall | ap    |
+-------------+------+-------+--------+-------+
| aeroplane   | 285  | 4154  | 0.081  | 0.030 |
| bicycle     | 337  | 7124  | 0.125  | 0.108 |
| bird        | 459  | 5326  | 0.096  | 0.018 |
| boat        | 263  | 8307  | 0.065  | 0.034 |
| bottle      | 469  | 10203 | 0.051  | 0.045 |
| bus         | 213  | 4098  | 0.315  | 0.247 |
| car         | 1201 | 16563 | 0.193  | 0.131 |
| cat         | 358  | 4878  | 0.254  | 0.128 |
| chair       | 756  | 32655 | 0.053  | 0.027 |
| cow         | 244  | 4576  | 0.131  | 0.109 |
| diningtable | 206  | 13542 | 0.150  | 0.117 |
| dog         | 489  | 6446  | 0.196  | 0.076 |
| horse       | 348  | 5855  | 0.144  | 0.036 |
| motorbike   | 325  | 6733  | 0.052  | 0.017 |
| person      | 4528 | 51959 | 0.099  | 0.037 |
| pottedplant | 480  | 12979 | 0.031  | 0.009 |
| sheep       | 242  | 4706  | 0.132  | 0.060 |
| sofa        | 239  | 9640  | 0.192  | 0.060 |
| train       | 282  | 4986  | 0.142  | 0.042 |
| tvmonitor   | 308  | 7922  | 0.078  | 0.045 |
+-------------+------+-------+--------+-------+
| mAP         |      |       |        | 0.069 |
+-------------+------+-------+--------+-------+
AP:  0.530091167986393
['AP50: 0.759393', 'AP55: 0.744544', 'AP60: 0.724239', 'AP65: 0.693551', 'AP70: 0.639848', 'AP75: 0.576284', 'AP80: 0.489098', 'AP85: 0.378586', 'AP90: 0.226534', 'AP95: 0.068834']
{'mAP': 0.7593928575515747}
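
The three summary lines are related. The reported AP is the mean of the ten per-threshold values from AP50 to AP95, and the final mAP equals AP50. The per-class table printed above ends in an mAP of 0.069, which matches AP95, so it appears to be the table for the IoU threshold 0.95. A short check using the numbers from the log (variable names are illustrative):

```python
# Per-IoU-threshold AP values copied from the evaluation log above.
ap_per_iou = {
    "AP50": 0.759393, "AP55": 0.744544, "AP60": 0.724239, "AP65": 0.693551,
    "AP70": 0.639848, "AP75": 0.576284, "AP80": 0.489098, "AP85": 0.378586,
    "AP90": 0.226534, "AP95": 0.068834,
}

# COCO-style AP: average over IoU thresholds 0.50:0.05:0.95.
coco_style_ap = sum(ap_per_iou.values()) / len(ap_per_iou)
print(f"AP: {coco_style_ap:.4f}")  # AP: 0.5301
```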

Note:

For more experimental details, please refer to GFocalV1, GFocalV2 and mmdetection.
According to ATSS, there is no gap between box-based regression and point-based regression. Personal conjectures: 1) If the xywh form works with the general distribution (applying uniform subinterval division to xywh), LD can also work in xywh form. 2) If the xywh form with the general distribution does not give better results, the best modification is to first switch from xywh form to tblr form and then apply the general distribution and LD. Consequently, whether or not xywh form + general distribution works, LD can benefit all regression-based detectors.
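
The xywh-to-tblr switch mentioned in the second conjecture can be sketched as follows. It assumes that xywh means (center x, center y, width, height), that tblr means the distances from a reference point to the top, bottom, left and right edges, and that y grows downward as in image coordinates. The function names are illustrative.

```python
def xywh_to_tblr(point, box_xywh):
    """Distances from a point to the top, bottom, left, right box edges."""
    px, py = point
    cx, cy, w, h = box_xywh
    top = py - (cy - h / 2)      # y grows downward, so top has the smaller y
    bottom = (cy + h / 2) - py
    left = px - (cx - w / 2)
    right = (cx + w / 2) - px
    return top, bottom, left, right


def tblr_to_xywh(point, tblr):
    """Inverse mapping: recover the center-size box from edge distances."""
    px, py = point
    top, bottom, left, right = tblr
    w = left + right
    h = top + bottom
    cx = px - left + w / 2
    cy = py - top + h / 2
    return cx, cy, w, h


anchor_point = (50.0, 40.0)
box = (60.0, 50.0, 40.0, 30.0)
distances = xywh_to_tblr(anchor_point, box)       # (5.0, 25.0, 10.0, 30.0)
recovered = tblr_to_xywh(anchor_point, distances)  # (60.0, 50.0, 40.0, 30.0)
```

Each of the four distances can then be discretized into bins and modelled with a general distribution, which is the representation LD distills.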

Pretrained weights

VOC	COCO
GFocalV1 teacher R101 pan.baidu pw: ufc8	GFocalV1 + LD R101_R18_1x pan.baidu pw: hj8d
GFocalV1 teacher R101DCN pan.baidu pw: 5qra	GFocalV1 + LD R101_R50_1x pan.baidu pw: bvzz
GFocalV1 + LD R101_R18 pan.baidu pw: 1bd3	GFocalV2 + LD R101_R50_2x pan.baidu pw: 3jtq
GFocalV1 + LD R101DCN_R34 pan.baidu pw: thuw	GFocalV2 + LD R101DCN_R101_2x pan.baidu pw: zezq
GFocalV1 + LD R101DCN_R101 pan.baidu pw: mp8t	GFocalV2 + LD R2N_R101DCN_2x pan.baidu pw: fsbm
	GFocalV2 + LD R2N_X101_2x pan.baidu pw: 9vcc
	GFocalV2 + Self-LD R2N_R2N_2x pan.baidu pw: 9azn

Other teacher models can be downloaded from GFocalV1, GFocalV2 and mmdetection.

Score voting Cluster-DIoU-NMS

We provide Score voting Cluster-DIoU-NMS, a sped-up version of score voting NMS combined with DIoU-NMS. For GFocalV1 and GFocalV2, it brings a 0.1-0.3 AP increase, a 0.2-0.5 AP75 increase, a <=0.4 AP50 decrease and a <=1.5 FPS decrease, while being much faster than the score voting NMS in mmdetection. The relevant portion of the config file would be:

# Score voting Cluster-DIoU-NMS
test_cfg = dict(
    nms=dict(type='voting_cluster_diounms', iou_threshold=0.6),

# Original NMS
test_cfg = dict(
    nms=dict(type='nms', iou_threshold=0.6),
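
For readers unfamiliar with the DIoU part, the sketch below computes the Distance-IoU of two boxes: plain IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box. In DIoU-NMS this quantity replaces plain IoU in the suppression test. This is a standalone illustration and not this repo's `voting_cluster_diounms` implementation. The (x1, y1, x2, y2) box format and the function name are assumptions.

```python
def diou(box_a, box_b):
    """Distance-IoU of two (x1, y1, x2, y2) boxes with positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the two box centers.
    center_dist_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    diag_sq = enclose_w ** 2 + enclose_h ** 2

    return iou - center_dist_sq / diag_sq


overlap_score = diou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

Two overlapping boxes whose centers are far apart get a lower score than their IoU alone, so they are less likely to suppress each other at the same `iou_threshold`.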

Citation

If you find LD useful in your research, please consider citing:

@Article{zheng2021LD,
  title={Localization Distillation for Object Detection},
  author= {Zhaohui Zheng, Rongguang Ye, Ping Wang, Jun Wang, Dongwei Ren, Wangmeng Zuo},
  journal={arXiv:2102.12252},
  year={2021}
}
