Location-Sensitive Visual Recognition with Cross-IOU Loss

Overview

The trained models are temporarily unavailable, but you can train the code using reasonable computational resource.

Location-Sensitive Visual Recognition with Cross-IOU Loss

by Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang and Qi Tian

The code to train and evaluate the proposed LSNet is available here. For more technical details, please refer to our arXiv paper.

The location-sensitive visual recognition tasks, including object detection, instance segmentation, and human pose estimation, can be formulated into localizing an anchor point (in red) and a set of landmarks (in green). Our work aims to offer a unified framework for these tasks.

Abstract

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks. This paper summarizes these tasks as location-sensitive visual recognition and proposes a unified solution named location-sensitive network (LSNet). Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object. The key to optimizing the LSNet lies in the ability of fitting various scales, for which we design a novel loss function named cross-IOU loss that computes the cross-IOU of each anchor-landmark pair to approximate the global IOU between the prediction and groundtruth. The flexibly located and accurately predicted landmarks also enable LSNet to incorporate richer contextual information for visual recognition. Evaluated on the MSCOCO dataset, LSNet set the new state-of-the-art accuracy for anchor-free object detection (a 53.5% box AP) and instance segmentation (a 40.2% mask AP), and shows promising performance in detecting multi-scale human poses.

If you encounter any problems in using our code, please contact Kaiwen Duan: [email protected]

Bbox AP(%) on COCO test-dev

Method Backbone epoch MStrain AP AP50 AP75 APS APM APL
Anchor-based:
Libra R-CNN X-101-64x4d 12 N 43.0 64.0 47.0 25.3 45.6 54.6
AB+FSAF* X-101-64x4d 18 Y 44.6 65.2 48.6 29.7 47.1 54.6
FreeAnchor* X-101-32x8d 24 Y 47.3 66.3 51.5 30.6 50.4 59.0
GFLV1* X-101-32x8d 24 Y 48.2 67.4 52.6 29.2 51.7 60.2
ATSS* X-101-64x4d-DCN 24 Y 50.7 68.9 56.3 33.2 52.9 62.4
PAA* X-101-64x4d-DCN 24 Y 51.4 69.7 57.0 34.0 53.8 64.0
GFLV2* R2-101-DCN 24 Y 53.3 70.9 59.2 35.7 56.1 65.6
YOLOv4-P7* CSP-P7 450 Y 56.0 73.3 61.2 38.9 60.0 68.6
Anchor-free:
ExtremeNet* HG-104 200 Y 43.2 59.8 46.4 24.1 46.0 57.1
RepPointsV1* R-101-DCN 24 Y 46.5 67.4 50.9 30.3 49.7 57.1
SAPD X-101-64x4d-DCN 24 Y 47.4 67.4 51.1 28.1 50.3 61.5
CornerNet* HG-104 200 Y 42.1 57.8 45.3 20.8 44.8 56.7
DETR R-101 500 Y 44.9 64.7 47.7 23.7 49.5 62.3
CenterNet* HG-104 190 Y 47.0 64.5 50.7 28.9 49.9 58.9
CPNDet* HG-104 100 Y 49.2 67.4 53.7 31.0 51.9 62.4
BorderDet* X-101-64x4d-DCN 24 Y 50.3 68.9 55.2 32.8 52.8 62.3
FCOS-BiFPN X-101-32x8-DCN 24 Y 50.4 68.9 55.0 33.2 53.0 62.7
RepPointsV2* X-101-64x4d-DCN 24 Y 52.1 70.1 57.5 34.5 54.6 63.6
LSNet R-50 24 Y 44.8 64.1 48.8 26.6 47.7 55.7
LSNet X-101-64x4d 24 Y 48.2 67.6 52.6 29.6 51.3 60.5
LSNet X-101-64x4d-DCN 24 Y 49.6 69.0 54.1 30.3 52.8 62.8
LSNet-CPV X-101-64x4d-DCN 24 Y 50.4 69.4 54.5 31.0 53.3 64.0
LSNet-CPV R2-101-DCN 24 Y 51.1 70.3 55.2 31.2 54.3 65.0
LSNet-CPV* R2-101-DCN 24 Y 53.5 71.1 59.2 35.2 56.4 65.8

A comparison between LSNet and the sate-of-the-art methods in object detection on the MS-COCO test-dev set. LSNet surpasses all competitors in the anchor-free group. The abbreviations are: ‘R’ – ResNet, ‘X’ – ResNeXt, ‘HG’ – Hourglass network, ‘R2’ – Res2Net, ‘CPV’ – corner point verification, ‘MStrain’ – multi-scale training, * – multi-scale testing.

Segm AP(%) on COCO test-dev

Method Backbone epoch AP AP50 AP75 APS APM APL
Pixel-based:
YOLACT R-101 48 31.2 50.6 32.8 12.1 33.3 47.1
TensorMask R-101 72 37.1 59.3 39.4 17.1 39.1 51.6
Mask R-CNN X-101-32x4d 12 37.1 60.0 39.4 16.9 39.9 53.5
HTC X-101-64x4d 20 41.2 63.9 44.7 22.8 43.9 54.6
DetectoRS* X-101-64x4d 40 48.5 72.0 53.3 31.6 50.9 61.5
Contour-based:
ExtremeNet HG-104 100 18.9 44.5 13.7 10.4 20.4 28.3
DeepSnake DLA-34 120 30.3 - - - - -
PolarMask X-101-64x4d-DCN 24 36.2 59.4 37.7 17.8 37.7 51.5
LSNet X-101-64x4d-DCN 30 37.6 64.0 38.3 22.1 39.9 49.1
LSNet R2-101-DCN 30 38.0 64.6 39.0 22.4 40.6 49.2
LSNet* X-101-64x4d-DCN 30 39.7 65.5 41.3 25.5 41.3 50.4
LSNet* R2-101-DCN 30 40.2 66.2 42.1 25.8 42.2 51.0

Comparison of LSNet to the sate-of-the-art methods in instance segmentation task on the COCO test-dev set. Our LSNet achieves the state-of-the-art accuracy for contour-based instance segmentation. ‘R’ - ResNet, ‘X’ - ResNeXt, ‘HG’ - Hourglass, ‘R2’ - Res2Net, * - multi-scale testing.

Keypoints AP(%) on COCO test-dev

Method Backbone epoch AP AP50 AP75 APM APL
Heatmap-based:
CenterNet-jd DLA-34 320 57.9 84.7 63.1 52.5 67.4
OpenPose VGG-19 - 61.8 84.9 67.5 58.0 70.4
Pose-AE HG 300 62.8 84.6 69.2 57.5 70.6
CenterNet-jd HG104 150 63.0 86.8 69.6 58.9 70.4
Mask R-CNN R-50 28 63.1 87.3 68.7 57.8 71.4
PersonLab R-152 >1000 66.5 85.5 71.3 62.3 70.0
HRNet HRNet-W32 210 74.9 92.5 82.8 71.3 80.9
Regression-based:
CenterNet-reg [66] DLA-34 320 51.7 81.4 55.2 44.6 63.0
CenterNet-reg [66] HG-104 150 55.0 83.5 59.7 49.4 64.0
LSNet w/ obj-box X-101-64x4d-DCN 60 55.7 81.3 61.0 52.9 60.5
LSNet w/ kps-box X-101-64x4d-DCN 20 59.0 83.6 65.2 53.3 67.9

Comparison of LSNet to the sate-of-the-art methods in pose estimation task on the COCO test-dev set. LSNet predict the keypoints by regression. ‘obj-box’ and ‘kps-box’ denote the object bounding boxes and the keypoint-boxes, respectively. For LSNet w/ kps-box, we fine-tune the model from the LSNet w/ kps-box for another 20 epochs.

Visualization

Some location-sensitive visual recognition results on the MS-COCO validation set.

We compared with the CenterNet to show that our LSNet w/ ‘obj-box’ tends to predict more human pose of small scales, which are not annotated on the dataset. Only pose results with scores higher than 0:3 are shown for both methods.

Left: LSNet uses the object bounding boxes to assign training samples. Right: LSNet uses the keypoint-boxes to assign training samples. Although LSNet with keypoint-boxes enjoys higher AP score, its ability of perceiving multi-scale human instances is weakened.

Preparation

The master branch works with PyTorch 1.5.0

The dataset directory should be like this:

├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── images
            ├── train2017
            ├── val2017
            ├── test2017

Generate extreme point annotation from segmentation:

  • cd code/tools
  • python gen_coco_lsvr.py
  • cd ..

Installation

1. Installing cocoapi
  • cd cocoapi/pycocotools
  • python setup.py develop
  • cd ../..
2. Installing mmcv
  • cd mmcv
  • pip install -e.
  • cd ..
3. Installing mmdet
  • python setup.py develop

Training and Evaluation

Our LSNet is based on mmdetection. Please check with existing dataset for Training and Evaluation.

Owner
Kaiwen Duan
Kaiwen Duan
SIEM Logstash parsing for more than hundred technologies

LogIndexer Pipeline Logstash Parsing Configurations for Elastisearch SIEM and OpenDistro for Elasticsearch SIEM Why this project exists The overhead o

146 Dec 29, 2022
NeuralDiff: Segmenting 3D objects that move in egocentric videos

NeuralDiff: Segmenting 3D objects that move in egocentric videos Project Page | Paper + Supplementary | Video About This repository contains the offic

Vadim Tschernezki 14 Dec 05, 2022
DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

DockStream Description DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends. Docking execution an

AstraZeneca - Molecular AI 72 Jan 02, 2023
Garbage classification using structure data.

垃圾分类模型使用说明 1.包含以下数据文件 文件 描述 data/MaterialMapping.csv 物体以及其归类的信息 data/TestRecords 光谱原始测试数据 CSV 文件 data/TestRecordDesc.zip CSV 文件描述文件 data/Boundaries.cs

wenqi 1 Dec 10, 2021
A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Duplicate Image Detection Getting Started Install dependencies pip install -r requirements.txt Run service python main.py Testing Test with pytest How

Matthew Podolak 21 Nov 11, 2022
[NeurIPS 2021] "Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks" by Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Lin

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks Yonggan Fu, Qixuan Yu, Yang Zhang, S

12 Dec 11, 2022
[ICCV 2021] Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain

Amplitude-Phase Recombination (ICCV'21) Official PyTorch implementation of "Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neur

Guangyao Chen 53 Oct 05, 2022
Geometry-Free View Synthesis: Transformers and no 3D Priors

Geometry-Free View Synthesis: Transformers and no 3D Priors Geometry-Free View Synthesis: Transformers and no 3D Priors Robin Rombach*, Patrick Esser*

CompVis Heidelberg 293 Dec 22, 2022
SMCA replication There are no extra compiled components in SMCA DETR and package dependencies are minimal

Usage There are no extra compiled components in SMCA DETR and package dependencies are minimal, so the code is very simple to use. We provide instruct

22 May 06, 2022
PyTorch implementation of DeepUME: Learning the Universal Manifold Embedding for Robust Point Cloud Registration (BMVC 2021)

DeepUME: Learning the Universal Manifold Embedding for Robust Point Cloud Registration [video] [paper] [supplementary] [data] [thesis] Introduction De

Natalie Lang 10 Dec 14, 2022
Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR, 2019)

Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR 2019) To make better use of given limited labels, we propo

126 Sep 13, 2022
Official PyTorch implementation of "BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation" (NeurIPS 2021)

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation Official PyTorch implementation of the NeurIPS 2021 paper Mingcong Liu, Qiang

onion 462 Dec 29, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed+Megatron trained the world's most powerful language model: MT-530B DeepSpeed is hiring, come join us! DeepSpeed is a deep learning optimizat

Microsoft 8.4k Dec 28, 2022
Tensorflow implementation for "Improved Transformer for High-Resolution GANs" (NeurIPS 2021).

HiT-GAN Official TensorFlow Implementation HiT-GAN presents a Transformer-based generator that is trained based on Generative Adversarial Networks (GA

Google Research 78 Oct 31, 2022
Replication of Pix2Seq with Pretrained Model

Pretrained-Pix2Seq We provide the pre-trained model of Pix2Seq. This version contains new data augmentation. The model is trained for 300 epochs and c

peng gao 51 Nov 22, 2022
Image super-resolution through deep learning

srez Image super-resolution through deep learning. This project uses deep learning to upscale 16x16 images by a 4x factor. The resulting 64x64 images

David Garcia 5.3k Dec 28, 2022
A Re-implementation of the paper "A Deep Learning Framework for Character Motion Synthesis and Editing"

What is This This is a simple re-implementation of the paper "A Deep Learning Framework for Character Motion Synthesis and Editing"(1). Only Sections

102 Dec 14, 2022
Learning Facial Representations from the Cycle-consistency of Face (ICCV 2021)

Learning Facial Representations from the Cycle-consistency of Face (ICCV 2021) This repository contains the code for our ICCV2021 paper by Jia-Ren Cha

Jia-Ren Chang 40 Dec 27, 2022
Lightweight plotting to the terminal. 4x resolution via Unicode.

Uniplot Lightweight plotting to the terminal. 4x resolution via Unicode. When working with production data science code it can be handy to have plotti

Olav Stetter 203 Dec 29, 2022
This repository contain code on Novelty-Driven Binary Particle Swarm Optimisation for Truss Optimisation Problems.

This repository contain code on Novelty-Driven Binary Particle Swarm Optimisation for Truss Optimisation Problems. The main directory include the code

0 Dec 23, 2021