CVPR2022 (Oral) - Rethinking Semantic Segmentation: A Prototype View

Last update: Dec 26, 2022

Overview

Rethinking Semantic Segmentation: A Prototype View,
Tianfei Zhou, Wenguan Wang, Ender Konukoglu and Luc Van Gool
CVPR 2022 (Oral) (arXiv 2203.15102)

News

[2022-04-19] Release the code based on openseg.pytorch!
[2022-03-31] Paper link updated!
[2022-03-12] Repo created. Paper and code will come soon.

Abstract

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such a parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean features of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle an arbitrary number of classes with a constant amount of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HRNet, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compelling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
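
The nearest-prototype idea described in the abstract can be illustrated with a small, self-contained sketch. This is not the code from this repository: the function names (`build_prototypes`, `predict`), the use of cosine similarity, and the assumption that the training pixels of each class are already split into groups are all illustrative choices made here. Each prototype is simply the mean embedding of one group of pixels, and a pixel receives the class of its most similar prototype.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean_feature(features):
    """Element-wise mean of a non-empty list of equal-length embeddings."""
    n, dim = len(features), len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]

def build_prototypes(groups_per_class):
    """Hypothetical helper: {class_id: [group, ...]} -> {class_id: [prototype, ...]}.

    Each group is a list of pixel embeddings; its prototype is their
    normalized mean. How pixels are assigned to groups is assumed given.
    """
    return {
        cls: [l2_normalize(mean_feature(group)) for group in groups]
        for cls, groups in groups_per_class.items()
    }

def predict(pixel, prototypes):
    """Return the class whose closest prototype is most similar to the pixel.

    Cosine similarity is an assumption of this sketch.
    """
    p = l2_normalize(pixel)
    best_cls, best_sim = None, -math.inf
    for cls, protos in prototypes.items():
        sim = max(sum(a * b for a, b in zip(p, q)) for q in protos)
        if sim > best_sim:
            best_cls, best_sim = cls, sim
    return best_cls

# Toy 2-D embeddings: class 0 has two prototypes, class 1 has one.
prototypes = build_prototypes({
    0: [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]],
    1: [[[-1.0, 0.0], [-0.9, -0.1]]],
})
print(predict([0.2, 0.8], prototypes))    # 0
print(predict([-1.0, -0.2], prototypes))  # 1
```

Note that nothing in `prototypes` is a trainable weight: adding a class only adds entries computed from embeddings, which is one way to read the abstract's claim that the number of learnable parameters stays constant as the number of classes grows.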

Installation

This implementation is built on openseg.pytorch. Many thanks to its authors for their efforts.

Please follow the Getting Started guide for installation and dataset preparation.

Performance

Cityscapes

Method	Train Set	Val Set	Iters	Batch Size	mIoU	Log	CKPT	Script
HRNet	train	val	80K	8	79.0	log	ckpt	`scripts/cityscapes/hrnet/run_h_48_d_4.sh`
Ours	train	val	80K	8	80.1	log	ckpt	`scripts/cityscapes/hrnet/run_h_48_d_4_proto.sh`

More results will come soon.
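
For readers unfamiliar with the mIoU column, the sketch below shows the usual definition: per-class intersection over union, averaged over classes, computed from a confusion matrix. It assumes the standard formulation and a hypothetical helper name `mean_iou`; the evaluation code in this repository may differ in details such as how ignored labels are handled.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j. Classes that appear in neither the
    ground truth nor the predictions are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # pixels wrongly labeled c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Two-class toy example: IoU is 8/11 for class 0 and 9/12 for class 1.
score = mean_iou([[8, 2], [1, 9]])
print(round(100 * score, 1))  # 73.9
```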

Citation

@inproceedings{zhou2022rethinking,
    author    = {Zhou, Tianfei and Wang, Wenguan and Konukoglu, Ender and Van Gool, Luc},
    title     = {Rethinking Semantic Segmentation: A Prototype View},
    booktitle = {CVPR},
    year      = {2022}
}

Relevant Projects

Please also see our work [1] for a novel training paradigm with a cross-image, pixel-to-pixel contrastive loss, and [2] for a novel hierarchy-aware segmentation learning scheme for structured scene parsing.

[1] Exploring Cross-Image Pixel Contrast for Semantic Segmentation - ICCV 2021 (Oral) [arXiv][code]

[2] Deep Hierarchical Semantic Segmentation - CVPR 2022 [arXiv][code]
