Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Last update: Dec 05, 2022

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity between tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. Successfully resolving it, however, requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success can be credited to common attention maps shared across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download the pre-trained model HERE.

Training and evaluating

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12
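The command above selects --lr-mode poly with a base learning rate of 0.007 over 200 epochs. As a rough illustration of what a polynomial ("poly") schedule commonly looks like, here is a minimal sketch. It is not taken from main.py: the function name poly_lr is hypothetical, and the exponent 0.9 is an assumption based on the value often used for poly schedules in segmentation work, so the repository's actual schedule may differ.

```python
def poly_lr(base_lr: float, epoch: int, total_epochs: int, power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - epoch / total_epochs) ** power.

    NOTE: hypothetical helper for illustration only. The exponent 0.9 is an
    assumed default, not a value confirmed by the Cerberus code base.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    # The rate starts at base_lr and decays smoothly to 0 at the final epoch.
    return base_lr * (1.0 - epoch / total_epochs) ** power


if __name__ == "__main__":
    # Values mirror the flags in the training command: --lr 0.007 --epochs 200
    for epoch in (0, 50, 100, 150, 199):
        print(f"epoch {epoch:3d}: lr = {poly_lr(0.007, epoch, 200):.6f}")
```

Under this assumed form, the learning rate equals the base value at epoch 0 and shrinks monotonically toward zero, so most of the decay happens late in training.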

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path] -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10
