Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Last update: Dec 05, 2022

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity between tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. Successfully resolving it, however, requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success can be credited to common attention maps across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download the pre-trained model HERE.

Training and evaluating

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12
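
The `--lr-mode poly` flag in the command above selects a polynomial learning-rate schedule. The README does not spell out the formula, so the following is only a minimal sketch of the commonly used form, `lr = base_lr * (1 - epoch / total_epochs) ** power`. The helper name `poly_lr`, the exponent `0.9`, and the per-epoch update granularity are assumptions made for illustration and may differ from what `main.py` actually does.

```python
def poly_lr(base_lr: float, epoch: int, total_epochs: int, power: float = 0.9) -> float:
    """Polynomially decayed learning rate for a given epoch.

    Hypothetical helper: the exponent 0.9 is a common default in
    segmentation code bases, not a value confirmed by this README.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    return base_lr * (1.0 - epoch / total_epochs) ** power


# Values taken from the training command: --lr 0.007 --epochs 200
schedule = [poly_lr(0.007, epoch, 200) for epoch in range(200)]
```

Under these assumptions the rate starts at the value passed to `--lr` and decreases smoothly toward zero as training approaches the final epoch.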

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path] -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10
