Single-Stage 6D Object Pose Estimation, CVPR 2020

Overview

Overview

This repository contains the code for the paper Single-Stage 6D Object Pose Estimation. Yinlin Hu, Pascal Fua, Wei Wang and Mathieu Salzmann. CVPR 2020.

Most recent 6D pose estimation frameworks first rely on a deep network to establish correspondences between 3D object keypoints and 2D image locations and then use a variant of a RANSAC-based Perspective-n-Point (PnP) algorithm. This two-stage process, however, is suboptimal: First, it is not end-to-end trainable. Second, training the deep network relies on a surrogate loss that does not directly reflect the final 6D pose estimation task.

In this work, we introduce a deep architecture that directly regresses 6D poses from correspondences. It takes as input a group of candidate correspondences for each 3D keypoint and accounts for the fact that the order of the correspondences within each group is irrelevant, while the order of the groups, that is, of the 3D keypoints, is fixed. Our architecture is generic and can thus be exploited in conjunction with existing correspondence-extraction networks so as to yield single-stage 6D pose estimation frameworks. Our experiments demonstrate that these single-stage frameworks consistently outperform their two-stage counterparts in terms of both accuracy and speed.

Figure 1: Motivation. Consider the modern 6D pose estimation algorithm of SegDrivenPose that uses a deep network to predict several 2D correspondences for each of the eight 3D corners of the pitcher's bounding box. (a) Because it minimizes the average 2D error of these correspondences, two instances of such a framework could produce correspondences that differ but have the same average accuracy, such as the green and the red ones. As evidenced by the projected green and red reference frames, applying a RANSAC-based PnP algorithm to these two sets of correspondences can yield substantially different poses. (b) Even when using only the set of green correspondences, simply changing their order causes a RANSAC-based PnP algorithm to return different solutions.

Figure 2: Overall architecture for single-stage 6D object pose estimation. After establishing 3D-to-2D correspondences by some segmentation-driven CNN for 6D pose, we use three main modules to infer the pose from these correspondence clusters directly: a local feature extraction module with shared network parameters, a feature aggregation module operating within the different clusters, and a global inference module consisting of simple fully-connected layers to estimate the final pose as a quaternion and a translation. The color in the CNN outputs indicates the direction of the 2D offset from the grid cell center to the corresponding projected 3D bounding box corner.

How to Use

This repository contains the code for the core network inferring pose from correspondences. It is straightforward to merge with other correspondence-extraction networks SegDrivenPose or PVNet to obtain an end-to-end 6D pose framework.

Citing

@inproceedings{hu2020singlestagepose,
  title={Single-Stage 6D Object Pose Estimation},
  author={Yinlin Hu and Pascal Fua and Wei Wang and Mathieu Salzmann},
  booktitle={CVPR},
  year={2020}
}
Owner
CVLAB @ EPFL
CVLAB @ EPFL
Official repository of the paper "A Variational Approximation for Analyzing the Dynamics of Panel Data". Mixed Effect Neural ODE. UAI 2021.

Official repository of the paper (UAI 2021) "A Variational Approximation for Analyzing the Dynamics of Panel Data", Mixed Effect Neural ODE. Panel dat

Jurijs Nazarovs 7 Nov 26, 2022
Anomaly detection related books, papers, videos, and toolboxes

Anomaly Detection Learning Resources Outlier Detection (also known as Anomaly Detection) is an exciting yet challenging field, which aims to identify

Yue Zhao 6.7k Dec 31, 2022
NLU Dataset Diagnostics

NLU Dataset Diagnostics This repository contains data and scripts to reproduce the results from our paper: Aarne Talman, Marianna Apidianaki, Stergios

Language Technology at the University of Helsinki 1 Jul 20, 2022
Rethinking the U-Net architecture for multimodal biomedical image segmentation

MultiResUNet Rethinking the U-Net architecture for multimodal biomedical image segmentation This repository contains the original implementation of "M

Nabil Ibtehaz 308 Jan 05, 2023
Who calls the shots? Rethinking Few-Shot Learning for Audio (WASPAA 2021)

rethink-audio-fsl This repo contains the source code for the paper "Who calls the shots? Rethinking Few-Shot Learning for Audio." (WASPAA 2021) Table

Yu Wang 34 Dec 24, 2022
Lighting the Darkness in the Deep Learning Era: A Survey, An Online Platform, A New Dataset

Lighting the Darkness in the Deep Learning Era: A Survey, An Online Platform, A New Dataset This repository provides a unified online platform, LoLi-P

Chongyi Li 457 Jan 03, 2023
Label Hallucination for Few-Shot Classification

Label Hallucination for Few-Shot Classification This repo covers the implementation of the following paper: Label Hallucination for Few-Shot Classific

Yiren Jian 13 Nov 13, 2022
This repository contains an implementation of the Permutohedral Attention Module in Pytorch

Permutohedral_attention_module This repository contains an implementation of the Permutohedral Attention Module

Samuel JOUTARD 26 Nov 27, 2022
Source code for the paper: Variance-Aware Machine Translation Test Sets (NeurIPS 2021 Datasets and Benchmarks Track)

Variance-Aware-MT-Test-Sets Variance-Aware Machine Translation Test Sets License See LICENSE. We follow the data licensing plan as the same as the WMT

NLP2CT Lab, University of Macau 5 Dec 21, 2021
A whale detector design for the Kaggle whale-detector challenge!

CNN (InceptionV1) + STFT based Whale Detection Algorithm So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The obje

Tarin Ziyaee 92 Sep 28, 2021
House3D: A Rich and Realistic 3D Environment

House3D: A Rich and Realistic 3D Environment Yi Wu, Yuxin Wu, Georgia Gkioxari and Yuandong Tian House3D is a virtual 3D environment which consists of

Meta Research 1.1k Dec 14, 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Temporally Efficient Vision Transformer for Video Instance Segmentation Temporally Efficient Vision Transformer for Video Instance Segmentation (CVPR

Hust Visual Learning Team 203 Dec 31, 2022
Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh

Arjun Majumdar 44 Dec 14, 2022
Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling

Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling Code for the paper: Greg Ver Steeg and Aram Galstyan. "Hamiltonian Dynamics with N

Greg Ver Steeg 25 Mar 14, 2022
This folder contains the implementation of the multi-relational attribute propagation algorithm.

MrAP This folder contains the implementation of the multi-relational attribute propagation algorithm. It requires the package pytorch-scatter. Please

6 Dec 06, 2022
Speeding-Up Back-Propagation in DNN: Approximate Outer Product with Memory

Approximate Outer Product Gradient Descent with Memory Code for the numerical experiment of the paper Speeding-Up Back-Propagation in DNN: Approximate

2 Mar 02, 2022
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. Website • Key Features • How To Use • Docs •

Pytorch Lightning 21.1k Dec 29, 2022
Exe-to-xlsm - Simple script to create VBscript of exe and inject to xlsm

🎁 Exe To Office Executable file injection to Office documents: .xlsm, .docm, .p

3 Jan 25, 2022
Causal estimators for use with WhyNot

WhyNot Estimators A collection of causal inference estimators implemented in Python and R to pair with the Python causal inference library whynot. For

ZYKLS 8 Apr 06, 2022
Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving

Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving This is the source code for our paper Frequency Domain Image Tran

Mu Cai 52 Dec 23, 2022