Back to the Feature: Learning Robust Camera Localization from Pixels to Pose (CVPR 2021)

Last update: Jan 05, 2023

Back to the Feature with PixLoc

We introduce PixLoc, a neural network for end-to-end learning of camera localization from an image and a 3D model via direct feature alignment. It is presented in our paper:

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
to appear at CVPR 2021
Authors: Paul-Edouard Sarlin*, Ajaykumar Unagar*, Måns Larsson, Hugo Germain, Carl Toft, Victor Larsson, Marc Pollefeys, Vincent Lepetit, Lars Hammarstrand, Fredrik Kahl, and Torsten Sattler

This repository will host the training and inference code. Please subscribe to this issue if you wish to be notified of the code release.

Abstract

Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple learning algorithms. Many regress precise geometric quantities, like poses or 3D points, from an input image. This either fails to generalize to new viewpoints or ties the model parameters to a specific scene. In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms. We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model. Our approach is based on the direct alignment of multiscale deep features, casting camera localization as metric learning. PixLoc learns strong data priors by end-to-end training from pixels to pose and exhibits exceptional generalization to new scenes by separating model parameters and scene geometry. The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching by jointly refining keypoints and poses with little overhead.
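To make the idea of direct feature alignment more concrete, here is a minimal NumPy sketch of the kind of cost the abstract describes: 3D points are projected into the query image with a candidate pose, features are sampled at the projections, and the pose is scored by how far those features are from the reference features attached to the points. This is an illustration under stated assumptions, not the PixLoc implementation: the function names (`project`, `sample_bilinear`, `alignment_cost`), the hand-made sinusoidal feature map, and the single-scale, unweighted squared-error cost are all hypothetical. It only evaluates the cost at two poses and performs no optimization.

```python
import numpy as np

def project(points_w, R, t, K):
    """Map world points (N, 3) to pixel coordinates (N, 2) under pose (R, t)."""
    p_cam = points_w @ R.T + t
    uv = p_cam[:, :2] / p_cam[:, 2:3]
    return uv * np.array([K[0, 0], K[1, 1]]) + K[:2, 2]

def sample_bilinear(fmap, uv):
    """Bilinearly interpolate a feature map (H, W, D) at sub-pixel locations (N, 2)."""
    h, w, _ = fmap.shape
    # Clamp so that the four neighbouring pixels always lie inside the map.
    u = np.clip(uv[:, 0], 0, w - 1.001)
    v = np.clip(uv[:, 1], 0, h - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    top = (1 - du) * fmap[v0, u0] + du * fmap[v0, u0 + 1]
    bottom = (1 - du) * fmap[v0 + 1, u0] + du * fmap[v0 + 1, u0 + 1]
    return (1 - dv) * top + dv * bottom

def alignment_cost(points_w, ref_feats, query_fmap, R, t, K):
    """Half the sum of squared feature residuals for a candidate pose (R, t)."""
    uv = project(points_w, R, t, K)
    residuals = sample_bilinear(query_fmap, uv) - ref_feats
    return 0.5 * np.sum(residuals ** 2)

# Toy data (assumption): a smooth synthetic map stands in for learned deep features.
rng = np.random.default_rng(0)
H, W = 48, 64
vv, uu = np.mgrid[0:H, 0:W].astype(float)
query_fmap = np.stack(
    [np.sin(uu / 6.0), np.cos(vv / 5.0), np.sin((uu + vv) / 9.0)], axis=-1
)
K = np.array([[60.0, 0.0, 32.0], [0.0, 60.0, 24.0], [0.0, 0.0, 1.0]])
points_w = rng.uniform([-1.0, -0.7, 4.0], [1.0, 0.7, 6.0], size=(50, 3))

# Reference features are generated at the true pose, so the cost there is zero.
R_true, t_true = np.eye(3), np.zeros(3)
ref_feats = sample_bilinear(query_fmap, project(points_w, R_true, t_true, K))

cost_true = alignment_cost(points_w, ref_feats, query_fmap, R_true, t_true, K)
t_off = t_true + np.array([0.2, 0.0, 0.0])  # perturbed translation
cost_off = alignment_cost(points_w, ref_feats, query_fmap, R_true, t_off, K)
print(cost_true, cost_off)
```

In this toy setup the cost vanishes at the true pose and grows when the translation is perturbed, which is the signal a pose optimizer would follow. A practical system would minimize such a cost iteratively over the 6-DoF pose and across several feature scales, as the abstract indicates.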

BibTeX Citation

Please consider citing our work if you use any of the ideas presented in the paper or code from this repo:

@inproceedings{sarlin21pixloc,
  author    = {Paul-Edouard Sarlin and
               Ajaykumar Unagar and
               Måns Larsson and
               Hugo Germain and
               Carl Toft and
               Victor Larsson and
               Marc Pollefeys and
               Vincent Lepetit and
               Lars Hammarstrand and
               Fredrik Kahl and
               Torsten Sattler},
  title     = {{Back to the Feature}: Learning Robust Camera Localization from Pixels to Pose},
  booktitle = {CVPR},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.09213}
}

Owner

Computer Vision and Geometry Lab
