MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions, spherical coordinates, and intensity

Introduction

3D LiDAR place recognition aims to estimate a coarse location in a previously seen environment from a single scan of a rotating 3D LiDAR sensor. Existing solutions include hand-crafted point cloud descriptors (e.g., ScanContext, M2DP, LiDAR IRIS) and deep learning-based methods (e.g., PointNetVLAD, PCAN, LPD-Net, DAGC, MinkLoc3D), which are often evaluated only on accumulated 2D scans from the Oxford RobotCar dataset. We introduce MinkLoc3D-SI, a sparse convolution-based solution that uses the spherical coordinates of 3D points and processes the intensity of 3D LiDAR measurements, improving performance when a single 3D LiDAR scan is used. Our method combines improvements typical of hand-crafted descriptors (like ScanContext) with efficient 3D sparse convolutions (MinkLoc3D). Our experiments show improved results on single scans from 3D LiDARs (USyd Campus dataset) and good generalization to unseen data (KITTI dataset). Using intensity information on accumulated 2D scans (RobotCar Intensity dataset) improves performance, even though the spherical representation does not produce a noticeable improvement there. As a result, MinkLoc3D-SI is suited to single scans obtained from a 3D LiDAR, making it applicable in autonomous vehicles.

Citation

Paper details will be uploaded after acceptance. This work is an extension of Jacek Komorowski's MinkLoc3D.

Environment and Dependencies

Code was tested using Python 3.8 with PyTorch 1.7 and MinkowskiEngine 0.5.0 on Ubuntu 18.04 with CUDA 11.0.

The following Python packages are required:

  • PyTorch (version 1.7)
  • MinkowskiEngine (version 0.5.0)
  • pytorch_metric_learning (version 0.9.94 or above)
  • numba
  • tensorboard
  • pandas
  • psutil
  • bitarray

Modify the PYTHONPATH environment variable to include the absolute path to the project root folder:

export PYTHONPATH=$PYTHONPATH:/.../.../MinkLoc3D-SI

Datasets

The preprocessed University of Sydney Campus dataset (USyd) and the Oxford RobotCar dataset with an intensity channel (IntensityOxford) are available here. Extract the dataset folders into the same directory as the project code, so that it contains three folders: 1) IntensityOxford/, 2) MinkLoc3D-SI/, and 3) USyd/.

The pickle files used to assign positive and negative examples are compatible with the ones introduced in PointNetVLAD and can be generated using the scripts in the generating_queries/ folder. The benchmark datasets (Oxford and In-house) introduced in PointNetVLAD can also be used by following the instructions in PointNetVLAD.

Before training or evaluating the network, run the code below to generate pickles with positive and negative point clouds for each anchor point cloud.

cd generating_queries/ 

# Generate training tuples for the USyd Dataset
python generate_training_tuples_usyd.py

# Generate evaluation tuples for the USyd Dataset
python generate_test_sets_usyd.py

# Generate training tuples for the IntensityOxford Dataset
python generate_training_tuples_intensityOxford.py

# Generate evaluation tuples for the IntensityOxford Dataset
python generate_test_sets_intensityOxford.py
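
The idea behind the positive/negative assignment can be sketched as follows. This is only an illustration, not the code in generating_queries/: the function name, the dictionary layout, and the 10 m / 50 m thresholds (the convention commonly associated with PointNetVLAD) are assumptions made for this example.

```python
import numpy as np


def assign_positives_negatives(positions, pos_radius=10.0, neg_radius=50.0):
    """Assign positives and negatives to each anchor by planar distance.

    positions: (N, 2) array of scan locations in meters.
    pos_radius / neg_radius: assumed thresholds, not taken from this repository.
    """
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances between all scan locations.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    indices = np.arange(len(positions))
    tuples = {}
    for i in indices:
        # Positives: other scans taken close to the anchor.
        positives = indices[(dist[i] <= pos_radius) & (indices != i)]
        # Negatives: scans taken far away from the anchor.
        negatives = indices[dist[i] > neg_radius]
        tuples[int(i)] = {
            "positives": positives.tolist(),
            "negatives": negatives.tolist(),
        }
    return tuples


if __name__ == "__main__":
    demo = assign_positives_negatives([[0, 0], [5, 0], [30, 0], [100, 0]])
    print(demo[0])
```

Scans that fall between the two radii are neither positives nor negatives for a given anchor, which avoids training on ambiguous pairs.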

Training

To train the MinkLoc3D-SI network, prepare the data as described above. Then edit the configuration file (config/config_usyd.txt or config/config_intensityOxford.txt):

  • num_points - number of points in the point cloud. During loading, points are randomly subsampled if the cloud has too many points, or zero-padding is applied if it has too few
  • max_distance - maximum distance from the sensor that is used; points farther than max_distance are removed
  • dataset_name - USyd / IntensityOxford / Oxford
  • dataset_folder - path to the dataset folder
  • batch_size_limit - upper limit on the batch size, set according to available GPU memory. In our experiments with 10 GB of GPU RAM, the limit was set to 84 for USyd (23k points) and to 256 for IntensityOxford (4096 points).
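
The effect of num_points and max_distance on a loaded point cloud can be illustrated with a minimal sketch. The helper prepare_cloud below is hypothetical and is not the loader used in this repository; it assumes the first three columns hold x, y, z in the sensor frame and that any further columns (e.g., intensity) are carried along unchanged.

```python
import numpy as np


def prepare_cloud(points, num_points, max_distance, rng=None):
    """Crop a cloud by range, then subsample or zero-pad it to num_points rows.

    points: (N, C) array whose first three columns are x, y, z (assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=np.float32)

    # Remove points farther from the sensor than max_distance.
    dist = np.linalg.norm(points[:, :3], axis=1)
    points = points[dist <= max_distance]

    n = points.shape[0]
    if n > num_points:
        # Too many points: random subsampling without replacement.
        idx = rng.choice(n, size=num_points, replace=False)
        points = points[idx]
    elif n < num_points:
        # Too few points: append all-zero rows.
        pad = np.zeros((num_points - n, points.shape[1]), dtype=points.dtype)
        points = np.vstack([points, pad])
    return points


if __name__ == "__main__":
    cloud = np.random.default_rng(0).uniform(-150, 150, size=(30000, 4))
    print(prepare_cloud(cloud, num_points=23000, max_distance=100.0).shape)
```

Cropping before subsampling means the point budget is spent only on points within the useful range.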

Edit the model configuration file (models/minkloc_config.txt):

  • version - MinkLoc3D / MinkLoc3D-I / MinkLoc3D-S / MinkLoc3D-SI
  • mink_quantization_size - desired quantization (IntensityOxford and Oxford coordinates are normalized to [-1, 1], so the quantization parameters need to be adjusted accordingly!):
    • MinkLoc3D/3D-I: qx, qy, qz; units: [m, m, m]
    • MinkLoc3D-S/3D-SI: qr, qtheta, qphi; units: [m, deg, deg]
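
To make the units of the spherical variant concrete, the sketch below converts Cartesian points to (r, theta, phi) and quantizes them with per-axis step sizes. It is an illustration under stated assumptions, not the repository's implementation: the angle conventions (theta as azimuth in the x-y plane, phi as elevation above it) and the example step sizes are chosen for this example only.

```python
import numpy as np


def cartesian_to_spherical(xyz):
    """Convert (N, 3) x, y, z in meters to r [m], theta [deg], phi [deg].

    Assumed convention: theta is the azimuth in the x-y plane,
    phi is the elevation above that plane.
    """
    xyz = np.asarray(xyz, dtype=np.float64)
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.degrees(np.arctan2(y, x))
    phi = np.degrees(np.arctan2(z, np.sqrt(x**2 + y**2)))
    return np.stack([r, theta, phi], axis=1)


def quantize(coords, quantization_size):
    """Map continuous coordinates to integer cell indices, one step per axis."""
    steps = np.asarray(quantization_size, dtype=np.float64)
    return np.floor(coords / steps).astype(np.int32)


if __name__ == "__main__":
    pts = np.array([[10.0, 10.0, 0.0], [0.0, -5.0, 5.0]])
    spherical = cartesian_to_spherical(pts)
    # Hypothetical step sizes: qr in meters, qtheta and qphi in degrees.
    print(quantize(spherical, [2.5, 2.0, 2.0]))
```

With a fixed angular step, cells grow with distance from the sensor, so sparse far-range returns are grouped more coarsely than dense near-range ones.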

To train the network, run:

cd training

# To train the desired model on the USyd Dataset
python train.py --config ../config/config_usyd.txt --model_config ../models/minkloc_config.txt

Evaluation

A MinkLoc3D-SI model pre-trained on USyd is available in the weights folder. To evaluate it, run the following command:

cd eval

# To evaluate the model trained on the USyd Dataset
python evaluate.py --config ../config/config_usyd.txt --model_config ../models/minkloc_config.txt --weights ../weights/MinkLoc3D-SI-USyd.pth

License

Our code is released under the MIT License (see LICENSE file for details).

References

  1. J. Komorowski, "MinkLoc3D: Point Cloud Based Large-Scale Place Recognition," Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021.
  2. M. A. Uy and G. H. Lee, "PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.