Implementation of ICCV2021(Oral) paper - VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

Last update: Dec 27, 2022

Related tags

Overview

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

Created by Zeyu HU

Introduction

This work is based on our paper VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation, which appears at the IEEE International Conference on Computer Vision (ICCV) 2021.

In recent years, sparse voxel-based methods have become the state-of-the-arts for 3D semantic segmentation of indoor scenes, thanks to the powerful 3D CNNs. Nevertheless, being oblivious to the underlying geometry, voxel-based methods suffer from ambiguous features on spatially close objects and struggle with handling complex and irregular geometries due to the lack of geodesic information. In view of this, we present Voxel-Mesh Network (VMNet), a novel 3D deep architecture that operates on the voxel and mesh representations leveraging both the Euclidean and geodesic information. Intuitively, the Euclidean information extracted from voxels can offer contextual cues representing interactions between nearby objects, while the geodesic information extracted from meshes can help separate objects that are spatially close but have disconnected surfaces. To incorporate such information from the two domains, we design an intra-domain attentive module for effective feature aggregation and an inter-domain attentive module for adaptive feature fusion. Experimental results validate the effectiveness of VMNet: specifically, on the challenging ScanNet dataset for large-scale segmentation of indoor scenes, it outperforms the state-of-the-art SparseConvNet and MinkowskiNet (74.6% vs 72.5% and 73.6% in mIoU) with a simpler network structure (17M vs 30M and 38M parameters).

Citation

If you find our work useful in your research, please consider citing:

@misc{hu2021vmnet,
      title={VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation}, 
      author={Zeyu Hu and Xuyang Bai and Jiaxiang Shang and Runze Zhang and Jiayu Dong and Xin Wang and Guangyuan Sun and Hongbo Fu and Chiew-Lan Tai},
      year={2021},
      eprint={2107.13824},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Installation

Our code is based on Pytorch. Please make sure CUDA and cuDNN are installed. One configuration has been tested:
- Python 3.7
- Pytorch 1.4.0
- torchvision 0.5.0
- CUDA 10.0
- cudatoolkit 10.0.130
- cuDNN 7.6.5
VMNet depends on the torch-geometric and torchsparse libraries. Please follow their installation instructions. One configuration has been tested, higher versions should work as well:
- torch-geometric 1.6.3
- torchsparse 1.1.0
We adapted VCGlib to generate pooling trace maps for vertex clustering and quadric error metrics.
```
git clone https://github.com/cnr-isti-vclab/vcglib

# QUADRIC ERROR METRICS
cd vcglib/apps/tridecimator/
qmake
make

# VERTEX CLUSTERING
cd ../sample/trimesh_clustering
qmake
make
```
Please add vcglib/apps/tridecimator and vcglib/apps/sample/trimesh_clustering to your environment path variable.
Other dependencies. One configuration has been tested:
- open3d 0.9.0
- plyfile 0.7.3
- scikit-learn 0.24.0
- scipy 1.6.0

Data Preparation

Please refer to https://github.com/ScanNet/ScanNet and https://github.com/niessner/Matterport to get access to the ScanNet and Matterport dataset. Our method relies on the .ply as well as the .labels.ply files. We take ScanNet dataset as example for the following instructions.
Create directories to store processed data.
- 'path/to/processed_data/train/'
- 'path/to/processed_data/val/'
- 'path/to/processed_data/test/'

Prepare train data.

python prepare_data.py --considered_rooms_path dataset/data_split/scannetv2_train.txt --in_path path/to/ScanNet/scans --out_path path/to/processed_data/train/

Prepare val data.

python prepare_data.py --considered_rooms_path dataset/data_split/scannetv2_val.txt --in_path path/to/ScanNet/scans --out_path path/to/processed_data/val/

Prepare test data.

python prepare_data.py --test_split --considered_rooms_path dataset/data_split/scannetv2_test.txt --in_path path/to/ScanNet/scans_test --out_path path/to/processed_data/test/

Train

On train/val/test setting.

CUDA_VISIBLE_DEVICES=0 python run.py --train --exp_name name_you_want --data_path path/to/processed_data

On train+val/test setting (for ScanNet benchmark).

CUDA_VISIBLE_DEVICES=0 python run.py --train_benchmark --exp_name name_you_want --data_path path/to/processed_data

Inference

Validation. Pretrained model (73.3% mIoU on ScanNet Val). Please download and put into directory check_points/val_split.
```
CUDA_VISIBLE_DEVICES=0 python run.py --val --exp_name val_split --data_path path/to/processed_data
```
Test. Pretrained model (74.6% mIoU on ScanNet Test). Please download and put into directory check_points/test_split. TxT files for benchmark submission will be saved in directory test_results/.
```
CUDA_VISIBLE_DEVICES=0 python run.py --test --exp_name test_split --data_path path/to/processed_data
```

Acknowledgements

Our code is built upon torch-geometric, torchsparse and dcm-net.

License

Our code is released under MIT License (see LICENSE file for details).

Implementation of ICCV2021(Oral) paper - VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

Related tags

Overview

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

Introduction

Citation

Installation

Data Preparation

Train

Inference

Acknowledgements

License

Owner

HU Zeyu

Global Rhythm Style Transfer Without Text Transcriptions

Implementation of the algorithm shown in the article "Modelo de Predicción de Éxito de Canciones Basado en Descriptores de Audio"

This repository is for EMNLP 2021 paper: It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data

Memoized coduals - Shows that it is possible to implement reverse mode autodiff using a variation on the dual numbers called the codual numbers

A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers.

Augmented CLIP - Training simple models to predict CLIP image embeddings from text embeddings, and vice versa.

Efficient 3D human pose estimation in video using 2D keypoint trajectories

🥈78th place in Riiid Solution🥈

StyleGAN2-ADA - Official PyTorch implementation

The Power of Scale for Parameter-Efficient Prompt Tuning

Diffgram - Supervised Learning Data Platform

DeRF: Decomposed Radiance Fields

Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics.

Source Code for ICSE 2022 Paper - ``Can We Achieve Fairness Using Semi-Supervised Learning?''

Fortuitous Forgetting in Connectionist Networks

A Simple Key-Value Data-store written in Python

Official repository for the paper "GN-Transformer: Fusing AST and Source Code information in Graph Networks".

The CLRS Algorithmic Reasoning Benchmark

DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.

PyTorch implementation of Memory-based semantic segmentation for off-road unstructured natural environments.