Implementation of ICCV2021(Oral) paper - VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

Related tags

Deep LearningVMNet
Overview

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

Framework Fig

Created by Zeyu HU

Introduction

This work is based on our paper VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation, which appears at the IEEE International Conference on Computer Vision (ICCV) 2021.

In recent years, sparse voxel-based methods have become the state-of-the-arts for 3D semantic segmentation of indoor scenes, thanks to the powerful 3D CNNs. Nevertheless, being oblivious to the underlying geometry, voxel-based methods suffer from ambiguous features on spatially close objects and struggle with handling complex and irregular geometries due to the lack of geodesic information. In view of this, we present Voxel-Mesh Network (VMNet), a novel 3D deep architecture that operates on the voxel and mesh representations leveraging both the Euclidean and geodesic information. Intuitively, the Euclidean information extracted from voxels can offer contextual cues representing interactions between nearby objects, while the geodesic information extracted from meshes can help separate objects that are spatially close but have disconnected surfaces. To incorporate such information from the two domains, we design an intra-domain attentive module for effective feature aggregation and an inter-domain attentive module for adaptive feature fusion. Experimental results validate the effectiveness of VMNet: specifically, on the challenging ScanNet dataset for large-scale segmentation of indoor scenes, it outperforms the state-of-the-art SparseConvNet and MinkowskiNet (74.6% vs 72.5% and 73.6% in mIoU) with a simpler network structure (17M vs 30M and 38M parameters).

Citation

If you find our work useful in your research, please consider citing:

@misc{hu2021vmnet,
      title={VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation}, 
      author={Zeyu Hu and Xuyang Bai and Jiaxiang Shang and Runze Zhang and Jiayu Dong and Xin Wang and Guangyuan Sun and Hongbo Fu and Chiew-Lan Tai},
      year={2021},
      eprint={2107.13824},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Installation

  • Our code is based on Pytorch. Please make sure CUDA and cuDNN are installed. One configuration has been tested:

    • Python 3.7
    • Pytorch 1.4.0
    • torchvision 0.5.0
    • CUDA 10.0
    • cudatoolkit 10.0.130
    • cuDNN 7.6.5
  • VMNet depends on the torch-geometric and torchsparse libraries. Please follow their installation instructions. One configuration has been tested, higher versions should work as well:

    • torch-geometric 1.6.3
    • torchsparse 1.1.0
  • We adapted VCGlib to generate pooling trace maps for vertex clustering and quadric error metrics.

    git clone https://github.com/cnr-isti-vclab/vcglib
    
    # QUADRIC ERROR METRICS
    cd vcglib/apps/tridecimator/
    qmake
    make
    
    # VERTEX CLUSTERING
    cd ../sample/trimesh_clustering
    qmake
    make
    

    Please add vcglib/apps/tridecimator and vcglib/apps/sample/trimesh_clustering to your environment path variable.

  • Other dependencies. One configuration has been tested:

    • open3d 0.9.0
    • plyfile 0.7.3
    • scikit-learn 0.24.0
    • scipy 1.6.0

Data Preparation

  • Please refer to https://github.com/ScanNet/ScanNet and https://github.com/niessner/Matterport to get access to the ScanNet and Matterport dataset. Our method relies on the .ply as well as the .labels.ply files. We take ScanNet dataset as example for the following instructions.

  • Create directories to store processed data.

    • 'path/to/processed_data/train/'
    • 'path/to/processed_data/val/'
    • 'path/to/processed_data/test/'
  • Prepare train data.

    python prepare_data.py --considered_rooms_path dataset/data_split/scannetv2_train.txt --in_path path/to/ScanNet/scans --out_path path/to/processed_data/train/
    
  • Prepare val data.

    python prepare_data.py --considered_rooms_path dataset/data_split/scannetv2_val.txt --in_path path/to/ScanNet/scans --out_path path/to/processed_data/val/
    
  • Prepare test data.

    python prepare_data.py --test_split --considered_rooms_path dataset/data_split/scannetv2_test.txt --in_path path/to/ScanNet/scans_test --out_path path/to/processed_data/test/
    

Train

  • On train/val/test setting.

    CUDA_VISIBLE_DEVICES=0 python run.py --train --exp_name name_you_want --data_path path/to/processed_data
    
  • On train+val/test setting (for ScanNet benchmark).

    CUDA_VISIBLE_DEVICES=0 python run.py --train_benchmark --exp_name name_you_want --data_path path/to/processed_data
    

Inference

  • Validation. Pretrained model (73.3% mIoU on ScanNet Val). Please download and put into directory check_points/val_split.

    CUDA_VISIBLE_DEVICES=0 python run.py --val --exp_name val_split --data_path path/to/processed_data
    
  • Test. Pretrained model (74.6% mIoU on ScanNet Test). Please download and put into directory check_points/test_split. TxT files for benchmark submission will be saved in directory test_results/.

    CUDA_VISIBLE_DEVICES=0 python run.py --test --exp_name test_split --data_path path/to/processed_data
    

Acknowledgements

Our code is built upon torch-geometric, torchsparse and dcm-net.

License

Our code is released under MIT License (see LICENSE file for details).

Owner
HU Zeyu
HU Zeyu
Based on Yolo's low-power, ultra-lightweight universal target detection algorithm, the parameter is only 250k, and the speed of the smart phone mobile terminal can reach ~300fps+

Based on Yolo's low-power, ultra-lightweight universal target detection algorithm, the parameter is only 250k, and the speed of the smart phone mobile terminal can reach ~300fps+

567 Dec 26, 2022
Dynamic Environments with Deformable Objects (DEDO)

DEDO - Dynamic Environments with Deformable Objects DEDO is a lightweight and customizable suite of environments with deformable objects. It is aimed

Rika 32 Dec 22, 2022
Predictive Modeling on Electronic Health Records(EHR) using Pytorch

Predictive Modeling on Electronic Health Records(EHR) using Pytorch Overview Although there are plenty of repos on vision and NLP models, there are ve

81 Jan 01, 2023
Repository of Vision Transformer with Deformable Attention

Vision Transformer with Deformable Attention This repository contains the code for the paper Vision Transformer with Deformable Attention [arXiv]. Int

410 Jan 03, 2023
This program presents convolutional kernel density estimation, a method used to detect intercritical epilpetic spikes (IEDs)

Description This program presents convolutional kernel density estimation, a method used to detect intercritical epilpetic spikes (IEDs) in [Gardy et

Ludovic Gardy 0 Feb 09, 2022
Benchmarks for semi-supervised domain generalization.

Semi-Supervised Domain Generalization This code is the official implementation of the following paper: Semi-Supervised Domain Generalization with Stoc

Kaiyang 49 Dec 10, 2022
Koç University deep learning framework.

Knet Knet (pronounced "kay-net") is the Koç University deep learning framework implemented in Julia by Deniz Yuret and collaborators. It supports GPU

1.4k Dec 31, 2022
A PyTorch toolkit for 2D Human Pose Estimation.

PyTorch-Pose PyTorch-Pose is a PyTorch implementation of the general pipeline for 2D single human pose estimation. The aim is to provide the interface

Wei Yang 1.1k Dec 30, 2022
Rax is a Learning-to-Rank library written in JAX

🦖 Rax: Composable Learning to Rank using JAX Rax is a Learning-to-Rank library written in JAX. Rax provides off-the-shelf implementations of ranking

Google 247 Dec 27, 2022
Bayesian optimization in PyTorch

BoTorch is a library for Bayesian Optimization built on PyTorch. BoTorch is currently in beta and under active development! Why BoTorch ? BoTorch Prov

2.5k Dec 31, 2022
AoT is a system for automatically generating off-target test harness by using build information.

AoT: Auto off-Target Automatically generating off-target test harness by using build information. Brought to you by the Mobile Security Team at Samsun

Samsung 10 Oct 19, 2022
my graduation project is about live human face augmentation by projection mapping by using CNN

Live-human-face-expression-augmentation-by-projection my graduation project is about live human face augmentation by projection mapping by using CNN o

1 Mar 08, 2022
A light and fast one class detection framework for edge devices. We provide face detector, head detector, pedestrian detector, vehicle detector......

A Light and Fast Face Detector for Edge Devices Big News: LFD, which is a big update of LFFD, now is released (2021.03.09). It is strongly recommended

YonghaoHe 1.3k Dec 25, 2022
The Multi-Mission Maximum Likelihood framework (3ML)

PyPi Conda The Multi-Mission Maximum Likelihood framework (3ML) A framework for multi-wavelength/multi-messenger analysis for astronomy/astrophysics.

The Multi-Mission Maximum Likelihood (3ML) 62 Dec 30, 2022
Deep Surface Reconstruction from Point Clouds with Visibility Information

Data, code and pretrained models for the paper Deep Surface Reconstruction from Point Clouds with Visibility Information.

Raphael Sulzer 23 Jan 04, 2023
Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech This repository is the official implementation of "Meta-TTS: Meta-Learning for Few

Sung-Feng Huang 128 Dec 25, 2022
Source code for the paper "Periodic Traveling Waves in an Integro-Difference Equation With Non-Monotonic Growth and Strong Allee Effect"

Source code for the paper "Periodic Traveling Waves in an Integro-Difference Equation With Non-Monotonic Growth and Strong Allee Effect" by Michael Ne

M Nestor 1 Apr 19, 2022
This repo contains the source code and a benchmark for predicting user's utilities with Machine Learning techniques for Computational Persuasion

Machine Learning for Argument-Based Computational Persuasion This repo contains the source code and a benchmark for predicting user's utilities with M

Ivan Donadello 4 Nov 07, 2022
Simple streamlit app to demonstrate HERE Tour Planning

Table of Contents About the Project Built With Getting Started Prerequisites Installation Usage Roadmap Contributing License Acknowledgements About Th

Amol 8 Sep 05, 2022
LegoDNN: a block-grained scaling tool for mobile vision systems

Table of contents 1 Introduction 1.1 Major features 1.2 Architecture 2 Code and Installation 2.1 Code 2.2 Installation 3 Repository of DNNs in vision

41 Dec 24, 2022