Rendering Point Clouds with Compute Shaders

Overview

Compute Shader Based Point Cloud Rendering

This repository contains the source code to our techreport:
Rendering Point Clouds with Compute Shaders and Vertex Order Optimization
Markus Schütz, Bernhard Kerbl, Michael Wimmer. (not peer-reviewed, currently in submission)

  • Compute shaders can render point clouds up to an order of magnitude faster than GL_POINTS.
  • With a combination of warp-wide deduplication and early-z, compute shaders able to render 796 million points (12.7GB) at stable 62 to 64 frames per second in various different viewpoints on an RTX 3090. This corresponds to a memory bandwidth utilization of about 802GB/s, or a throughput of about 50 billion points per second.
  • The vertex order also strongly affects the performance. Some locality of points that are consecutive in memory is beneficial, but excessive locality can result in drastic slowdowns if it leads to thousands of GPU threads attempting to update a single pixel. As such, neither Morton ordered nor shuffled buffers are optimal. However combining both by first sorting by Morton code, and then shuffling batches of 128 points but leaving points within a batch in order, results in an improved ordering that ensures high performance with our compute approaches, and it also increases the performance of GL_POINTS by up to 5 times.

About the Framework

This framework is written in C++ and JavaScript (using V8). Most of the rendering is done in JavaScript with bindings to OpenGL 4.5 functions. It is written with live-coding in mind, so many javascript files are immediately executed at runtime as soon as they are saved by any text editor. As such, code has to be written with repeated execution in mind.

Getting Started

  • Compile Skye.sln project with Visual Studio.
  • Open the workspace in vscode.
  • Open "load_pointcloud.js" (quick search files via ctrl + e).
    • Adapt the path to the correct location of the las file.
    • Adapt position and lookAt to a viewpoint that fits your point cloud.
    • Change window.x to something that fits your monitor setup, e.g., 0 if you've got a single monitor, or 2540 if you've got two monitors and your first one has a with of 2540 pixels.
  • Press "Ctrl + Shift + B" to start the app. You should be seing an empty green window. (window.x is not yet applied)
  • Once you save "load_pointcloud.js" via ctrl+s, it will be executed, the window will be repositioned, and the point cloud will be loaded.
  • You can change position and lookAt at runtime and apply them by simply saving load_pointcloud.js again. The pointcloud will not be loaded again - to do so, you'll need to restart first.

After loading the point cloud, you should be seeing something like the screenshot below. The framework includes an IMGUI window with frame times, and another window that lets you switch between various rendering methods. Best try with data sets with tens of millions or hundreds of millions of points!

sd

Code Sections

Code for the individual rendering methods is primarily found in the modules/compute_<methods> folders.

Method Location
atomicMin ./modules/compute
reduce ./modules/compute_ballot
early-z ./modules/compute_earlyDepth
reduce & early-z ./modules/compute_ballot_earlyDepth
dedup ./modules/compute_ballot_earlyDepth_dedup
HQS ./modules/compute_hqs
HQS1R ./modules/compute_hqs_1x64bit_fast
busy-loop ./modules/compute_guenther
just-set ./modules/compute_just_set

Results

Frame times when rendering 796 million points on an RTX 3090 in a close-up viewpoint. Times in milliseconds, lower is better. The compute methods reduce (with early-z) and dedup (with early-z) yield the best results with Morton order (<16.6ms, >60fps). The shuffled Morton order greatly improves performance of GL_POINTS and some compute methods, and it is usually either the fastest or within close margins of the fastest combinations of rendering method and ordering.

Not depicted is that the dedup method is the most stable approach that continuously maintains >60fps in all viewpoints, while the performance of the reduce method varies and may drop to 50fps in some viewpoints. As such, we would recomend to use dedup in conjunction with Morton order if the necessary compute features are available, and reduce (with early-z) for wider support.

Comments
  • Can provide some dataset to test the demo (like default Candi Banyunibo data set)?

    Can provide some dataset to test the demo (like default Candi Banyunibo data set)?

    Hi, just found this paper & project via graphics weekly news.. just compiled the demo, and seems uses that dataset by default (banyunibo_inside_morton).. anyway to obtain that dataset? if not can you provide some download link to some huge & equivalent data set used by the demo like retz,eclepens,etc.. are this datasets under non "open" licenses? just wanted to test performance on my Titan V compared to a 3090.. thanks..

    opened by oscarbg 11
  • invisible window

    invisible window

    If I compile in DEBUG (VS2019) it crashes in void V8Helper::setupGL in this line: setupV8GLExtBindings(tpl);

    If I compile in RELEASE it compiles and the console shows no error but the windows is invisible, I can only see the console (and if I click keys I see them in the log).

    I have a 1070

    opened by jagenjo 3
  • A question of render.cs

    A question of render.cs

    image in render.cs there's vec2 variable called imgPos, i don't know why it times 0.5 and plus 0.5, what does it mean? Thank you very much if you can answer my questions. : )

    opened by UMR19 2
  • Questions about point cloud display when zoom in

    Questions about point cloud display when zoom in

    Hi, Thanks for your excellent job, just i compiled the demo and modified the setting to load my point cloud, it loaded successfully and have a better performance. but when i roll the mouse wheel to zoom in to look at the detail, lots of point missed, but when i use potree to display my point cloud, it display perfect. I am a beginner in computer graphics,may be it's point size is too small? image In the potree image

    Thanks for you reply:)

    opened by UMR19 0
  • ssRG and ssBA not bound to resolve?

    ssRG and ssBA not bound to resolve?

    In https://github.com/m-schuetz/compute_rasterizer/blob/master/compute_hqs/render.js

    Both buffers are bound in the render attribute pass, but neither is explicitly bound in the resolve pass. Always assumed that bindBufferBase affects the currently bound shader, but apparently the binding caries over to the next shader? Just a reminder to look this up to clarify my understanding of how they work.

    gl.bindBufferBase(gl.SHADER_STORAGE_BUFFER, 3, ssRG);
    gl.bindBufferBase(gl.SHADER_STORAGE_BUFFER, 4, ssBA);
    
    opened by m-schuetz 0
Releases(build_laz_crf)
Owner
Markus Schütz
Markus Schütz
This is a simple backtesting framework to help you test your crypto currency trading. It includes a way to download and store historical crypto data and to execute a trading strategy.

You can use this simple crypto backtesting script to ensure your trading strategy is successful Minimal setup required and works well with static TP a

Andrei 154 Sep 12, 2022
git《Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser》(2021) GitHub: [fig5]

Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser Abstract The success of deep denoisers on real-world colo

Yue Cao 51 Nov 22, 2022
TSIT: A Simple and Versatile Framework for Image-to-Image Translation

TSIT: A Simple and Versatile Framework for Image-to-Image Translation This repository provides the official PyTorch implementation for the following p

Liming Jiang 255 Nov 23, 2022
RTSeg: Real-time Semantic Segmentation Comparative Study

Real-time Semantic Segmentation Comparative Study The repository contains the official TensorFlow code used in our papers: RTSEG: REAL-TIME SEMANTIC S

Mennatullah Siam 592 Nov 18, 2022
Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxiang Wang, Han Zhao, Bo Li.

Bridging Multi-Task Learning and Meta-Learning Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Trainin

AI Secure 57 Dec 15, 2022
Age and Gender prediction using Keras

cnn_age_gender Age and Gender prediction using Keras Dataset example : Description : UTKFace dataset is a large-scale face dataset with long age span

XN3UR0N 58 May 03, 2022
1st Place Solution to ECCV-TAO-2020: Detect and Represent Any Object for Tracking

Instead, two models for appearance modeling are included, together with the open-source BAGS model and the full set of code for inference. With this code, you can achieve around 79 Oct 08, 2022

Improving 3D Object Detection with Channel-wise Transformer

"Improving 3D Object Detection with Channel-wise Transformer" Thanks for the OpenPCDet, this implementation of the CT3D is mainly based on the pcdet v

Hualian Sheng 107 Dec 20, 2022
Voice Conversion by CycleGAN (语音克隆/语音转换):CycleGAN-VC3

CycleGAN-VC3-PyTorch 中文说明 | English This code is a PyTorch implementation for paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectr

Kun Ma 110 Dec 24, 2022
Implementation of Auto-Conditioned Recurrent Networks for Extended Complex Human Motion Synthesis

acLSTM_motion This folder contains an implementation of acRNN for the CMU motion database written in Pytorch. See the following links for more backgro

Yi_Zhou 61 Sep 07, 2022
Laplace Redux -- Effortless Bayesian Deep Learning

Laplace Redux - Effortless Bayesian Deep Learning This repository contains the code to run the experiments for the paper Laplace Redux - Effortless Ba

Runa Eschenhagen 28 Dec 07, 2022
NLP made easy

GluonNLP: Your Choice of Deep Learning for NLP GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that helps you l

Distributed (Deep) Machine Learning Community 2.5k Jan 04, 2023
SwinIR: Image Restoration Using Swin Transformer

SwinIR: Image Restoration Using Swin Transformer This repository is the official PyTorch implementation of SwinIR: Image Restoration Using Shifted Win

Jingyun Liang 2.4k Jan 05, 2023
Deep Learning Pipelines for Apache Spark

Deep Learning Pipelines for Apache Spark The repo only contains HorovodRunner code for local CI and API docs. To use HorovodRunner for distributed tra

Databricks 2k Jan 08, 2023
A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

DeepFilterNet A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering. libDF contains Rust code used for dat

Hendrik Schröter 292 Dec 25, 2022
Efficient 3D human pose estimation in video using 2D keypoint trajectories

3D human pose estimation in video with temporal convolutions and semi-supervised training This is the implementation of the approach described in the

Meta Research 3.1k Dec 29, 2022
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Implementation of the method described in the Speech Resynthesis from Di

Facebook Research 253 Jan 06, 2023
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

43 Nov 21, 2022
WRENCH: Weak supeRvision bENCHmark

🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

Jieyu Zhang 176 Dec 28, 2022
Repo for 2021 SDD assessment task 2, by Felix, Anna, and James.

SoftwareTask2 Repo for 2021 SDD assessment task 2, by Felix, Anna, and James. File/folder structure: helloworld.py - demonstrates various map backgrou

3 Dec 13, 2022