Autonomous Perception: 3D Object Detection with Complex-YOLO

Overview

Autonomous Perception: 3D Object Detection with Complex-YOLO

Gif of 50 frames of darknet

LiDAR object detection with Complex-YOLO takes four steps:

  1. Computing LiDAR point-clouds from range images.
  2. Transforming the point-cloud to a Bird's Eye View using the Point Cloud Library (PCL).
  3. Using both Complex-YOLO Darknet and Resnet to predict 3D dectections on transformed LiDAR images.
  4. Evaluating the detections based Precision and Recall.

Complex-Yolo Pipeline

Complex-Yolo is both highly accurate and highly performant in production:

Complex-Yolo Performance

Computing LiDAR Point-Clouds from Waymo Range Images

Waymo uses multiple sensors including LiDAR, cameras, radar for autonomous perception. Even microphones are used to help detect ambulance and police sirens.

Visualizing LiDAR Range and Intensity Channels

LiDAR visualization 1

Roof-mounted "Top" LiDAR rotates 360 degrees with a vertical field of vision or ~20 degrees (-17.6 degrees to +2.4 degrees) with a 75m limit in the dataset.

LiDAR data is stored as a range image in the Waymo Open Dataset. Using OpenCV and NumPy, we filtered the "range" and "intensity" channels from the image, and converted the float data to 8-bit unsigned integers. Below is a visualization of two video frames, where the top half is the range channel, and the bottom half is the intensity for each visualization:

LiDAR visualization 2

Visualizing th LiDAR Point-cloud

There are 64 LEDs in Waymo's top LiDAR sensor. Limitations of 360 LiDAR include the space between beams (aka resolution) widening with distance from the origin. Also the car chasis will create blind spots, creating the need for Perimeter LiDAR sensors to be inlcuded on the sides of the vehicles.

We leveraged the Open3D library to make a 3D interactive visualization of the LiDAR point-cloud. Commonly visible features are windshields, tires, and mirros within 40m. Beyond 40m, cars are like slightly rounded rectangles where you might be able to make ou the windshield. Further away vehicles and extremely close vehicles typically have lower resolution, as well as vehicles obstructing the detection of other vehicles.

10 Vehicles Showing Different Types of LiDAR Interaction:

  1. Truck with trailer - most of truck is high resolution visible, but part of the trailer is in the 360 LiDAR's blind-spot.
  2. Car partial in blind spot, back-half isn't picked up well. This car blocks the larges area behind it from being detected by the LiDAR.
  3. Car shape is higly visible, where you can even see the side-mirrors and the LiDAR passing through the windshield.
  4. Car driving in other lane. You can see the resolution of the car being lower because the further away the 64 LEDs project the lasers, the futher apart the points of the cloud will be. It is also obstructed from some lasers by Car 2.
  5. This parked is unobstructed, but far enough away where it's difficult to make our the mirrors or the tires.
  6. Comparing this car to Car 3, you can see where most of the definition is either there or slightly worse, because it is further way.
  7. Car 7 is both far away and obstructed, so you can barely tell it's a car. It's basically a box with probably a windshield.
  8. Car 8 is similar to Car 6 on the right side, but obstructed by Car 6 on the left side.
  9. Car 9 is at the limit of the LiDAR's dataset's perception. It's hard to tell it's a car.
  10. Car 10 is at the limit of the LiDAR's perception, and is also obstructed by car 8.

Transforming the point-cloud to a Bird's Eye View using the Point Cloud Library

Convert sensor coordinates to Bird's-Eye View map coordinates

The birds-eye view (BEV) of a LiDAR point-cloud is based on the transformation of the x and y coordinates of the points.

BEV map properties:

  • Height:

    H_{i,j} = max(P_{i,j} \cdot [0,0,1]T)

  • Intensity:

    I_{i,j} = max(I(P_{i,j}))

  • Density:

    D_{i,j} = min(1.0,\ \frac{log(N+1)}{64})

P_{i,j} is the set of points that falls into each cell, with i,j as the respective cell coordinates. N_{i,j} refers to the number of points in a cell.

Compute intensity layer of the BEV map

We created a BEV map of the "intensity" channel from the point-cloud data. We identified the top-most (max height) point with the same (x,y)-coordinates from the point-cloud, and assign the intensity value to the corresponding BEV map point. The data was normalized and outliers were removed until the features of interest were clearly visible.

Compute height layer of the BEV map

This is a visualization of the "height" channel BEV map. We sorted and pruned point-cloud data, normalizing the height in each BEV map pixel by the difference between max. and min.

Model-based Object Detection in BEV Image

We used YOLO3 and Resnet deep-learning models to doe 3D Object Detection. Complex-YOLO: Real-time 3D Object Detection on Point Clouds and Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds.

Extract 3D bounding boxes from model response

The models take a three-channel BEV map as an input, and predict the class about coordinates of objects (vehicles). We then transformed these BEV coordinates back to the vehicle coordinate-space to draw the bounding boxes in both images.

Transforming back to vehicle space

Below is a gif the of detections in action: Results from 50 frames of resnet detection

Performance Evaluation for Object Detection

Compute intersection-over-union between labels and detections

Based on the labels within the Waymo Open Dataset, your task is to compute the geometrical overlap between the bounding boxes of labels and detected objects and determine the percentage of this overlap in relation to the area of the bounding boxes. A default method in the literature to arrive at this value is called intersection over union, which is what you will need to implement in this task.

After detections are made, we need a set of metrics to measure our progress. Common classification metrics for object detection include:

TP, FN, FP

  • TP: True Positive - Predicts vehicle or other object is there correctly
  • TN: True Negative - Correctly predicts vehicle or object is not present
  • FP: False Positive - Dectects object class incorrectly
  • FN: False Negative - Didn't detect object class when there should be a dectection

One popular method of making these determinations is measuring the geometric overlap of bounding boxes vs the total area two predicted bounding boxes take up in an image, or th Intersecion over Union (IoU).

IoU formula

IoU for Complex-Yolo

Classification Metrics Based on Precision and Recall

After all the LiDAR and Camera data has been transformed, and the detections have been predicted, we calculate the following metrics for the bounding box predictions:

Formulas

  • Precision:

    \frac{TP}{TP + FP}

  • Recall:

    \frac{TP}{TP + FN}

  • Accuracy:

    \frac{TP + TN}{TP + TN + FP + FN}

  • Mean Average Precision:

    \frac{1}{n} \sum_{Recall_{i}}Precision(Recall_{i})

Precision and Recall Results Visualizations

Results from 50 frames: Results from 50 frames

Precision: .954 Recall: .921

Complex Yolo Paper

Owner
Thomas Dunlap
Machine Learning Engineer and Data Scientist with a focus on deep learning, computer vision, and robotics.
Thomas Dunlap
DeepHyper: Scalable Asynchronous Neural Architecture and Hyperparameter Search for Deep Neural Networks

What is DeepHyper? DeepHyper is a software package that uses learning, optimization, and parallel computing to automate the design and development of

DeepHyper Team 214 Jan 08, 2023
Efficient 3D Backbone Network for Temporal Modeling

VoV3D is an efficient and effective 3D backbone network for temporal modeling implemented on top of PySlowFast. Diverse Temporal Aggregation and

102 Dec 06, 2022
Code for: https://berkeleyautomation.github.io/bags/

DeformableRavens Code for the paper Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks. Here is the

Daniel Seita 121 Dec 30, 2022
Uni-Fold: Training your own deep protein-folding models.

Uni-Fold: Training your own deep protein-folding models. This package provides and implementation of a trainable, Transformer-based deep protein foldi

DeepModeling 88 Jan 03, 2023
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation

AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation A pytorch-version implementation codes of paper:

11 Dec 13, 2022
SIEM Logstash parsing for more than hundred technologies

LogIndexer Pipeline Logstash Parsing Configurations for Elastisearch SIEM and OpenDistro for Elasticsearch SIEM Why this project exists The overhead o

146 Dec 29, 2022
MQBench Quantization Aware Training with PyTorch

MQBench Quantization Aware Training with PyTorch I am using MQBench(Model Quantization Benchmark)(http://mqbench.tech/) to quantize the model for depl

Ling Zhang 29 Nov 18, 2022
PyTorch-Geometric Implementation of MarkovGNN: Graph Neural Networks on Markov Diffusion

MarkovGNN This is the official PyTorch-Geometric implementation of MarkovGNN paper under the title "MarkovGNN: Graph Neural Networks on Markov Diffusi

HipGraph: High-Performance Graph Analytics and Learning 6 Sep 23, 2022
Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators

Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators. It's also a suite of learning algorithms to train agents to operate in these enviro

Google 1.5k Jan 02, 2023
[ICCV 2021] Official Tensorflow Implementation for "Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions"

KPAC: Kernel-Sharing Parallel Atrous Convolutional block This repository contains the official Tensorflow implementation of the following paper: Singl

Hyeongseok Son 50 Dec 29, 2022
In this project we predict the forest cover type using the cartographic variables in the training/test datasets.

Kaggle Competition: Forest Cover Type Prediction In this project we predict the forest cover type (the predominant kind of tree cover) using the carto

Marianne Joy Leano 1 Mar 15, 2022
[ICCV 2021 Oral] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers Created by Xumin Yu*, Yongming Rao*, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou

Xumin Yu 317 Dec 26, 2022
Boosting Adversarial Attacks with Enhanced Momentum (BMVC 2021)

EMI-FGSM This repository contains code to reproduce results from the paper: Boosting Adversarial Attacks with Enhanced Momentum (BMVC 2021) Xiaosen Wa

John Hopcroft Lab at HUST 10 Sep 26, 2022
Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data.

Deep Learning Dataset Maker Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data. How to use Down

deepbands 25 Dec 15, 2022
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding Official Pytorch implementation of Negative Sample Matter

Multimedia Computing Group, Nanjing University 69 Dec 26, 2022
Optimized primitives for collective multi-GPU communication

NCCL Optimized primitives for inter-GPU communication. Introduction NCCL (pronounced "Nickel") is a stand-alone library of standard communication rout

NVIDIA Corporation 2k Jan 09, 2023
[ICCV 2021] Learning A Single Network for Scale-Arbitrary Super-Resolution

ArbSR Pytorch implementation of "Learning A Single Network for Scale-Arbitrary Super-Resolution", ICCV 2021 [Project] [arXiv] Highlights A plug-in mod

Longguang Wang 229 Dec 30, 2022
The official implementation of the CVPR2021 paper: Decoupled Dynamic Filter Networks

Decoupled Dynamic Filter Networks This repo is the official implementation of CVPR2021 paper: "Decoupled Dynamic Filter Networks". Introduction DDF is

F.S.Fire 180 Dec 30, 2022
Convolutional Neural Network for Text Classification in Tensorflow

This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post. It is slightly simplified implementation of Kim's Convo

Denny Britz 5.5k Jan 02, 2023
JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation

JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation This the repository for this paper. Find extensions of this w

Zhuoyuan Mao 14 Oct 26, 2022