CoReNet

CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image. It produces coherent reconstructions, where all objects live in a single consistent 3D coordinate frame relative to the camera, and they do not intersect in 3D. You can find more information in the following paper: CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image.

This repository contains source code, dataset pointers, and instructions for reproducing the results in the paper. If you find our code, data, or the paper useful, please consider citing:

@InProceedings{popov20eccv,
  title="CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image",
  author="Popov, Stefan and Bauszat, Pablo and Ferrari, Vittorio", 
  booktitle="Computer Vision -- ECCV 2020",
  year="2020",
  doi="10.1007/978-3-030-58536-5_22"
}

Installation

The code in this repository has been verified to work on Ubuntu 18.04 with the following dependencies:

# General APT packages
sudo apt install \
  python3-pip python3-virtualenv python python3.8-dev g++-8 \
  ninja-build git libboost-container-dev unzip

# NVIDIA related packages
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /"
sudo apt install \
    nvidia-driver-455 nvidia-utils-455 `#driver, CUDA+GL libraries, utils` \
    cuda-runtime-10-1 cuda-toolkit-10-2 libcudnn7 `# Cuda and CUDNN`

To install CoReNet, clone the code from GitHub and create a Python virtual environment.

# Clone CoReNet
mkdir -p ~/prj/corenet
cd ~/prj/corenet
git clone https://github.com/google-research/corenet.git .

# Setup a python virtual environment
python3.8 -m virtualenv --python=/usr/bin/python3.8 venv_38
. venv_38/bin/activate
pip install -r requirements.txt

All instructions below assume that CoReNet lives in ~/prj/corenet, that this is the current working directory, and that the virtual environment is activated. You can also run CoReNet using the supplied Dockerfile: ~/prj/corenet/Dockerfile.

Datasets

The CoReNet paper introduced several datasets with synthetic scenes. To reproduce the experiments in the paper, you need to download them using:

cd ~/prj/corenet
mkdir -p ~/prj/corenet/data/raw
for n in single pairs triplets; do  
  for s in train val test; do
    wget "https://storage.googleapis.com/gresearch/corenet/${n}.${s}.tar" \
      -O "data/raw/${n}.${s}.tar" 
    tar -xvf "data/raw/${n}.${s}.tar" -C data/ 
  done 
done
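
The loop above fetches one archive per dataset and split. As a rough sketch, the same enumeration can be written in Python, for example to check which files to expect before starting the download. The function name archive_jobs is hypothetical and not part of CoReNet; the URL pattern and local paths are copied from the shell loop.

```python
# Sketch: enumerate the dataset archives fetched by the shell loop above.
# archive_jobs is an illustrative helper, not part of the CoReNet code base.
BASE_URL = "https://storage.googleapis.com/gresearch/corenet"
DATASETS = ("single", "pairs", "triplets")
SPLITS = ("train", "val", "test")


def archive_jobs(raw_dir="data/raw"):
    """Return (url, local_path) pairs in the same order as the shell loop."""
    return [
        (f"{BASE_URL}/{name}.{split}.tar", f"{raw_dir}/{name}.{split}.tar")
        for name in DATASETS
        for split in SPLITS
    ]


if __name__ == "__main__":
    for url, path in archive_jobs():
        print(f"{url} -> {path}")
```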

For each scene, these datasets provide the object placement, a good viewpoint, and two images rendered from it with varying degrees of realism. To obtain the actual object geometry, you need to download ShapeNetCore.v2.zip from ShapeNet's original site, unpack it, and convert the 3D meshes to CoReNet's binary format:

echo "Please download ShapeNetCore.v2.zip from ShapeNet's original site and "
echo "place it in ~/prj/corenet/data/raw/ before running the commands below"

cd ~/prj/corenet
unzip data/raw/ShapeNetCore.v2.zip -d data/raw/
PYTHONPATH=src python -m preprocess_shapenet \
  --shapenet_root=data/raw/ShapeNetCore.v2 \
  --output_root=data/shapenet_meshes

Models from the paper

To help reproduce the results from the CoReNet paper, we offer 5 pre-trained models from it (h5, h7, m7, m9, and y1; details below and in the paper). You can download and unpack these using:

cd ~/prj/corenet
wget https://storage.googleapis.com/gresearch/corenet/paper_tf_models.tgz \
  -O data/raw/paper_tf_models.tgz
tar xzvf data/raw/paper_tf_models.tgz -C data/

You can evaluate the downloaded models against their respective test sets using:

MODEL=h7  # Set to one of: h5, h7, m7, m9, y1

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 \
tf_model_eval --config_path=configs/paper_tf_models/${MODEL}.json5

To run on multiple GPUs in parallel, set --nproc_per_node to the desired number of GPUs. You can use CUDA_VISIBLE_DEVICES to control exactly which GPUs are used. CUDA_HOME, PATH, and FILL_VOXELS_CUDA_FLAGS control the just-in-time compiler for the voxelization operation.

Upon completion, quantitative results will be stored in ~/prj/corenet/output/paper_tf_models/${MODEL}/voxel_metrics.csv. Qualitative results will be available in ~/prj/corenet/output/paper_tf_models/${MODEL}/ in the form of PNG files.
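
The column layout of voxel_metrics.csv is not described here, so the following is only a minimal sketch for inspecting the file after an evaluation run. It assumes the file starts with a header row; the helper names metrics_path and parse_metrics are hypothetical and not part of CoReNet.

```python
import csv
from pathlib import Path


def metrics_path(model, output_root="output/paper_tf_models"):
    """Location of the quantitative results for one model, relative to ~/prj/corenet."""
    return Path(output_root) / model / "voxel_metrics.csv"


def parse_metrics(lines):
    """Parse CSV lines into a list of dicts. Assumes the first line is a header row."""
    return list(csv.DictReader(lines))


if __name__ == "__main__":
    path = metrics_path("h7")
    if path.exists():
        with path.open(newline="") as f:
            for row in parse_metrics(f):
                print(row)
    else:
        print(f"{path} not found; run the evaluation first")
```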

This table summarizes the model attributes and their performance. More details can be found in the paper.

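| model | dataset | realism | native resolution | mean IoU |
|-------|----------|---------|-------------------|----------|
| h5    | single   | low     | 128 x 128 x 128   | 57.9%    |
| h7    | single   | high    | 128 x 128 x 128   | 59.1%    |
| y1    | single   | low     | 32 x 32 x 32      | 53.3%    |
| m7    | pairs    | high    | 128 x 128 x 128   | 43.1%    |
| m9    | triplets | high    | 128 x 128 x 128   | 43.9%    |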

Note that all models are evaluated on a grid resolution of 128 x 128 x 128, independent of their native resolution (see section 3.5 in the paper). The performance computed with this code matches the one reported in the paper for h5, h7, m7, and m9. For y1, the performance here is slightly higher (+0.2% IoU), as we no longer have the exact checkpoint used in the paper.
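
To illustrate what evaluating on a common grid means, here is a small sketch of intersection-over-union between two occupancy grids, where a low-resolution prediction is first brought to the evaluation resolution. It is not the evaluation code of this repository: nearest-neighbour upsampling is an assumption made for illustration (section 3.5 of the paper describes the actual procedure), grids are represented as sets of occupied voxel coordinates, and the function names are hypothetical. The table reports a mean IoU; the sketch covers only a single pair of grids.

```python
def upsample(voxels, factor):
    """Nearest-neighbour upsampling: each occupied cell becomes a factor^3 block."""
    return {
        (x * factor + dx, y * factor + dy, z * factor + dz)
        for (x, y, z) in voxels
        for dx in range(factor)
        for dy in range(factor)
        for dz in range(factor)
    }


def iou(pred, truth):
    """Intersection over union of two sets of occupied voxel coordinates."""
    union = pred | truth
    if not union:
        return 1.0  # convention chosen here: two empty grids agree perfectly
    return len(pred & truth) / len(union)


if __name__ == "__main__":
    # One occupied cell on a coarse grid, upsampled by 2 to a 2 x 2 x 2 block.
    pred = upsample({(0, 0, 0)}, factor=2)
    # Ground truth on the fine grid occupies only the bottom layer of that block.
    truth = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)}
    print(iou(pred, truth))  # 4 shared cells out of 8 in the union -> 0.5
```

Under this assumption, a 32 x 32 x 32 prediction such as that of y1 would be upsampled with factor 4 before being compared against a 128 x 128 x 128 ground-truth grid.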

You can also run these models on individual images interactively, using the corenet_demo.ipynb notebook. For this, you also need to pip install jupyter-notebook in your virtual environment.

Training and evaluating a new model

We offer PyTorch code for training and evaluating models. To train a model, you need to (once) import the starting ResNet50 checkpoint:

cd ~/prj/corenet
PYTHONPATH=src python -m import_resnet50_checkpoint

Then run:

MODEL=h7  # Set to one of: h5, h7, m7, m9 

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 \
train --config_path=configs/models/${MODEL}.json5

Again, use --nproc_per_node and CUDA_VISIBLE_DEVICES to control parallel execution on multiple GPUs. CUDA_HOME, PATH, and FILL_VOXELS_CUDA_FLAGS control just-in-time compilation.
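
If you launch runs from a script, the relation between the GPU selection and --nproc_per_node can be kept consistent by deriving both from one list. The following is a sketch only: build_launch is a hypothetical helper, not part of CoReNet, and it merely assembles the environment variables and arguments shown in the commands above.

```python
import os


def build_launch(module, config_path, gpu_ids, cuda_home="/usr/local/cuda-10.2"):
    """Return (env, argv) for a dist_launch run on the given GPU ids."""
    env = {
        "OMP_NUM_THREADS": "2",
        "CUDA_HOME": cuda_home,
        "PYTHONPATH": "src",
        "TF_CPP_MIN_LOG_LEVEL": "1",
        "FILL_VOXELS_CUDA_FLAGS": "-ccbin=/usr/bin/gcc-8",
        # Restrict the run to exactly these GPUs ...
        "CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids),
        "PATH": os.environ.get("PATH", "") + os.pathsep + cuda_home + "/bin",
    }
    argv = [
        "python", "-m", "dist_launch",
        # ... and start one process per selected GPU.
        f"--nproc_per_node={len(gpu_ids)}",
        module,
        f"--config_path={config_path}",
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_launch("train", "configs/models/h7.json5", gpu_ids=[0, 1])
    print(env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

The returned values could then be passed to subprocess.run(argv, env={**os.environ, **env}) from ~/prj/corenet with the virtual environment activated.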

You can also evaluate individual checkpoints, for example:

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 eval \
  --cpt_path=output/models/h7/cpt/persistent/state_000000000.cpt \
  --output_path=output/eval_cpt_example \
  --eval_names_regex="short.*" \
  -jq '(.. | .config? | select(.num_qualitative_results != null) | .num_qualitative_results) |= 4'

The -jq option limits the number of qualitative results to 4 (see also the Further details section).
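
For readers less familiar with jq, the effect of that query can be approximated in plain Python: walk the whole configuration tree and, wherever an object has a config entry with a non-null num_qualitative_results, overwrite that value. This is an illustrative sketch, not code from the repository; the function name and the example configuration are hypothetical.

```python
def set_num_qualitative_results(node, value):
    """Recursively mimic the jq update on parsed JSON (dicts and lists), in place."""
    if isinstance(node, dict):
        cfg = node.get("config")
        # Mirrors: .config? | select(.num_qualitative_results != null)
        if isinstance(cfg, dict) and cfg.get("num_qualitative_results") is not None:
            cfg["num_qualitative_results"] = value
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return
    # Mirrors the recursive descent operator "..".
    for child in children:
        set_num_qualitative_results(child, value)


if __name__ == "__main__":
    # Hypothetical configuration layout, for illustration only.
    config = {
        "eval": [
            {"name": "short_a", "config": {"num_qualitative_results": 40}},
            {"name": "short_b", "config": {"num_qualitative_results": None}},
        ],
    }
    set_num_qualitative_results(config, 4)
    print(config)
```

Entries whose value is null (None in Python) are left untouched, matching the select(...) filter in the query.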

We currently offer checkpoints trained with this code for models h5, h7, m7, and m9, as a .tgz archive. These checkpoints achieve slightly better performance than reported in the paper (see the table below). This is likely due to a different distributed training strategy (synchronous here vs. asynchronous in the paper) and a different ML framework (PyTorch here vs. TensorFlow in the paper).

| model    | h5    | h7    | m7    | m9    |
|----------|-------|-------|-------|-------|
| mean IoU | 60.2% | 61.6% | 45.0% | 46.9% |
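
To make "slightly better" concrete, the differences between these numbers and the mean IoU values listed for the paper models in the earlier table can be computed as follows. The values are copied from the two tables; the variable and function names are illustrative.

```python
# Mean IoU in percent, copied from the two tables in this document.
PAPER_IOU = {"h5": 57.9, "h7": 59.1, "m7": 43.1, "m9": 43.9}
PYTORCH_IOU = {"h5": 60.2, "h7": 61.6, "m7": 45.0, "m9": 46.9}


def iou_gains():
    """Gain of the PyTorch checkpoints over the paper models, in percentage points."""
    return {m: round(PYTORCH_IOU[m] - PAPER_IOU[m], 1) for m in PAPER_IOU}


if __name__ == "__main__":
    for model, gain in iou_gains().items():
        print(f"{model}: +{gain} points")
```

The gains range from 1.9 points (m7) to 3.0 points (m9).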

Further details

Configuration files

The evaluation and training scripts are configured using JSON5 files that map to the TfModelEvalPipeline and TrainPipeline dataclasses in src/corenet/configuration.py. You can find descriptions of the different configuration options in the code comments, starting from these two classes.

You can also modify the configuration on the fly through jq queries, as well as through defines that change entries in the string_templates section. For example, the following options change the number of workers and the prefetch factor of the data loaders, as well as the locations of the data and output directories:

... \
-jq "'(.. | .data_loader? | select(. != null) | .num_data_workers) |= 12'" \
    "'(.. | .data_loader? | select(. != null) | .prefetch_factor) |= 4'" \
-D 'data_dir=gs://some_gcs_bucket/data' \
   'output_dir=gs://some_gcs_bucket/output/models'
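
Conceptually, each -D define is a key=value pair that replaces one entry of the string_templates section. The sketch below shows only that override step on a parsed configuration; how CoReNet later expands the templates inside the rest of the configuration is not covered, and the function name and the example configuration are hypothetical.

```python
def apply_defines(config, defines):
    """Return a copy of config whose string_templates are overridden by key=value defines."""
    templates = dict(config.get("string_templates", {}))
    for define in defines:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = define.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {define!r}")
        templates[key] = value
    return {**config, "string_templates": templates}


if __name__ == "__main__":
    # Hypothetical starting configuration, for illustration only.
    config = {"string_templates": {"data_dir": "data", "output_dir": "output/models"}}
    updated = apply_defines(config, ["data_dir=gs://some_gcs_bucket/data"])
    print(updated["string_templates"])
```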

Dataset statistics

The table below summarizes the number of scenes in each dataset:

| split | single | pairs  | triplets |
|-------|--------|--------|----------|
| train | 883084 | 319981 | 80000    |
| val   | 127286 | 45600  | 11400    |
| test  | 246498 | 91194  | 22798    |
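
As a quick sanity check after downloading, the per-dataset totals and the share of each split can be derived from these counts. The numbers are copied from the table; the names are illustrative.

```python
# Number of scenes per dataset and split, copied from the table above.
SCENES = {
    "single": {"train": 883084, "val": 127286, "test": 246498},
    "pairs": {"train": 319981, "val": 45600, "test": 91194},
    "triplets": {"train": 80000, "val": 11400, "test": 22798},
}


def totals():
    """Total number of scenes in each dataset, summed over all splits."""
    return {name: sum(splits.values()) for name, splits in SCENES.items()}


def split_fraction(name, split):
    """Fraction of a dataset's scenes that belong to the given split."""
    return SCENES[name][split] / sum(SCENES[name].values())


if __name__ == "__main__":
    for name, total in totals().items():
        print(f"{name}: {total} scenes, {split_fraction(name, 'train'):.1%} in train")
```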

Licenses

The code and the checkpoints are released under the Apache 2.0 License. The datasets, the documentation, and the configuration files are licensed under the Creative Commons Attribution 4.0 International License.

Owner
Google Research