Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

Overview

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper]

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

Cheng Zhang, Zhaopeng Cui, Cai Chen, Shuaicheng Liu, Bing Zeng, Hujun Bao, Yinda Zhang

teaser pipeline

Introduction

This repo contains data generation, data preprocessing, training, testing, evaluation, visualization code of our ICCV 2021 paper.

Install

Install necessary tools and create conda environment (needs to install anaconda if not available):

sudo apt install xvfb ninja-build freeglut3-dev libglew-dev meshlab
conda env create -f environment.yaml
conda activate Pano3D
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
python project.py build
  • When running python project.py build, the script will run external/build_gaps.sh which requires password for sudo privilege for apt-get install. Please make sure you are running with a user with sudo privilege. If not, please reach your administrator for installation of these libraries and comment out the corresponding lines then run python project.py build.
  • If you encounter /usr/bin/ld: cannot find -lGL problem when building GAPS, please follow this issue.

Since the dataloader loads large number of variables, before training, please follow this to raise the open file descriptor limits of your system. For example, to permanently change the setting, edit /etc/security/limits.conf with a text editor and add the following lines:

*         hard    nofile      500000
*         soft    nofile      500000
root      hard    nofile      500000
root      soft    nofile      500000

Demo

Download the pretrained checkpoints of detector, layout estimation network, and other modules. Then unzip the folder out into the root directory of current project. Since the given checkpoints are trained with current version of our code, which is a refactored version, the results are slightly better than those reported in our paper.

Please run the following command to predict on the given example in demo/input with our full model:

CUDA_VISIBLE_DEVICES=0 WANDB_MODE=dryrun python main.py configs/pano3d_igibson.yaml --model.scene_gcn.relation_adjust True --mode test

Or run without relation optimization:

CUDA_VISIBLE_DEVICES=0 WANDB_MODE=dryrun python main.py configs/pano3d_igibson.yaml --mode test

The results will be saved to out/pano3d/<demo_id>. If nothing goes wrong, you should get the following results:

rgb.png visual.png
det3d.jpg render.png

Data preparation

Our data is rendered with iGibson. Here, we follow their Installation guide to download iGibson dataset, then render and preprocess the data with our code.

  1. Download iGibson dataset with:

    python -m gibson2.utils.assets_utils --download_ig_dataset
  2. Render panorama with:

    python -m utils.render_igibson_scenes --renders 10 --random_yaw --random_obj --horizon_lo --world_lo

    The rendered dataset should be in data/igibson/.

  3. Make models watertight and render/crop single object image:

    python -m utils.preprocess_igibson_obj --skip_mgn

    The processed results should be in data/igibson_obj/.

  4. (Optional) Before proceeding to the training steps, you could visualize dataset ground-truth of data/igibson/ with:

    python -m utils.visualize_igibson

    Results ('visual.png' and 'render.png') should be saved to folder of each camera like data/igibson/Pomaria_0_int/00007.

Training and Testing

Preparation

  1. We use the pretrained weights of Implicit3DUnderstanding for fine-tuning Bdb3d Estimation Network (BEN) and LIEN+LDIF. Please download the pretrained checkpoint and unzip it into out/total3d/20110611514267/.

  2. We use wandb for logging and visualizing experiments. You can follow their quickstart guide to sign up for a free account and login on your machine with wandb login. The training and testing results will be uploaded to your project "deeppanocontext".

  3. Hint: The <XXX_id> in the commands bellow needs to be replaced with the XXX_id trained in the previous steps.

  4. Hint: In the steps bellow, when training or testing with main.py, you can override yaml configurations with command line parameter:

    CUDA_VISIBLE_DEVICES=0 python main.py configs/layout_estimation_igibson.yaml --train.epochs 100

    This might be helpful when debugging or tuning hyper-parameters.

First Stage

2D Detector

  1. Train 2D detector (Mask RCNN) with:

    CUDA_VISIBLE_DEVICES=0 python train_detector.py

    The trained weights will be saved to out/detector/detector_mask_rcnn

  2. (Optional) When training 2D detector, you could visualize the training process with:

    tensorboard --logdir out/detector/detector_mask_rcnn --bind_all --port 6006
  3. (Optional) Evaluate with:

    CUDA_VISIBLE_DEVICES=0 python test_detector.py

    The results will be saved to out/detector/detector_mask_rcnn/evaluation_{train/test}. Alternatively, you can visualize the prediction results on test set with:

     CUDA_VISIBLE_DEVICES=0 python test_detector.py --visualize --split test

    The visualization will be saved to the folder where the model weights file is.

  4. (Optional) Visualize BFoV detection results:

    CUDA_VISIBLE_DEVICES=0 python main.py configs/detector_2d_igibson.yaml --mode qtest --log.vis_step 1

    The visualization will be saved to out/detector/<detector_test_id>

Layout Estimation

Train layout estimation network (HorizonNet) with:

CUDA_VISIBLE_DEVICES=0 python main.py configs/layout_estimation_igibson.yaml

The checkpoint and visualization results will be saved to out/layout_estimation/<layout_estimation_id>/model_best.pth

Save First Stage Outputs

  1. Save predictions of 2D detector and LEN as dateset for stage 2 training:

    CUDA_VISIBLE_DEVICES=0 WANDB_MODE=dryrun python main.py configs/first_stage_igibson.yaml --mode qtest --weight out/layout_estimation/<layout_estimation_id>/model_best.pth

    The first stage outputs should be saved to data/igibson_stage1

  2. (Optional) Visualize stage 1 dataset with:

    python -m utils.visualize_igibson --dataset data/igibson_stage1 --skip_render

Second Stage

Object Reconstruction

Train object reconstruction network (LIEN+LDIF) with:

CUDA_VISIBLE_DEVICES=0 python main.py configs/ldif_igibson.yaml

The checkpoint and visualization results will be saved to out/ldif/<ldif_id>.

Bdb3D Estimation

Train bdb3d estimation network (BEN) with:

CUDA_VISIBLE_DEVICES=0 python main.py configs/bdb3d_estimation_igibson.yaml

The checkpoint and visualization results will be saved to out/bdb3d_estimation/<bdb3d_estimation_id>.

Relation SGCN

  1. Train Relation SGCN without relation branch:

    CUDA_VISIBLE_DEVICES=0 python main.py configs/relation_scene_gcn_igibson.yaml --model.scene_gcn.output_relation False --model.scene_gcn.loss BaseLoss --weight out/bdb3d_estimation/<bdb3d_estimation_id>/model_best.pth out/ldif/<ldif_id>/model_best.pth

    The checkpoint and visualization results will be saved to out/relation_scene_gcn/<relation_sgcn_wo_rel_id>.

  2. Train Relation SGCN with relation branch:

    CUDA_VISIBLE_DEVICES=0 python main.py configs/relation_scene_gcn_igibson.yaml --weight out/relation_scene_gcn/<relation_sgcn_wo_rel_id>/model_best.pth --train.epochs 20 

    The checkpoint and visualization results will be saved to out/relation_scene_gcn/<relation_sgcn_id>.

  3. Fine-tune Relation SGCN end-to-end with relation optimization:

    CUDA_VISIBLE_DEVICES=0 python main.py configs/relation_scene_gcn_igibson.yaml --weight out/relation_scene_gcn/<relation_sgcn_id>/model_best.pth --model.scene_gcn.relation_adjust True --train.batch_size 1 --val.batch_size 1 --device.num_workers 2 --train.freeze shape_encoder shape_decoder --model.scene_gcn.loss_weights.bdb3d_proj 1.0 --model.scene_gcn.optimize_steps 20 --train.epochs 10

    The checkpoint and visualization results will be saved to out/relation_scene_gcn/<relation_sgcn_ro_id>.

Test Full Model

Run:

CUDA_VISIBLE_DEVICES=0 python main.py configs/relation_scene_gcn_igibson.yaml --weight out/relation_scene_gcn/<relation_sgcn_ro_id>/model_best.pth --log.path out/relation_scene_gcn --resume False --finetune True --model.scene_gcn.relation_adjust True --mode qtest --model.scene_gcn.optimize_steps 100

The visualization results will be saved to out/relation_scene_gcn/<relation_sgcn_ro_test_id>.

Citation

If you find our work and code helpful, please consider cite:

@misc{zhang2021deeppanocontext,
      title={DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization}, 
      author={Cheng Zhang and Zhaopeng Cui and Cai Chen and Shuaicheng Liu and Bing Zeng and Hujun Bao and Yinda Zhang},
      year={2021},
      eprint={2108.10743},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@InProceedings{Zhang_2021_CVPR,
    author    = {Zhang, Cheng and Cui, Zhaopeng and Zhang, Yinda and Zeng, Bing and Pollefeys, Marc and Liu, Shuaicheng},
    title     = {Holistic 3D Scene Understanding From a Single Image With Implicit Representation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {8833-8842}
}

We thank the following great works:

  • Total3DUnderstanding for their well-structured code. We construct our network based on their well-structured code.
  • Coop for their dataset. We used their processed dataset with 2D detector prediction.
  • LDIF for their novel representation method. We ported their LDIF decoder from Tensorflow to PyTorch.
  • Graph R-CNN for their scene graph design. We adopted their GCN implemention to construct our SGCN.
  • Occupancy Networks for their modified version of mesh-fusion pipeline.

If you find them helpful, please cite:

@InProceedings{Nie_2020_CVPR,
author = {Nie, Yinyu and Han, Xiaoguang and Guo, Shihui and Zheng, Yujian and Chang, Jian and Zhang, Jian Jun},
title = {Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
@inproceedings{huang2018cooperative,
  title={Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation},
  author={Huang, Siyuan and Qi, Siyuan and Xiao, Yinxue and Zhu, Yixin and Wu, Ying Nian and Zhu, Song-Chun},
  booktitle={Advances in Neural Information Processing Systems},
  pages={206--217},
  year={2018}
}	
@inproceedings{genova2020local,
    title={Local Deep Implicit Functions for 3D Shape},
    author={Genova, Kyle and Cole, Forrester and Sud, Avneesh and Sarna, Aaron and Funkhouser, Thomas},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={4857--4866},
    year={2020}
}
@inproceedings{yang2018graph,
    title={Graph r-cnn for scene graph generation},
    author={Yang, Jianwei and Lu, Jiasen and Lee, Stefan and Batra, Dhruv and Parikh, Devi},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    pages={670--685},
    year={2018}
}
@inproceedings{mescheder2019occupancy,
  title={Occupancy networks: Learning 3d reconstruction in function space},
  author={Mescheder, Lars and Oechsle, Michael and Niemeyer, Michael and Nowozin, Sebastian and Geiger, Andreas},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4460--4470},
  year={2019}
}
Owner
Cheng Zhang
Cheng Zhang of UESTC 电子科技大学 通信学院 章程
Cheng Zhang
Godot RL Agents is a fully Open Source packages that allows video game creators

Godot RL Agents The Godot RL Agents is a fully Open Source packages that allows video game creators, AI researchers and hobbiest the opportunity to le

Edward Beeching 326 Dec 30, 2022
Main Results on ImageNet with Pretrained Models

This repository contains Pytorch evaluation code, training code and pretrained models for the following projects: SPACH (A Battle of Network Structure

Microsoft 151 Dec 14, 2022
python 93% acc. CNN Dogs Vs Cats ( Pytorch )

English | 简体中文(测试中...敬请期待) Cnn-Classification-Dog-Vs-Cat 猫狗辨别 (pytorch版本) CNN Resnet18 的猫狗分类器,基于ResNet及其变体网路系列,对于一般的图像识别任务表现优异,模型精准度高达93%(小型样本)。 项目制作于

apple ye 1 May 22, 2022
End-to-end speech secognition toolkit

End-to-end speech secognition toolkit This is an E2E ASR toolkit modified from Espnet1 (version 0.9.9). This is the official implementation of paper:

Jinchuan Tian 147 Dec 28, 2022
Tensorflow-Project-Template - A best practice for tensorflow project template architecture.

Tensorflow Project Template A simple and well designed structure is essential for any Deep Learning project, so after a lot of practice and contributi

Mahmoud G. Salem 3.6k Dec 22, 2022
Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

points2d_projection_mesh Input 2D points (e.g. facial landmarks) on an image Camera parameters (extrinsic and intrinsic) of the image Aligned 3D mesh

5 Dec 08, 2022
Script for getting information in discord

User-info.py Script for getting information in https://discord.com/ Instalação: apt-get update -y apt-get upgrade -y apt-get install git pkg install

Moleey 1 Dec 18, 2021
UpChecker is a simple opensource project to host it fast on your server and check is server up, view statistic, get messages if it is down. UpChecker - just run file and use project easy

UpChecker UpChecker is a simple opensource project to host it fast on your server and check is server up, view statistic, get messages if it is down.

Yan 4 Apr 07, 2022
Evolution Strategies in PyTorch

Evolution Strategies This is a PyTorch implementation of Evolution Strategies. Requirements Python 3.5, PyTorch = 0.2.0, numpy, gym, universe, cv2 Wh

Andrew Gambardella 333 Nov 14, 2022
Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection"

Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection". LRPDenseNet.py

Pedro Ricardo Ariel Salvador Bassi 2 Sep 21, 2022
Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data

LiDAR-MOS: Moving Object Segmentation in 3D LiDAR Data This repo contains the code for our paper: Moving Object Segmentation in 3D LiDAR Data: A Learn

Photogrammetry & Robotics Bonn 394 Dec 29, 2022
Data cleaning, missing value handle, EDA use in this project

Lending Club Case Study Project Brief Solving this assignment will give you an idea about how real business problems are solved using EDA. In this cas

Dhruvil Sheth 1 Jan 05, 2022
“英特尔创新大师杯”深度学习挑战赛 赛道3:CCKS2021中文NLP地址相关性任务

ccks2021-track3 CCKS2021中文NLP地址相关性任务-赛道三-冠军方案 团队:我的加菲鱼- wodejiafeiyu 初赛第二/复赛第一/决赛第一 前言 19年开始,陆陆续续参加了一些比赛,拿到过一些top,比较懒一直都没分享过,这次比较幸运又拿了top1,打算分享下 分类的任务

shaochenjie 131 Dec 31, 2022
My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

yobi byte 29 Oct 09, 2022
Implementation for "Seamless Manga Inpainting with Semantics Awareness" (SIGGRAPH 2021 issue)

Seamless Manga Inpainting with Semantics Awareness [SIGGRAPH 2021](To appear) | Project Website | BibTex Introduction: Manga inpainting fills up the d

101 Jan 01, 2023
Flexible time series feature extraction & processing

tsflex is a toolkit for flexible time series processing & feature extraction, that is efficient and makes few assumptions about sequence data. Useful

PreDiCT.IDLab 206 Dec 28, 2022
TipToiDog - Tip Toi Dog With Python

TipToiDog Was ist dieses Projekt? Meine 5-jährige Tochter spielt sehr gerne das

1 Feb 07, 2022
Deep Image Matting implementation in PyTorch

Deep Image Matting Deep Image Matting paper implementation in PyTorch. Differences "fc6" is dropped. Indices pooling. "fc6" is clumpy, over 100 millio

Yang Liu 724 Dec 27, 2022
Deep Q-Learning Network in pytorch (not actively maintained)

pytoch-dqn This project is pytorch implementation of Human-level control through deep reinforcement learning and I also plan to implement the followin

Hung-Tu Chen 342 Jan 01, 2023
So-ViT: Mind Visual Tokens for Vision Transformer

So-ViT: Mind Visual Tokens for Vision Transformer        Introduction This repository contains the source code under PyTorch framework and models trai

Jiangtao Xie 44 Nov 24, 2022