Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

Last update: Dec 09, 2022

Overview

End-to-End Optimization of Scene Layout

Code release for: End-to-End Optimization of Scene Layout CVPR 2020 (Oral)

Project site, Bibtex

For help contact afluo [a.t] andrew.cmu.edu or open an issue

Requirements
- Pytorch 1.2 (for everything)
- Neural 3D Mesh Renderer - daniilidis version (for scene refinement only) For numerical stability, please modify projection.py to remove the multiplication by 0. After the change L33, L34 looks like:
```
x__ = x_
y__ = y_ 
```
- Blender 2.79 (for 3D rendering of rooms only)
  - Please install numpy in Blender
- matplotlib
- numpy
- skimage (for SPADE based shading)
- imageio (for SPADE based shading)
- shapely (eval only)
- PyWavefront (for scene refinement only, loading of 3d meshes)
- PyMesh (for scene refnement only, remeshing of SUNCG objects)
- 1 Nvidia GPU

Download checkpoints here, download metadata here

Project structure
|-3d_SLN
  |-data
    |-suncg_dataset.py
      # Actual definition for the dataset object, makes batches of scene graphs
  |-metadata
    # SUNCG meta data goes here
    |-30_size_info_many.json
      # data about object size/volume, for 30/70 cutoff
    |-data_rot_train.json
      # Normalized object positions & rotations for training
    |-data_rot_val.json
      # For testing
    |-size_info_many.json
      # data about object size/volume, different cutoff
    |-valid_types.json
      # What object types we should use for making the scene graph
      # Caution when editing this, quite a bit is hard coded elsewhere
  |-models
    |-diff_render.py
      # Uses the Neural Mesh Renderer (Pytorch Version) to refine object positions
    |-graph.py
      # Graph network building blocks
    |-misc.py
      # Misc helper functions for the diff renderer
    |-Sg2ScVAE_model.py
      # Code to construct the VAE-graph network
    |-SPADE_related.py
      # Tools to construct SPADE VAE GAN (inference only)
  |-options
    # Global options
  |-render
    # Contains various "profiles" for Blender rendering
  |-testing
    # You must call batch_gen in test.py at least once
    # It will call into get_layouts_from_network in test_VAE.py
    # this will compute the posterior mean & std and cache it
    |-test_acc_mean_std.py
      # Contains helper functions to measure acc/l1/std 
    |-test_heatmap.py
      # Contains the functions *produce_heatmap* and *plot_heatmap*
      # The first function takes as input a verbally defined scene graph
        # If not provided, it uses a default scene graph with 5 objects
        # It will load weights for a VAE-graph network
        # Then load the computed posterior mean & std
        # And repeatedly sample from the given scene graph
        # Saves the results to a .pkl file
      # The second function will load a .pkl and plot them as heatmaps
    |-test_plot2d.py
      # Contains a function that uses matplotlib
      # Does NOT require SUNCG
      # Plots the objects using colors provided by ScanNet
    |-test_plot3d.py
      # Calls into the blender code in the ../render folder
      # Requires the SUNCG meshes
      # Requires Blender 2.79
      # Either uses the CPU (Blender renderer)
      # Or uses the GPU (Cycles renderer)
      # Loads a HDR texture (from HDRI Haven) for background
    |-test_SPADE_shade.py
      # Loads semantic maps & depth map, and produces RGB images using SPADE
    |-test_utils.py
      # Contains helper functions for testing
        # Of interest is the *get_sg_from_words* function
    |-test_VAE.py
  |-build_dataset_model.py
     # Constructs dataset & dataloader objects
     # Also constructs the VAE-graph network
  |-test.py
     # Provides functions which performs the following:
       # generation of layouts from scene graphs under the *batch_gen* argument
       # measure the accuracy of l1 loss, accuracy, std under the *measure_acc_l1_std* argument
       # draw the topdown heatmaps of layouts with a single scene graph under the *heat_map* argument
       # plot the topdown boxes of layouts with under the *draw_2d* argument
       # plot the viewer centric layouts using suncg meshes under the *draw_3d* argument
       # perform SPADE based shading of semantic+depth maps under the *gan_shade* argument
  |-train.py
     # Contains the training loop for the VAE-graph network
  |-utils.py
     # Contains various helper functions for:
       # managing network losses
       # make scene graphs from bounding boxes
       # load/write jsons
       # misc other stuff

Training the VAE-graph network (limited to 1 GPU):
python train.py
Testing the VAE-graph network:
First run python test.py --batch_gen at least once. This computes and caches a posterior for future sampling using the training set. It also generates a bunch of layouts using the test set.
To generate a heatmap:
python test.py --heat_map
You can either define your own scene graph (see the produce_heatmap function in testing/test_heatmap.py), if you do not provide one it will use the default one. The function will convert scene graphs defined using words into a format usable by the network.
To compute STD/L1/Acc:
python test.py --measure_acc_l1_std
To plot the scene from a top down view with ScanNet colors (doesn't requrie SUNCG):
python test.py --draw_2d
Please provide a (O+1 x 6) tensor of bounding boxes, and a (O+1,) tensor of rotations. The last object should be the bounding box of the room
To plot 3D
python test.py --draw_3d
This calls into test_plot3d.py, which in turn launched Blender, and executes render_caller.py, you can put in specific rooms by editing this file. The full rendering function is located in render_room_color.py.
To use a neural renderer to refine a room
python test.py --fine_tune Please select the indexes of the room in test.py. This will call into test_render_refine.py which uses the differentiable renderer located in diff_render.py. Learning rate, and loss types/weightings can be set in test_render_refine.py.
We set a manual seed for demonstration purposes, in practice please remove this.
To use SPADE to generate texture/shading/lighting for a room from semantic + depth
python test.py --gan_shade This will first call into semantic_depth_caller.py to produce the semantic and depth maps, then use SPADE to generate RGB images.

Citation

If you find this repo useful for your research, please consider citing the paper

@inproceedings{luo2020end,
  title={End-to-End Optimization of Scene Layout},
  author={Luo, Andrew and Zhang, Zhoutong and Wu, Jiajun and Tenenbaum, Joshua B},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3754--3763},
  year={2020}
}

Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

Related tags

Overview

End-to-End Optimization of Scene Layout

Citation

Owner

Andrew Luo

buildseg is a building extraction plugin of QGIS based on PaddlePaddle.

This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

Compare neural networks by their feature similarity

Neural-net-from-scratch - A simple Neural Network from scratch in Python using the Pymathrix library

PyTorch implementation of ShapeConv: Shape-aware Convolutional Layer for RGB-D Indoor Semantic Segmentation.

Annotated notes and summaries of the TensorFlow white paper, along with SVG figures and links to documentation

NAS-FCOS: Fast Neural Architecture Search for Object Detection (CVPR 2020)

In this project, two programs can help you take full agvantage of time on the model training with a remote server

A program that can analyze videos according to the weights you select

This repo is a C++ version of yolov5_deepsort_tensorrt. Packing all C++ programs into .so files, using Python script to call C++ programs further.

Models, datasets and tools for Facial keypoints detection

Meta-learning for NLP

Official implementation of NeurIPS'2021 paper TransformerFusion

This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis

Distributed Asynchronous Hyperparameter Optimization better than HyperOpt.

[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

Exploring Versatile Prior for Human Motion via Motion Frequency Guidance (3DV2021)

A python package for generating, analyzing and visualizing building shadows

Sequence to Sequence (seq2seq) Recurrent Neural Network (RNN) for Time Series Forecasting

Pytorch implementation of YOLOX、PPYOLO、PPYOLOv2、FCOS an so on.