Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

Related tags

Computer VisionROCA
Overview

ROCA: Robust CAD Model Alignment and Retrieval from a Single Image (CVPR 2022)

Code release of our paper ROCA. Check out our video, paper, and website!

If you find our paper or this repository helpful, please cite:

@article{gumeli2022roca,
  title={ROCA: Robust CAD Model Retrieval and Alignment from a Single Image},
  author={G{\"u}meli, Can and Dai, Angela and Nie{\ss}ner, Matthias},
  booktitle={Proc. Computer Vision and Pattern Recognition (CVPR), IEEE},
  year={2022}
}

Development Environment

We use the following development environment for this project:

  • Nvidia RTX 3090 GPU
  • Intel Xeon W-1370
  • Ubuntu 20.04
  • CUDA Version 11.2
  • cudatoolkit 11.0
  • Pytorch 1.7
  • Pytorch3D 0.5 or 0.6
  • Detectron2 0.3

Installation

This code is developed using anaconda3 with Python 3.8 (download here), therefore we recommend a similar setup.

You can simply run the following code in the command line to create the development environment:

$ source setup.sh

For visualizing some demo results or using the data preprocessing code, you need our custom rasterizer. In case the provided x86-64 linux shared object does not work for you, you may install the rasterizer here.

Running the Demo

We provide four sample input images in network/assets folder. The images are captured with a smartphone and then preprocessed to be compatible with ROCA format. To run the demo, you first need to download data and config from this Google Drive folder. Models folder contains the pre-trained model and used config, while Data folder contains images and dataset.

Assuming contents of the Models directory are in $MODEL_DIR and contents of the Data directory are in $DATA_DIR, you can run:

$ cd network
$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml

You will see image overlay and CAD visualization are displayed one by one. Open3D mesh visualization is an interactive window where you can see geometries from different viewpoints. Close the Open3D window to continue to the next visualization. You will see similar results to the image above.

For headless visualization, you can specify an output directory where resulting images and meshes are placed:

$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml --output_dir $OUTPUT_DIR

You may use the --wild option to visualize results with "wild retrieval". Note that we omit the table category in this case due to large size diversity.

Preparing Data

Downloading Processed Data (Recommended)

We provide preprocessed images and labels in this Google Drive folder. Download and extract all folders to a desired location before running the training and evaluation code.

Rendering Data

Alternatively, you can render data yourself. Our data preparation code lives in the renderer folder.

Our project depends on ShapeNet (Chang et al., '15), ScanNet (Dai et al. '16), and Scan2CAD (Avetisyan et al. '18) datasets. For ScanNet, we use ScanNet25k images which are provided as a zip file via the ScanNet download script.

Once you get the data, check renderer/env.sh file for the locations of different datasets. The meanings of environment variables are described as inline comments in env.sh.

After editing renderer/env.sh, run the data generation script:

$ cd renderer
$ sh run.sh

Please check run.sh to see how individual scripts are running for data preprocessing and feel free to customize the data pipeline!

Training and Evaluating Models

Our training code lives in the network directory. Navigate to the network/env.sh and edit the environment variables. Make sure data directories are consistent with the ones locations downloaded and extracted folders. If you manually prepared data, make sure locations in /network/env.sh are consistent with the variables set in renderer/env.sh.

After you are done with network/env.sh, run the run.sh script to train a new model or evaluate an existing model based on the environment variables you set in env.sh:

$ cd network
$ sh run.sh

Replicating Experiments from the Main Paper

Based on the configurations in network/env.sh, you can run different ablations from the paper. The default config will run the (final) experiment. You can do the following edits cumulatively for different experiments:

  1. For P+E+W+R, set RETRIEVAL_MODE=resnet_resnet+image
  2. For P+E+W, set RETRIEVAL_MODE=nearest
  3. For P+E, set NOC_WEIGHTS=0
  4. For P, set E2E=0

Resources

To get the datasets and gain further insight regarding our implementation, we refer to the following datasets and open-source codebases:

Datasets and Metadata

Libraries

Projects

Using python libraries to track hands

Python-HandTracking Using python libraries to track hands on a camera Uses cv2 and mediapipe libraries custom hand tracking module PyCharm IDE Final E

Martin Matsudaira 1 Dec 17, 2021
Code for the "Sensing leg movement enhances wearable monitoring of energy expenditure" paper.

EnergyExpenditure Code for the "Sensing leg movement enhances wearable monitoring of energy expenditure" paper. Additional data for replicating this s

Patrick S 42 Oct 26, 2022
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

Bridging Video-text Retrieval with Multiple Choice Questions, CVPR 2022 (Oral) Paper | Project Page | Pre-trained Model | CLIP-Initialized Pre-trained

Applied Research Center (ARC), Tencent PCG 99 Jan 06, 2023
Simple app for visual editing of Page XML files

Name nw-page-editor - Simple app for visual editing of Page XML files. Version: 2021.02.22 Description nw-page-editor is an application for viewing/ed

Mauricio Villegas 27 Jun 20, 2022
Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Detecting Text in Natural Image with Connectionist Text Proposal Network The codes are used for implementing CTPN for scene text detection, described

Tian Zhi 1.3k Dec 22, 2022
Satoshi is a discord bot template in python using discord.py that allow you to track some live crypto prices with your own discord bot.

Satoshi ~ DiscordCryptoBot Satoshi is a simple python discord bot using discord.py that allow you to track your favorites cryptos prices with your own

Théo 2 Sep 15, 2022
Text Detection from images using OpenCV

EAST Detector for Text Detection OpenCV’s EAST(Efficient and Accurate Scene Text Detection ) text detector is a deep learning model, based on a novel

Abhishek Singh 88 Oct 20, 2022
Train custom VR face tracking parameters

Pal Buddy Guy: The anipal's best friend This is a small script to improve upon the tracking capabilities of the Vive Pro Eye and facial tracker. You c

7 Dec 12, 2021
nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex.

faceprocessor nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex. Tech faceprocessor uses a number of open source projec

NoFaceDB 3 Sep 06, 2021
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless. This is the official Roboflow python package that interfaces with the Roboflow API.

Roboflow 52 Dec 23, 2022
PyQT5 app that colorize black & white pictures using CNN(use pre-trained model which was made with OpenCV)

About PyQT5 app that colorize black & white pictures using CNN(use pre-trained model which was made with OpenCV) Colorizor Приложение для проекта Yand

1 Apr 04, 2022
This is a real life mario project using python and mediapipe

real-life-mario This is a real life mario project using python and mediapipe How to run to run this just run - realMario.py file requirements This req

Programminghut 42 Dec 22, 2022
OCR software for recognition of handwritten text

Handwriting OCR The project tries to create software for recognition of a handwritten text from photos (also for Czech language). It uses computer vis

Břetislav Hájek 562 Jan 03, 2023
原神风花节自动弹琴辅助

GenshinAutoPlayBalladsofBreeze 原神风花节自动弹琴辅助(已适配1920*1080分辨率) 本程序基于opencv图像识别技术,不存在任何封号。 因为正确率取决于你的cpu性能,10900k都不一定全对。 由于图像识别存在误差,根本无法确定出错时间。更不用说被检测到了。

晓轩 20 Oct 27, 2022
Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that slide and lock together.

Fusion-360-Add-In-PuzzleSpline Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that sli

Michiel van Wessem 1 Nov 15, 2021
Application that instantly translates sign-language to letters.

Sign Language Translator Project Description The main purpose of project is translating sign-language to letters. In accordance with this purpose we d

3 Sep 29, 2022
Play the Namibian game of Owela against a terrible AI. Built using Django and htmx.

Owela Club A Django project for playing the Namibian game of Owela against a dumb AI. Built following the rules described on the Mancala World wiki pa

Adam Johnson 18 Jun 01, 2022
Motion Detection Squid Game with OpenCV Python

*Motion Detection Squid Game with OpenCV Python i am newbie in python. In this project I made a simple game to follow the trend about the red light gr

Nayan 17 Nov 22, 2022
QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021)

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 119 Dec 02, 2022
Hand Detection and Finger Detection on Live Feed

Hand-Detection-On-Live-Feed Hand Detection and Finger Detection on Live Feed Getting Started Install the dependencies $ git clone https://github.com/c

Chauhan Mahaveer 2 Jan 02, 2022