[CVPR2021 Oral] FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation.

Overview

FFB6D

This is the official source code for the CVPR2021 Oral work, FFB6D: A Full Flow Biderectional Fusion Network for 6D Pose Estimation. (Arxiv)

Table of Content

Introduction & Citation

FFB6D is a general framework for representation learning from a single RGBD image, and we applied it to the 6D pose estimation task by cascading downstream prediction headers for instance semantic segmentation and 3D keypoint voting prediction from PVN3D(Arxiv, Code, Video). At the representation learning stage of FFB6D, we build bidirectional fusion modules in the full flow of the two networks, where fusion is applied to each encoding and decoding layer. In this way, the two networks can leverage local and global complementary information from the other one to obtain better representations. Moreover, at the output representation stage, we designed a simple but effective 3D keypoints selection algorithm considering the texture and geometry information of objects, which simplifies keypoint localization for precise pose estimation.

Please cite FFB6D & PVN3D if you use this repository in your publications:

@InProceedings{He_2021_CVPR,
author = {He, Yisheng and Huang, Haibin and Fan, Haoqiang and Chen, Qifeng and Sun, Jian},
title = {FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}
}

@InProceedings{He_2020_CVPR,
author = {He, Yisheng and Sun, Wei and Huang, Haibin and Liu, Jianran and Fan, Haoqiang and Sun, Jian},
title = {PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Installation

  • Install CUDA 10.1 / 10.2

  • Set up python3 environment from requirement.txt:

    pip3 install -r requirement.txt 
  • Install apex:

    git clone https://github.com/NVIDIA/apex
    cd apex
    export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.5"  # set the target architecture manually, suggested in issue https://github.com/NVIDIA/apex/issues/605#issuecomment-554453001
    pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    cd ..
  • Install normalSpeed, a fast and light-weight normal map estimator:

    git clone https://github.com/hfutcgncas/normalSpeed.git
    cd normalSpeed/normalSpeed
    python3 setup.py install --user
    cd ..
  • Install tkinter through sudo apt install python3-tk

  • Compile RandLA-Net operators:

    cd ffb6d/models/RandLA/
    sh compile_op.sh

Code Structure

[Click to expand]
  • ffb6d
    • ffb6d/common.py: Common configuration of dataset and models, eg. dataset path, keypoints path, batch size and so on.
    • ffb6d/datasets
      • ffb6d/datasets/ycb
        • ffb6d/datasets/ycb/ycb_dataset.py: Data loader for YCB_Video dataset.
        • ffb6d/datasets/ycb/dataset_config
          • ffb6d/datasets/ycb/dataset_config/classes.txt: Object list of YCB_Video dataset.
          • ffb6d/datasets/ycb/dataset_config/radius.txt: Radius of each object in YCB_Video dataset.
          • ffb6d/datasets/ycb/dataset_config/train_data_list.txt: Training set of YCB_Video datset.
          • ffb6d/datasets/ycb/dataset_config/test_data_list.txt: Testing set of YCB_Video dataset.
        • ffb6d/datasets/ycb/ycb_kps
          • ffb6d/datasets/ycb/ycb_kps/{obj_name}_8_kps.txt: ORB-FPS 3D keypoints of an object in the object coordinate system.
          • ffb6d/datasets/ycb/ycb_kps/{obj_name}_corners.txt: 8 corners of the 3D bounding box of an object in the object coordinate system.
    • ffb6d/models
      • ffb6d/models/ffb6d.py: Network architecture of the proposed FFB6D.
      • ffb6d/models/cnn
        • ffb6d/models/cnn/extractors.py: Resnet backbones.
        • ffb6d/models/cnn/pspnet.py: PSPNet decoder.
        • ffb6d/models/cnn/ResNet_pretrained_mdl: Resnet pretraiend model weights.
      • ffb6d/models/loss.py: loss calculation for training of FFB6D model.
      • ffb6d/models/pytorch_utils.py: pytorch basic network modules.
      • ffb6d/models/RandLA/: pytorch version of RandLA-Net from RandLA-Net-pytorch
    • ffb6d/utils
      • ffb6d/utils/basic_utils.py: basic functions for data processing, visualization and so on.
      • ffb6d/utils/meanshift_pytorch.py: pytorch version of meanshift algorithm for 3D center point and keypoints voting.
      • ffb6d/utils/pvn3d_eval_utils_kpls.py: Object pose esitimation from predicted center/keypoints offset and evaluation metrics.
      • ffb6d/utils/ip_basic: Image Processing for Basic Depth Completion from ip_basic.
      • ffb6d/utils/dataset_tools
        • ffb6d/utils/dataset_tools/DSTOOL_README.md: README for dataset tools.
        • ffb6d/utils/dataset_tools/requirement.txt: Python3 requirement for dataset tools.
        • ffb6d/utils/dataset_tools/gen_obj_info.py: Generate object info, including SIFT-FPS 3d keypoints, radius etc.
        • ffb6d/utils/dataset_tools/rgbd_rnder_sift_kp3ds.py: Render rgbd images from mesh and extract textured 3d keypoints (SIFT/ORB).
        • ffb6d/utils/dataset_tools/utils.py: Basic utils for mesh, pose, image and system processing.
        • ffb6d/utils/dataset_tools/fps: Furthest point sampling algorithm.
        • ffb6d/utils/dataset_tools/example_mesh: Example mesh models.
    • ffb6d/train_ycb.py: Training & Evaluating code of FFB6D models for the YCB_Video dataset.
    • ffb6d/demo.py: Demo code for visualization.
    • ffb6d/train_ycb.sh: Bash scripts to start the training on the YCB_Video dataset.
    • ffb6d/test_ycb.sh: Bash scripts to start the testing on the YCB_Video dataset.
    • ffb6d/demo_ycb.sh: Bash scripts to start the demo on the YCB_Video_dataset.
    • ffb6d/train_log
      • ffb6d/train_log/ycb
        • ffb6d/train_log/ycb/checkpoints/: Storing trained checkpoints on the YCB_Video dataset.
        • ffb6d/train_log/ycb/eval_results/: Storing evaluated results on the YCB_Video_dataset.
        • ffb6d/train_log/ycb/train_info/: Training log on the YCB_Video_dataset.
  • requirement.txt: python3 environment requirements for pip3 install.
  • figs/: Images shown in README.

Datasets

  • YCB-Video: Download the YCB-Video Dataset from PoseCNN. Unzip it and link the unzippedYCB_Video_Dataset to ffb6d/datasets/ycb/YCB_Video_Dataset:

    ln -s path_to_unzipped_YCB_Video_Dataset ffb6d/datasets/ycb/

Training and evaluating

Training on the YCB-Video Dataset

  • Start training on the YCB-Video Dataset by:

    # commands in train_ycb.sh
    n_gpu=8  # number of gpu to use
    python3 -m torch.distributed.launch --nproc_per_node=$n_gpu train_ycb.py --gpus=$n_gpu

    The trained model checkpoints are stored in train_log/ycb/checkpoints/

    A tip for saving GPU memory: you can open the mixed precision mode to save GPU memory by passing parameters opt_level=O1 to train_ycb.py. The document for apex mixed precision trainnig can be found here.

Evaluating on the YCB-Video Dataset

  • Start evaluating by:
    # commands in test_ycb.sh
    tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar  # checkpoint to test.
    python3 -m torch.distributed.launch --nproc_per_node=1 train_ycb.py --gpu '0' -eval_net -checkpoint $tst_mdl -test -test_pose # -debug
    You can evaluate different checkpoints by revising the tst_mdl to the path of your target model.
  • Pretrained model: We provide our pre-trained models on onedrive, here. Download the pre-trained model, move it to train_log/ycb/checkpoints/ and modify tst_mdl for testing.

Demo/visualization on the YCB-Video Dataset

  • After training your model or downloading the pre-trained model, you can start the demo by:
    # commands in demo_ycb.sh
    tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar
    python3 -m demo -checkpoint $tst_mdl -dataset ycb
    The visualization results will be stored in train_log/ycb/eval_results/pose_vis.

Results

  • Evaluation result without any post refinement on the YCB-Video dataset:

    PoseCNN PointFusion DenseFusion PVN3D Our FFF6D
    ADDS ADD(S) ADDS ADD(S) ADDS ADD(S) ADDS ADD(S) ADDS ADD(S)
    ALL 75.8 59.9 83.9 - 91.2 82.9 95.5 91.8 96.6 92.7
  • Evaluation result on the LineMOD dataset:

    RGB RGB-D
    PVNet CDPN DPOD PointFusion DenseFusion(iterative) G2L-Net PVN3D FFF6D
    MEAN 86.3 89.9 95.2 73.7 94.3 98.7 99.4 99.7
  • Robustness upon occlusion:

  • Model parameters and speed on the LineMOD dataset (one object / frame) with one 2080Ti GPU:
    Parameters Network Forward Pose Estimation All time
    PVN3D 39.2M 170ms 20ms 190ms
    FFF6D
    33.8M 57ms 18ms 75ms

Adaptation to New Dataset

  • Install and generate required mesh info following DSTOOL_README.

  • Modify info of your new dataset in FFB6D/ffb6d/common.py

  • Write your dataset preprocess script following FFB6D/ffb6d/datasets/ycb/ycb_dataset.py. Note that you should modify or call the function that get your model info, such as 3D keypoints, center points, and radius properly.

  • (Very Important!) Visualize and check if you process the data properly, eg, the projected keypoints and center point, the semantic label of each point, etc. For example, you can visualize the projected center point (red point) and selected keypoints (orange points) as follow by running python3 -m datasets.ycb.ycb_dataset.

  • For inference, make sure that you load the 3D keypoints, center point, and radius of your objects in the object coordinate system properly in FFB6D/ffb6d/utils/pvn3d_eval_utils.py.

  • Check that all setting are modified properly by using the ground truth information for evaluation. The result should be high and close to 100 if everything is correct. For example, testing ground truth on the YCB_Video dataset by passing -test_gt parameters to train_ycb.py will get results higher than 99.99:

    tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar
    python3 -m torch.distributed.launch --nproc_per_node=1 train_ycb.py --gpu '0' -eval_net -checkpoint $tst_mdl -test -test_pose -test_gt
    

To Do

  • Scripts and pre-trained models for LineMOD dataset.

License

Licensed under the MIT License.

Owner
Yisheng (Ethan) He
Ph.D. student @ HKUST
Yisheng (Ethan) He
Simple Python application to transform Serial data into OSC messages

SerialToOSC-Bridge Simple Python application to transform Serial data into OSC messages. The current purpose is to be a compatibility layer between ha

Division of Applied Acoustics at Chalmers University of Technology 3 Jun 03, 2021
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

419 Jan 03, 2023
Implementation of the Chamfer Distance as a module for pyTorch

Chamfer Distance for pyTorch This is an implementation of the Chamfer Distance as a module for pyTorch. It is written as a custom C++/CUDA extension.

Christian Diller 205 Jan 05, 2023
FS2KToolbox FS2K Dataset Towards the translation between Face

FS2KToolbox FS2K Dataset Towards the translation between Face -- Sketch. Download (photo+sketch+annotation): Google-drive, Baidu-disk, pw: FS2K. For

Deng-Ping Fan 5 Jan 03, 2023
使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,包含C++和Python两种版本的程序实现。本套程序只依赖opencv库就可以运行, 从而彻底摆脱对任何深度学习框架的依赖。

YOLOP-opencv-dnn 使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,依然是包含C++和Python两种版本的程序实现 onnx文件从百度云盘下载,链接:https://pan.baidu.com/s/1A_9cldU

178 Jan 07, 2023
An implementation of the efficient attention module.

Efficient Attention An implementation of the efficient attention module. Description Efficient attention is an attention mechanism that substantially

Shen Zhuoran 194 Dec 15, 2022
Receptive Field Block Net for Accurate and Fast Object Detection, ECCV 2018

Receptive Field Block Net for Accurate and Fast Object Detection By Songtao Liu, Di Huang, Yunhong Wang Updatas (2021/07/23): YOLOX is here!, stronger

Liu Songtao 1.4k Dec 21, 2022
2.86% and 15.85% on CIFAR-10 and CIFAR-100

Shake-Shake regularization This repository contains the code for the paper Shake-Shake regularization. This arxiv paper is an extension of Shake-Shake

Xavier Gastaldi 294 Nov 22, 2022
The Video-based Accident Detection System built in Python

Accident-detection-system About the Project This Repository contains the Video-based Accident Detection System built in Python. Contributors Yukta Gop

SURYAVANSHI SNEHAL BALKRISHNA 50 Dec 07, 2022
Stochastic Extragradient: General Analysis and Improved Rates

Stochastic Extragradient: General Analysis and Improved Rates This repository is the official implementation of the paper "Stochastic Extragradient: G

Hugo Berard 4 Nov 11, 2022
Analyzing basic network responses to novel classes

novelty-detection Analyzing how AlexNet responds to novel classes with varying degrees of similarity to pretrained classes from ImageNet. If you find

Noam Eshed 34 Oct 02, 2022
a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch

pytorch-spynet This is a personal reimplementation of SPyNet [1] using PyTorch. Should you be making use of this work, please cite the paper according

Simon Niklaus 269 Jan 02, 2023
Catalyst.Detection

Accelerated DL R&D PyTorch framework for Deep Learning research and development. It was developed with a focus on reproducibility, fast experimentatio

Catalyst-Team 12 Oct 25, 2021
SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches

SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches [Paper]  [Project Page]  [Interactive Demo]  [Supplementary Material]        Usag

215 Dec 25, 2022
Implementing a simplified copy of Shazam application from scratch using MinHashing and LSH.

Building Shazam from scratch In this repository we tried to implement a simplified copy of the Shazam application able to tell you the name of a song

Arturo Ghinassi 0 Nov 17, 2022
Robust Self-augmentation for NER with Meta-reweighting

Robust Self-augmentation for NER with Meta-reweighting

Lam chi 17 Nov 22, 2022
public repo for ESTER dataset and modeling (EMNLP'21)

Project / Paper Introduction This is the project repo for our EMNLP'21 paper: https://arxiv.org/abs/2104.08350 Here, we provide brief descriptions of

PlusLab 19 Oct 27, 2022
Large dataset storage format for Pytorch

H5Record Large dataset ( 100G, = 1T) storage format for Pytorch (wip) Support python 3 pip install h5record Why? Writing large dataset is still a

theblackcat102 43 Oct 22, 2022
GPU-accelerated Image Processing library using OpenCL

pyclesperanto pyclesperanto is a python package for clEsperanto - a multi-language framework for GPU-accelerated image processing. clEsperanto uses Op

17 Dec 25, 2022
The official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness.

This repository is the official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness. Requirements pip install -r requi

Jie Ren 17 Dec 12, 2022