Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

Overview

CenterPose

Overview

This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image" by Lin et al. (full citation below). In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation, which operates on unknown object instances within a known category using a single RGB image input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative 3D bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, a single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark of real images, outperforming state-of-the-art methods for 3D IoU metric (27.6% higher than the single-stage approach of MobilePose and 7.1% higher than the related two-stage approach). The algorithm runs at 15 fps on an NVIDIA GTX 1080Ti GPU.

Installation

The code was tested on Ubuntu 16.04, with Anaconda Python 3.6 and PyTorch 1.1.0. Higher versions should be possible with some accuracy difference. NVIDIA GPUs are needed for both training and testing.

  1. Clone this repo:

    CenterPose_ROOT=/path/to/clone/CenterPose
    git clone https://github.com/NVlabs/CenterPose.git $CenterPose_ROOT
    
  2. Create an Anaconda environment or create your own virtual environment

    conda create -n CenterPose python=3.6
    conda activate CenterPose
    pip install -r requirements.txt
    conda install -c conda-forge eigenpy
    
  3. Compile the deformable convolutional layer

    git submodule init
    git submodule update
    cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
    ./make.sh
    

    [Optional] If you want to use a higher version of PyTorch, you need to download the latest version of DCNv2 and compile the library.

    git submodule set-url https://github.com/jinfagang/DCNv2_latest.git src/lib/models/networks/DCNv2
    git submodule sync
    git submodule update --init --recursive --remote
    cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
    ./make.sh
    
  4. Download our pre-trained models for CenterPose and move all the .pth files to $CenterPose_ROOT/models/CenterPose/. We currently provide models for 9 categories: bike, book, bottle, camera, cereal_box, chair, cup, laptop, and shoe.

  5. Prepare training/testing data

    We save all the training/testing data under $CenterPose_ROOT/data/.

    For the Objectron dataset, we created our own data pre-processor to extract the data for training/testing. Refer to the data directory for more details.

Demo

We provide supporting demos for image, videos, webcam, and image folders. See $CenterPose_ROOT/images/CenterPose

For category-level 6-DoF object estimation on images/video/image folders, run:

cd $CenterPose_ROOT/src
python demo.py --demo /path/to/image/or/folder/or/video --arch dlav1_34 --load_model ../path/to/model

You can also enable --debug 4 to save all the intermediate and final outputs.

For the webcam demo (You may want to specify the camera intrinsics via --cam_intrinsic), run

cd $CenterPose_ROOT/src
python demo.py --demo webcam --arch dlav1_34 --load_model ../path/to/model

Training

We follow the approach of CenterNet for training the DLA network, reducing the learning rate by 10x after epoch 90 and 120, and stopping after 140 epochs.

For debug purposes, you can put all the local training params in the $CenterPose_ROOT/src/main_CenterPose.py script. You can also use the command line instead. More options are in $CenterPose_ROOT/src/lib/opts.py.

To start a new training job, simply do the following, which will use default parameter settings:

cd $CenterPose_ROOT/src
python main_CenterPose.py

The result will be saved in $CenterPose_ROOT/exp/object_pose/$dataset_$category_$arch_$time ,e.g., objectron_bike_dlav1_34_2021-02-27-15-33

You could then use tensorboard to visualize the training process via

cd $path/to/folder
tensorboard --logdir=logs --host=XX.XX.XX.XX

Evaluation

We evaluate our method on the Objectron dataset, please refer to the objectron_eval directory for more details.

Citation

Please cite grasp_primitiveShape if you use this repository in your publications:

@article{lin2021single,
  title={Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image},
  author={Lin, Yunzhi and Tremblay, Jonathan and Tyree, Stephen and Vela, Patricio A and Birchfield, Stan},
  journal={arXiv preprint arXiv:2109.06161},
  year={2021}
}

Licence

CenterPose is licensed under the NVIDIA Source Code License - Non-commercial.

Owner
NVIDIA Research Projects
NVIDIA Research Projects
The implemetation of Dynamic Nerual Garments proposed in Siggraph Asia 2021

DynamicNeuralGarments Introduction This repository contains the implemetation of Dynamic Nerual Garments proposed in Siggraph Asia 2021. ./GarmentMoti

42 Dec 27, 2022
Code for "Learning to Regrasp by Learning to Place"

Learning2Regrasp Learning to Regrasp by Learning to Place, CoRL 2021. Introduction We propose a point-cloud-based system for robots to predict a seque

Shuo Cheng (成硕) 18 Aug 27, 2022
Official Pytorch implementation for video neural representation (NeRV)

NeRV: Neural Representations for Videos (NeurIPS 2021) Project Page | Paper | UVG Data Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav S

hao 214 Dec 28, 2022
Learning to Draw: Emergent Communication through Sketching

Learning to Draw: Emergent Communication through Sketching This is the official code for the paper "Learning to Draw: Emergent Communication through S

19 Jul 22, 2022
Testing and Estimation of structural breaks in Stata

xtbreak estimating and testing for many known and unknown structural breaks in time series and panel data. For an overview of xtbreak test see xtbreak

Jan Ditzen 13 Jun 19, 2022
Transformers are Graph Neural Networks!

🚀 Gated Graph Transformers Gated Graph Transformers for graph-level property prediction, i.e. graph classification and regression. Associated article

Chaitanya Joshi 46 Jun 30, 2022
An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wheat Detection (2021).

Global-Wheat-Detection An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wh

Chuxin Wang 11 Sep 25, 2022
NeWT: Natural World Tasks

NeWT: Natural World Tasks This repository contains resources for working with the NeWT dataset. ❗ At this time the binary tasks are not publicly avail

Visipedia 26 Oct 18, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
MAGMA - a GPT-style multimodal model that can understand any combination of images and language

MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning Authors repo (alphabetical) Constantin (CoEich), Mayukh (Mayukh

Aleph Alpha GmbH 331 Jan 03, 2023
AsymmetricGAN - Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

AsymmetricGAN for Image-to-Image Translation AsymmetricGAN Framework for Multi-Domain Image-to-Image Translation AsymmetricGAN Framework for Hand Gest

Hao Tang 42 Jan 15, 2022
A short and easy PyTorch implementation of E(n) Equivariant Graph Neural Networks

Simple implementation of Equivariant GNN A short implementation of E(n) Equivariant Graph Neural Networks for HOMO energy prediction. Just 50 lines of

Arsenii Senya Ashukha 97 Dec 23, 2022
"NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search".

NAS-Bench-301 This repository containts code for the paper: "NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search". The

AutoML-Freiburg-Hannover 57 Nov 30, 2022
Graph neural network message passing reframed as a Transformer with local attention

Adjacent Attention Network An implementation of a simple transformer that is equivalent to graph neural network where the message passing is done with

Phil Wang 49 Dec 28, 2022
Source code, data, and evaluation details for “Cross-Lingual Citations in English Papers: A Large-Scale Analysis of Prevalence, Formation, and Ramifications”

Analysis of cross-lingual citations in English papers Contents initial_analysis Source code, data, and evaluation details as published at ICADL2020 ci

Tarek Saier 1 Oct 27, 2022
using STGCN to achieve egg classification task

EEG Classification   The task requires us to classify electroencephalography(EEG) into six categories, including human body, human face, animal body,

4 Jun 13, 2022
A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

bbc-speech-segmenter: Voice Activity Detection & Speaker Diarization A complete speech segmentation system using Kaldi and x-vectors for voice activit

BBC 16 Oct 27, 2022
Christmas face app for Decathlon xmas coding party!

Christmas Face Application Use this library to create the perfect picture for your christmas cards! Done by Hasib Zunair, Guillaume Brassard and Samue

Hasib Zunair 4 Dec 20, 2021
[ICML 2021] A fast algorithm for fitting robust decision trees.

GROOT: Growing Robust Trees Growing Robust Trees (GROOT) is an algorithm that fits binary classification decision trees such that they are robust agai

Cyber Analytics Lab 17 Nov 21, 2022
DIR-GNN - Discovering Invariant Rationales for Graph Neural Networks

DIR-GNN "Discovering Invariant Rationales for Graph Neural Networks" (ICLR 2022)

Ying-Xin (Shirley) Wu 70 Nov 13, 2022