Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

Overview


Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation.
Zicong Fan, Adrian Spurr, Muhammed Kocabas, Siyu Tang, Michael J. Black, Otmar Hilliges.
International Conference on 3D Vision (3DV), 2021.


Features

DIGIT estimates the 3D poses of two interacting hands from a single RGB image. This repo provides the training, evaluation, and demo code for the project in PyTorch Lightning.

Updates

  • November 25, 2021: Initial release with training and evaluation code on PyTorch Lightning 0.9.

Setting up environment

DIGIT has been implemented and tested on Ubuntu 18.04 with Python >= 3.7, PyTorch Lightning 0.9, and PyTorch 1.6.

Clone the repo:

git clone https://github.com/zc-alexfan/digit-interacting

Create the required folders:

make folders

Install conda environment:

conda create -n digit python=3.7
conda deactivate
conda activate digit
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt
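
To confirm that the environment matches the tested versions before proceeding, a quick check like the following can help (this snippet is ours, not part of the official setup):

import torch
import torchvision
import pytorch_lightning as pl

# Versions the repo was tested with (see above): PyTorch 1.6,
# torchvision 0.7, PyTorch Lightning 0.9.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("pytorch_lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())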

Downloading InterHand2.6M

  • Download the 5fps.v1 release of InterHand2.6M, following the instructions here
  • Place annotations, images, and rootnet_output from InterHand2.6M under ./data/InterHand so that it matches the structure below:
./data/InterHand
|-- annotations
|-- images
|   |-- test
|   |-- train
|   `-- val
`-- rootnet_output
    |-- rootnet_interhand2.6m_output_test.json
    `-- rootnet_interhand2.6m_output_val.json
  • The folder ./data/InterHand/annotations should look like this:
./data/InterHand/annotations
|-- skeleton.txt
|-- subject.txt
|-- test
|   |-- InterHand2.6M_test_MANO_NeuralAnnot.json
|   |-- InterHand2.6M_test_camera.json
|   |-- InterHand2.6M_test_data.json
|   `-- InterHand2.6M_test_joint_3d.json
|-- train
|   |-- InterHand2.6M_train_MANO_NeuralAnnot.json
|   |-- InterHand2.6M_train_camera.json
|   |-- InterHand2.6M_train_data.json
|   `-- InterHand2.6M_train_joint_3d.json
`-- val
    |-- InterHand2.6M_val_MANO_NeuralAnnot.json
    |-- InterHand2.6M_val_camera.json
    |-- InterHand2.6M_val_data.json
    `-- InterHand2.6M_val_joint_3d.json

Preparing data and backbone for training

Download the ImageNet-pretrained backbone from here and place it under:

./saved_models/pytorch/imagenet/hrnet_w32-36af842e.pt

Package images into lmdb:

cd scripts
python package_images_lmdb.py
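
To spot-check that the images were packed correctly, a sketch along these lines reads back the first record. The LMDB filename and the key/value format (assumed here to be the relative image path as a UTF-8 key with encoded image bytes as the value) depend on package_images_lmdb.py and are assumptions, not documented behavior; run it from the repo root or adjust the path:

import cv2
import lmdb
import numpy as np

# Hypothetical path; use the LMDB actually produced by package_images_lmdb.py.
env = lmdb.open("./data/InterHand/images.lmdb", readonly=True, lock=False)
with env.begin(write=False) as txn:
    for key, value in txn.cursor():
        # Assumes values are encoded image bytes; adjust if raw arrays are stored.
        img = cv2.imdecode(np.frombuffer(value, dtype=np.uint8), cv2.IMREAD_COLOR)
        print(key.decode("utf-8"), None if img is None else img.shape)
        break  # inspect only the first record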

Preprocess annotations:

python preprocess_annot.py
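
This step produces the InterHand2.6M_*_data.pkl files shown in the tree further below. To peek at one of them (the pickle's internal structure is produced by preprocess_annot.py; what it contains is an assumption on our part):

import pickle

# Path taken from the directory tree below.
with open("./data/InterHand/annotations/val/InterHand2.6M_val_data.pkl", "rb") as f:
    data = pickle.load(f)
print(type(data))
if isinstance(data, dict):
    print(list(data.keys())[:10])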

Render part segmentation masks:

  • Follow the README.md of render_mano_ih to prepare an LMDB of part segmentation masks. For questions about preparing the segmentation masks, please open issues in that repository.

Place the image LMDB, the segmentation-mask LMDB, and the meta_dict_*.pkl files under ./data/InterHand so that it matches the structure below. The cache files meta_dict_*.pkl are by-products of the step above.

|-- annotations
|   |-- skeleton.txt
|   |-- subject.txt
|   |-- test
|   |   |-- InterHand2.6M_test_MANO_NeuralAnnot.json
|   |   |-- InterHand2.6M_test_camera.json
|   |   |-- InterHand2.6M_test_data.json
|   |   |-- InterHand2.6M_test_data.pkl
|   |   `-- InterHand2.6M_test_joint_3d.json
|   |-- train
|   |   |-- InterHand2.6M_train_MANO_NeuralAnnot.json
|   |   |-- InterHand2.6M_train_camera.json
|   |   |-- InterHand2.6M_train_data.json
|   |   |-- InterHand2.6M_train_data.pkl
|   |   `-- InterHand2.6M_train_joint_3d.json
|   `-- val
|       |-- InterHand2.6M_val_MANO_NeuralAnnot.json
|       |-- InterHand2.6M_val_camera.json
|       |-- InterHand2.6M_val_data.json
|       |-- InterHand2.6M_val_data.pkl
|       `-- InterHand2.6M_val_joint_3d.json
|-- cache
|   |-- meta_dict_test.pkl
|   |-- meta_dict_train.pkl
|   `-- meta_dict_val.pkl
|-- images
|   |-- test
|   |-- train
|   `-- val
|-- rootnet_output
|   |-- rootnet_interhand2.6m_output_test.json
|   `-- rootnet_interhand2.6m_output_val.json
`-- segm_32.lmdb
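
Before launching training, it may help to verify that the expected files are in place. A minimal check over a few paths from the trees above (our own sketch; extend the list as needed):

import os

expected = [
    "./data/InterHand/annotations/skeleton.txt",
    "./data/InterHand/annotations/train/InterHand2.6M_train_data.pkl",
    "./data/InterHand/cache/meta_dict_train.pkl",
    "./data/InterHand/cache/meta_dict_val.pkl",
    "./data/InterHand/cache/meta_dict_test.pkl",
    "./data/InterHand/segm_32.lmdb",
    "./saved_models/pytorch/imagenet/hrnet_w32-36af842e.pt",
]
for path in expected:
    print("OK     " if os.path.exists(path) else "MISSING", path)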

Training and evaluating

To train DIGIT, run the command below. The script trains at an effective batch size of 64 using gradient accumulation, where each forward/backward pass runs on a sub-batch of 32:

python train.py --iter_batch 32 --batch_size 64 --gpu_ids 0 --trainsplit train --precision 16 --eval_every_epoch 2 --lr_dec_epoch 40 --max_epoch 50 --min_epoch 50

Alternatively, if you just want a quick sanity check, you can run:

python train.py --iter_batch 32 --batch_size 64 --gpu_ids 0 --trainsplit minitrain --valsplit minival --precision 16 --eval_every_epoch 1 --max_epoch 50 --min_epoch 50
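
In both commands, the effective batch size of 64 is reached by accumulating gradients over two sub-batches of 32 (--iter_batch). A minimal, framework-agnostic sketch of the idea, not the project's actual training loop:

import torch

def accumulation_step(model, optimizer, loss_fn, sub_batches):
    # Two sub-batches of 32 approximate one optimizer step at batch
    # size 64: each loss is scaled by the number of sub-batches so the
    # accumulated gradient matches the average over the large batch.
    optimizer.zero_grad()
    for inputs, targets in sub_batches:
        loss = loss_fn(model(inputs), targets) / len(sub_batches)
        loss.backward()   # gradients add up in param.grad
    optimizer.step()      # one update at the effective batch size

PyTorch Lightning exposes this behavior through the Trainer's accumulate_grad_batches argument, which is what the --iter_batch / --batch_size pair above corresponds to.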

Each run of train.py creates a new experiment under logs, and each experiment is assigned a key.

Suppose your experiment key is 2e8c5136b. You can then evaluate the last epoch of the model on the test set with:

python test.py --eval_on minitest --load_ckpt logs/2e8c5136b/model_dump/last.ckpt

or, on the full test set:

python test.py --eval_on test --load_ckpt logs/2e8c5136b/model_dump/last.ckpt

The former evaluates on only 1000 images, as a sanity check.

Similarly, you can evaluate on the validation set:

python test.py --eval_on val --load_ckpt logs/2e8c5136b/model_dump/last.ckpt

Visualizing and evaluating pre-trained DIGIT

Here we provide instructions for producing qualitative results with DIGIT.

Download pre-trained DIGIT:

wget https://dataset.ait.ethz.ch/downloads/dE6qPPePCV/db7cba8c1.pt
mv db7cba8c1.pt saved_models
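
To quickly inspect the downloaded checkpoint before running the demo (a sketch of ours; the internal layout of db7cba8c1.pt is an assumption, not documented):

import torch

# map_location="cpu" avoids requiring a GPU just to inspect the file.
ckpt = torch.load("saved_models/db7cba8c1.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # e.g. a state dict or a checkpoint wrapper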

Visualize results:

CUDA_VISIBLE_DEVICES=0 python demo.py --eval_on minival --load_from saved_models/db7cba8c1.pt  --num_workers 0

Evaluate the pre-trained DIGIT:

CUDA_VISIBLE_DEVICES=0 python test.py --eval_on test --load_from saved_models/db7cba8c1.pt --precision 16
CUDA_VISIBLE_DEVICES=0 python test.py --eval_on val --load_from saved_models/db7cba8c1.pt --precision 16

You should obtain the same results as reported here.

The qualitative results from demo.py will be dumped to ./visualization.

Citation

@inProceedings{fan2021digit,
  title={Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation},
  author={Fan, Zicong and Spurr, Adrian and Kocabas, Muhammed and Tang, Siyu and Black, Michael and Hilliges, Otmar},
  booktitle={International Conference on 3D Vision (3DV)},
  year={2021}
}

License

Since our code is developed based on InterHand2.6M, which is CC-BY-NC 4.0 licensed, the same license applies to DIGIT.

DIGIT is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

References

Some code in our repo uses snippets from the following repositories. Please consider citing them if you find our code useful:

@inproceedings{Moon_2020_ECCV_InterHand2.6M,
  author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
  title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020}
}

@inproceedings{sun2019deep,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={CVPR},
  year={2019}
}

@inproceedings{xiao2018simple,
  author = {Xiao, Bin and Wu, Haiping and Wei, Yichen},
  title = {Simple Baselines for Human Pose Estimation and Tracking},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2018}
}

@misc{Charles2013,
  author = {milesial},
  title = {Pytorch-UNet},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/milesial/Pytorch-UNet}}
}

Contact

For any questions, you can contact [email protected].
