Overview

Minimal Hand

A minimal solution for hand motion capture from a single color camera at over 100 fps. Easy to use, plug and play.

[teaser figure]

This project provides the core components for hand motion capture:

  1. estimating joint locations from a monocular RGB image (DetNet)
  2. estimating joint rotations from locations (IKNet)
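
In code, the two stages compose as follows. This is a shape-level sketch drawn from the paper's 21-joint convention, not the exact function signatures:

    # Shape-level sketch of the pipeline (assumptions, not verified signatures).
    # DetNet: monocular RGB image -> 3D joint locations.
    # IKNet:  joint locations    -> per-joint rotations.
    #
    # image  : (128, 128, 3) RGB        (the input resolution is an assumption)
    # joints : (21, 3)  = DetNet(image)
    # quats  : (21, 4)  = IKNet(joints) (rotations as quaternions)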

We focus on:

  1. ease of use (all you need is a webcam)
  2. time efficiency (on our GTX 1080 Ti, DetNet takes 8.9 ms and IKNet 0.9 ms)
  3. robustness to occlusion, hand-object interaction, fast motion, and changes in scale and viewpoint

Some links: [video] [paper] [supp doc] [webpage]

The author is too busy to prepare the training code for release; that said, the training part should not be difficult to implement. Feel free to open an issue for any problems you encounter.

PyTorch Version

Here is a PyTorch version implemented by @MengHao666. I haven't personally checked it, but I believe it is worth trying. Many thanks to @MengHao666!

With Unity

Here is a project that connects this repo to Unity. It looks very cool. Many thanks to @vinnik-dmitry07!

Usage

Install dependencies

Please check requirements.txt. All dependencies are available via pip and conda.

Prepare MANO hand model

  1. Download the MANO model from here and unzip it.
  2. In config.py, set OFFICIAL_MANO_PATH to the left-hand model.
  3. Run python prepare_mano.py; you will get a converted MANO model compatible with this project at config.HAND_MESH_MODEL_PATH.
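
To sanity-check the conversion, you can load the converted model; it appears to be a Python pickle (prepare_mano.py saves with pickle.dump), though the exact contents depend on the conversion script:

    import pickle
    import config

    # Load the converted model produced by prepare_mano.py.
    with open(config.HAND_MESH_MODEL_PATH, 'rb') as f:
        hand_model = pickle.load(f)

    # Inspect what the conversion produced; the key names are whatever
    # prepare_mano.py chose to store, so print them rather than assume.
    print(type(hand_model))
    if isinstance(hand_model, dict):
        print(sorted(hand_model.keys()))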

Prepare pre-trained network models

  1. Download models from here.
  2. Put detnet.ckpt.* in model/detnet, and iknet.ckpt.* in model/iknet.
  3. Check config.py and make sure all required files are in place.
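
One quick way to verify this step is to scan config.py for path-like settings and confirm each one exists. This sketch assumes the paths are module-level variables whose names end in PATH:

    import os
    import config

    # Report every module-level setting that looks like a file path.
    # Note: TensorFlow checkpoints are path prefixes (detnet.ckpt.index,
    # detnet.ckpt.data-*, ...), so a MISSING report for a *.ckpt entry
    # may be a false alarm.
    for name in dir(config):
        if name.endswith('PATH'):
            path = getattr(config, name)
            if isinstance(path, str):
                status = 'OK' if os.path.exists(path) else 'MISSING'
                print(f'{name}: {status} ({path})')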

Run the demo for webcam input

  1. python app.py
  2. Put your right hand in front of the camera. The pre-trained model is for the left hand, but the input is flipped internally.
  3. Press ESC to quit.
  4. Although the model is robust to varying scales, ideally the image should be about 1.3x the size of the hand bounding box; a good bounding box usually gives better accuracy. You can track the bounding box with the model's 2D predictions (see the sketch below).
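
A minimal sketch of such tracking, assuming uv holds the model's 21 predicted 2D joint locations in pixel coordinates (the 1.3x margin follows the guideline above):

    import numpy as np

    def track_bbox(uv, img_w, img_h, margin=1.3):
        # Square crop around the predicted 2D joints, enlarged by `margin`
        # so the crop is ~1.3x the tight hand bounding box.
        uv = np.asarray(uv, dtype=np.float32)   # (21, 2) pixel coordinates
        center = (uv.min(0) + uv.max(0)) / 2
        side = (uv.max(0) - uv.min(0)).max() * margin
        x0 = int(max(center[0] - side / 2, 0))
        y0 = int(max(center[1] - side / 2, 0))
        x1 = int(min(center[0] + side / 2, img_w))
        y1 = int(min(center[1] + side / 2, img_h))
        return x0, y0, x1, y1

Feeding the crop from the previous frame's box back into the model keeps the hand centered and at a stable scale.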

We found that the model may fail on some "simple" poses. We believe this is because such poses were not present in the training data. We are working on a v2 version with further extended data to tackle this problem.

Use the models in your project

Please check wrappers.py.
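
A minimal sketch of the intended usage, assuming ModelPipeline.process takes a 128x128 RGB frame of a left hand and returns 3D joint locations and per-joint quaternions (check app.py and wrappers.py for the authoritative interface, including the expected dtype and value range):

    import cv2
    import numpy as np
    from wrappers import ModelPipeline

    model = ModelPipeline()  # loads the DetNet and IKNet checkpoints per config.py

    frame = cv2.imread('hand.png')            # an image with the hand roughly centered
    frame = cv2.resize(frame, (128, 128))     # assumed DetNet input size
    frame = np.flip(frame, axis=-1).copy()    # BGR -> RGB
    # For a right hand, mirror the image first:
    # frame = np.flip(frame, axis=1).copy()

    xyz, theta = model.process(frame)         # assumed: (21, 3) joints, (21, 4) quaternions
    print(xyz.shape, theta.shape)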

IKNet Alternative

We also provide an optimization-based IK solver here.
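
For intuition, an optimization-based solver fits rotations by minimizing the distance between the forward-kinematics output and the target joint locations, instead of regressing them with a network. This generic scipy sketch (not the linked solver's code) uses a toy planar 2-bone chain in place of the hand skeleton:

    import numpy as np
    from scipy.optimize import least_squares

    def fk(angles, bone_lengths):
        # Toy planar forward kinematics: accumulate angles along the chain
        # and emit every joint position, flattened.
        pts, p, a = [np.zeros(2)], np.zeros(2), 0.0
        for ang, length in zip(angles, bone_lengths):
            a += ang
            p = p + length * np.array([np.cos(a), np.sin(a)])
            pts.append(p)
        return np.concatenate(pts)

    bones = [1.0, 1.0]
    target = np.array([0.0, 0.0, 0.8, 0.6, 1.19, 1.52])  # flattened joint positions
    sol = least_squares(lambda a: fk(a, bones) - target, x0=np.zeros(2))
    print(np.degrees(sol.x))  # recovered joint angles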

Dataset

The detection model is trained with the following datasets: the CMU Panoptic Dataset (CMU), the Rendered Handpose Dataset (RHD), and the GANerated Hands Dataset (GAN).

The IK model is trained with the poses shipped with MANO.

Citation

This is the official implementation of the paper "Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data" (CVPR 2020).

The quantitative numbers reported in the paper can be found in plot.py.

If you find the project helpful, please consider citing us:

@inproceedings{zhou2020monocular,
  title={Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data},
  author={Zhou, Yuxiao and Habermann, Marc and Xu, Weipeng and Habibie, Ikhsanul and Theobalt, Christian and Xu, Feng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}
Comments
  • About SMPL (MoSh) labels

    Hello, I'd like to ask another question. There are no MoSh labels (SMPL theta and beta) in the STB, RHD, FreiHAND datasets, etc. How can 3D keypoints be translated to a mesh (SMPL theta and beta)? Hoping for your reply, thanks.

    opened by www516717402 13
  • How to use the right-hand model

    In config.py, I set OFFICIAL_MANO_PATH to the right-hand model and ran python prepare_mano.py, which gave me a converted right-hand MANO model. But when I use that converted model, the results are very bad. What went wrong? I would like to know how to use, or how to convert, the right-hand MANO model. Looking forward to your reply! Thanks a lot.

    opened by huangfuts 12
  • Questions about training IKNet

    Thank you for the great project. I have a few questions about training IKNet:

    1. When converting the original 16 rotations of MANO into 21 rotations, do W, T0, I0, M0, R0, and L0 share the rotation of W in the original MANO?
    2. I found that the joints_xyz computed from the MANO ref_pose and the transformed 21 rotation parameters (using the method in hand_mesh.py) does not equal the 'J_transformed' saved in the MANO pkl file, even after the joint order is adjusted according to kinematics.py. When using the MANO dataset to train IKNet, how did you get the ground-truth 3D joint annotations Lxyz? Is the computation of FK(Q) the same as the computation of joint_xyz in hand_mesh.py?
    opened by Gel-smile 9
  • How to mix and train the different datasets?

    The paper says that DetNet is trained on 3 datasets: the CMU Panoptic Dataset (CMU), the Rendered Handpose Dataset (RHD), and the GANerated Hands Dataset (GAN).

    Since the images of the three datasets differ from each other, could you please tell me how to preprocess the images?

    opened by LyazS 8
  • How to get beta in IKNet?

    You have done really great work!

    Reading your paper, I am a little confused about how the best beta is found in IKNet by minimizing E(beta). Is beta obtained directly by solving the function, or via numerical methods like the Newton downhill method?

    Thank you, best wishes.

    opened by Mrsirovo 6
  • Why is delta multiplied by length (delta = delta * length)?

    Hello, at line 166 of wrappers.py, delta = delta * length is used as one of the inputs to the IK model. May I ask the reason for this? delta is a normalized joint vector, so why multiply it by the joint length?

    My understanding is that the inputs to the downstream IK model should include the hand mesh template parameters, the hand pose parameters theta (delta), the skinning weights, and the hand coordinates (xyz).

    So I am confused about why delta alone is not enough as input, and why it is multiplied by length.

    opened by tonylin52 5
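
    A hedged guess at the quantity being discussed, with the parent indices and shapes assumed: if xyz_to_delta returns unit bone directions (delta) together with bone lengths, then delta * length restores the metric bone vectors, so the IK input carries scale as well as direction.

        import numpy as np

        def xyz_to_delta(xyz, parents):
            # For each joint, the bone vector from its parent, split into a
            # unit direction (delta) and a length (sketch, not the repo's code).
            delta = np.zeros_like(xyz)
            length = np.zeros((len(xyz), 1), dtype=xyz.dtype)
            for j, p in enumerate(parents):
                if p is None:
                    continue  # the root joint has no parent bone
                d = xyz[j] - xyz[p]
                norm = np.linalg.norm(d)
                length[j] = norm
                delta[j] = d / max(norm, 1e-8)
            return delta, length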
  • How to do "global alignment"?

    Hi, I got confused about another problem.

    In your paper, you said "As in previous work, we perform a global alignment to better measure the local hand pose." How do you implement the global alignment? Is it just translating the root joint to the same location as the label (and is the label here also root-relative and normalized using the reference bone)? I got an AUC of only 0.1 using DetNet retrained on RHD.

    Could you point to the "previous work" that does a global alignment like yours? It would be even better if their code were publicly available. Thanks!

    opened by MengHao666 5
  • How can I use the model's output quaternions in Unity?

    Thank you for your great work! I'm trying to use the model output to animate a virtual hand in Unity. I tried setting the quaternions as Unity's localRotation, but it did not work. Could you share some insight on how I can achieve that?

    opened by wangtss 5
  • IK using 3D joint coordinates

    Hello. First, congratulations on the amazing paper. I have a question regarding the IK architecture: is there any comparison between the IK architecture proposed here and the Levenberg-Marquardt-based algorithm on the MANO hand that you proposed previously? Additionally, could you guide me on applying the IK architecture without running the entire pipeline? I have some ground-truth 3D coordinates and want to obtain the IK parameters. Thanks a lot.

    opened by Amebradi 5
  • Obtaining MoCap from a two-hand video dataset

    Greetings and many thanks for the great work.

    I want to use your code to extract MoCap data from a first-person RGB video dataset that has a clear view of both hands during a task. Given that your model is restricted to predicting a single hand, will it consistently prefer the left hand when presented with videos that show both? If so, I suppose I could process the dataset twice, flipping it the second time, to obtain both hands' coordinates, right?

    opened by Linardos 5
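
    A minimal sketch of the flip-twice idea described above, assuming the README's left-hand convention and the ModelPipeline interface from wrappers.py (hand detection and cropping are out of scope here):

        import numpy as np

        def process_both_hands(frame, model):
            # Pass 1: the frame as-is captures the left hand
            # (the pre-trained model is for the left hand).
            left_xyz, left_theta = model.process(frame)
            # Pass 2: mirror the frame so the right hand looks like a left hand.
            mirrored = np.flip(frame, axis=1).copy()
            right_xyz, right_theta = model.process(mirrored)
            # Pass-2 outputs live in the mirrored frame; un-mirror the x axis
            # if both hands must share one coordinate frame.
            return (left_xyz, left_theta), (right_xyz, right_theta)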
  • Any plans on evaluating on the FreiHAND dataset?

    I'm curious, as it seems to be one of the better publicly available datasets: it includes really accurate 3D poses, all on real images, with challenging poses and object interactions, plus MANO hand-shape ground truth. I would love to see how this model performs.

    It would also show how the model performs without needing alignment, since both camera intrinsics and scale are included for each image.

    I'm also curious whether FreiHAND would be a good alternative to the MoCap data for training IKNet, since it includes hand-shape ground truth. I'm not sure if I should open a separate issue for that to make it easier for others to find.

    opened by pablovela5620 5
  • Keypoint representation as input to IKNet

    I am trying to use IKNet separately, starting from hand keypoints extracted with MediaPipe. For this to work, I need to make sure the MediaPipe hand coordinates are preprocessed to match the expected input format of IKNet (origin, scale, possibly rotation as well?).

    I ran into two questions here:

    1. I can see from your code that the keypoints have to be shifted to make 'M1' the origin. But what is the assumed scale? In the code you use IK_UNIT_LENGTH when rescaling from the MANO reference keypoints, but it is not clear what this relates to or where it comes from. Also, is there an assumption on the rotation of the hand (e.g. palm orientation)?

    2. I was assuming that the 'mpii_ref' keypoint set you pass as input to IKNet would be some kind of "relaxed" reference hand (converted from the MANO code base). When I plot it, however, only the projection onto the xz plane matches this assumption; the y coordinates look very strange, so I assume I am misinterpreting something. Or maybe this incorporates some assumptions about the IKNet input that I also need to apply to my xyz keypoints, since it seems to be passed as a reference hand? Could you clarify?

    Examples: (1) mpii_ref hand in front view (looking fine): mpii_ref_hand_xz; (2) mpii_ref hand in a rotated xyz view, showing unnaturally curved fingers and a very long wrist-to-thumb connection: mpii_ref_hand_xyz; (3) for comparison, the MediaPipe hand in front view: Mediapipe_hand_xz; (4) for comparison, the MediaPipe hand in the same xyz view: Mediapipe_hand_xyz.

    opened by jdambre 1
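
    A hedged sketch of the preprocessing this question describes: shift the keypoints so the chosen origin joint ('M1') sits at the origin, then rescale by a reference bone so the overall scale matches the repo's IK_UNIT_LENGTH. The joint indices, the reference bone, and the constant's value are all assumptions to verify against kinematics.py and config.py:

        import numpy as np

        def normalize_for_iknet(xyz, origin_idx, ref_a, ref_b, unit_length):
            # Shift so the origin joint (e.g. 'M1') sits at (0, 0, 0), then
            # rescale so the bone ref_a -> ref_b has length `unit_length`
            # (playing the role of IK_UNIT_LENGTH).
            xyz = np.asarray(xyz, dtype=np.float32)
            xyz = xyz - xyz[origin_idx]
            ref_len = np.linalg.norm(xyz[ref_a] - xyz[ref_b])
            return xyz * (unit_length / max(ref_len, 1e-8))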
  • Project dependencies may have API risk issues

    Hi, in minimal-hand, inappropriate dependency versioning constraints can cause risks.

    Below are the dependencies and version constraints that the project is using:

    pygame==1.9.4
    open3d==0.9
    tensorflow_gpu==1.14.0
    transforms3d==0.3.1
    keyboard==0.13.4
    opencv_python==3.4.3.18
    numpy==1.18.1
    

    The version constraint == introduces a risk of dependency conflicts because the scope of the dependencies is too strict. Constraints with no upper bound, or *, introduce the risk of missing-API errors, because the latest versions of the dependencies may remove some APIs.

    After further analysis, in this project the version constraint of the dependency keyboard can be changed to >=0.9.3,<=0.13.5, and the version constraint of the dependency numpy can be changed to >=1.8.0,<=1.23.0rc3.

    These modifications can reduce dependency conflicts as much as possible while introducing the latest versions without calling errors in the project.

    The invocation of the current project includes all the following methods.

    The calling methods from the keyboard
    keyboard.is_pressed
    
    The calling methods from the numpy
    numpy.linalg.norm
    
    The calling methods from all the methods
    pygame.init
    open3d.visualization.Visualizer.update_renderer
    tensorflow.pad
    pickle.load
    open3d.visualization.Visualizer.update_geometry
    tensorflow.layers.dense
    zero_padding
    detnet
    open3d.geometry.TriangleMesh
    wrappers.ModelPipeline
    self.ik_model.process
    tf_hmap_to_uv
    pygame.display.set_mode.blit
    open3d.visualization.Visualizer.create_window
    tensorflow.nn.relu
    cv2.VideoCapture
    pickle.load.toarray
    load_pkl
    lmaps.append
    numpy.maximum
    tensorflow.ConfigProto
    dmaps.append
    tensorflow.norm
    tensorflow.reshape
    tensorflow.contrib.layers.xavier_initializer
    self.cap.read
    matplotlib.pyplot.show
    tensorflow.cast
    numpy.matmul
    dense
    open3d.geometry.TriangleMesh.compute_triangle_normals
    tensorflow.layers.batch_normalization
    numpy.sum
    viewer.get_view_control.set_constant_z_far
    capture.read
    transforms3d.quaternions.quat2mat
    pygame.time.Clock
    viewer.get_view_control.convert_to_pinhole_camera_parameters
    numpy.expand_dims
    tensorflow.contrib.layers.l2_regularizer
    tensorflow.concat.get_shape
    str
    utils.OneEuroFilter
    xyz_to_delta
    self.compute_alpha
    self.dx_filter.process
    open3d.geometry.TriangleMesh.compute_vertex_normals
    matplotlib.pyplot.plot
    pickle.dump
    conv_bn
    pygame.time.Clock.tick
    tensorflow.expand_dims
    features.get_shape.as_list
    tensorflow.nn.max_pool2d
    keyboard.is_pressed
    tensorflow.name_scope
    frame_large.np.flip.copy
    data.items
    numpy.abs
    net_2d
    open3d.visualization.Visualizer
    len
    utils.OneEuroFilter.process
    dense_bn
    matplotlib.pyplot.legend
    tensorflow.gather_nd
    tensorflow.argmax
    LowPassFilter
    tensorflow.train.Saver
    pygame.surfarray.make_surface
    tensorflow.stack
    numpy.linalg.norm
    MANOHandJoints.labels.index
    viewer.get_view_control.convert_from_pinhole_camera_parameters
    tensorflow.nn.sigmoid
    matplotlib.pyplot.xlabel
    plot_pck
    inputs.get_shape
    calculate_auc
    numpy.linspace.reshape
    pygame.display.update
    tensorflow.train.Saver.restore
    tensorflow.concat
    open3d.utility.Vector3dVector
    bottleneck
    open3d.visualization.Visualizer.poll_events
    self.det_model.process
    tensorflow.initializers.truncated_normal
    open3d.visualization.Visualizer.get_view_control
    numpy.transpose
    int
    xyz.get_shape.as_list
    hand_mesh.HandMesh.set_abs_quat
    numpy.tile
    cam_params.intrinsic.set_intrinsics
    self.ref_T.append
    open3d.utility.Vector3iVector
    numpy.array
    viewer.get_render_option.load_from_json
    self.graph.as_default
    open3d.visualization.Visualizer.get_render_option
    tensorflow.shape
    get_pose_tile
    mano_to_mpii
    xyz.get_shape
    tensorflow.tile
    ModelIK
    tensorflow.layers.conv2d
    numpy.stack
    tensorflow.transpose
    tensorflow.Session
    frame.np.flip.copy
    pygame.display.set_mode
    MANOHandJoints.mesh_mapping.items
    cv2.resize
    open3d.geometry.TriangleMesh.paint_uniform_color
    transforms3d.axangles.axangle2mat
    hmaps.append
    net_3d
    range
    pygame.display.set_caption
    hand_mesh.HandMesh
    MPIIHandJoints.labels.index
    self.verts.copy
    self.x_filter.process
    tensorflow.Graph
    ModelDet
    numpy.linspace
    wrappers.ModelPipeline.process
    matplotlib.pyplot.grid
    tensorflow.variable_scope
    numpy.concatenate
    tensorflow.constant
    tensorflow.maximum
    self.ref_pose.append
    conv_bn_relu
    capture.OpenCVCapture
    live_application
    matplotlib.pyplot.ylabel
    matplotlib.pyplot.tight_layout
    open3d.visualization.Visualizer.add_geometry
    kinematics.mpii_to_mano
    utils.imresize
    tensorflow.placeholder
    cam_params.extrinsic.copy
    numpy.stack.append
    self.sess.run
    resnet50
    open
    numpy.flip
    tensorflow.where
    prepare_mano
    numpy.finfo
    network_fn
    numpy.zeros
    inputs.get_shape.as_list
    

    @developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • About the shape of the generated hand

    Hello, thank you very much for sharing your work. While reading the code and trying it out, I have a few questions: 1. The code does not estimate the beta parameters, so the generated hand cannot preserve the original hand's shape (e.g. finger length, proportions, thickness), right? 2. The size of the generated hand model is fixed and does not change with the size of the hand in the image, is that correct? 3. The code takes video as input; if only a single image is provided, will the results get worse?

    opened by ChaoYingYu 0