Basics of 2D and 3D Human Pose Estimation.

Overview

Human Pose Estimation 101

For a more rigorous tutorial that covers the basics of Human Pose Estimation and how the field has evolved, see the articles I published on 2D Pose Estimation and 3D Pose Estimation.

Basics

  • Defined as the problem of localizing human joints (also called keypoints)
  • The human body is modeled as an articulated body: rigid parts connected by joints. A body with strong articulation is one capable of strong contortion.
  • Pose Estimation is the search for a specific pose in the space of all articulated poses
  • The number of keypoints varies with the dataset: LSP has 14, MPII has 16, and 16 are used in Human3.6M
  • Classified into 2D and 3D Pose Estimation
    • 2D Pose Estimation: estimate a 2D pose, i.e. (x, y) coordinates for each joint in pixel space, from an RGB image
    • 3D Pose Estimation: estimate a 3D pose, i.e. (x, y, z) coordinates for each joint in metric space, from an RGB image or, in earlier works, from RGB-D sensor data. (Research in the past few years has focused heavily on generating 3D poses from 2D images / 2D videos.)
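
As a rough illustration of the two output spaces described above, a pose can be stored as a mapping from joint name to coordinates. The joint names, the three-joint skeleton, and the use of millimetres are assumptions made for this sketch, not a dataset convention.

```python
from typing import Dict, Tuple

# Joint name -> (x, y) in pixel coordinates of the input image.
Pose2D = Dict[str, Tuple[float, float]]
# Joint name -> (x, y, z) in metric units (millimetres assumed here).
Pose3D = Dict[str, Tuple[float, float, float]]

# Toy three-joint skeleton with made-up values; real datasets use more keypoints.
pose_2d: Pose2D = {
    "head": (120.0, 40.0),
    "left_shoulder": (100.0, 80.0),
    "right_shoulder": (140.0, 80.0),
}
pose_3d: Pose3D = {
    "head": (0.0, -600.0, 50.0),
    "left_shoulder": (-180.0, -450.0, 40.0),
    "right_shoulder": (180.0, -450.0, 40.0),
}

# Both poses describe the same joints; only the coordinate space differs.
assert pose_2d.keys() == pose_3d.keys()
```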

Loss

  • Most commonly used loss function: Mean Squared Error, MSE (Least Squares Loss)
  • This is a regression problem. The model tries to regress to the correct coordinates, i.e. it moves towards the ground-truth coordinates in small increments. The model is trained to output continuous coordinates using a Mean Squared Error loss function
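
A minimal sketch of the MSE loss on joint coordinates, written in plain Python for clarity. The function name `mse_loss` and the toy values are illustrative; a training framework would normally supply its own differentiable version.

```python
def mse_loss(pred, target):
    """Mean squared error over every coordinate of every joint."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of joints")
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        for pc, tc in zip(p, t):
            total += (pc - tc) ** 2
            count += 1
    return total / count


# Two joints with (x, y) coordinates in pixels.
pred = [(10.0, 20.0), (30.0, 40.0)]
target = [(12.0, 20.0), (30.0, 44.0)]
print(mse_loss(pred, target))  # (4 + 0 + 0 + 16) / 4 = 5.0
```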

Evaluation metrics

Percentage of Correct Parts - PCP

  • A limb is considered detected (a correct part) if the distance between each of its two predicted joint locations and the corresponding true joint location is at most half of the limb length (PCP at 0.5)
  • Measures detection rate of limbs
  • Con: penalizes shorter limbs, since a shorter limb has a smaller threshold
  • Calculation
    • For a specific part, PCP = (No. of correct parts for entire dataset) / (No. of total parts for entire dataset)
    • Take a dataset with 10 images and 1 pose per image. Each pose has 8 parts: (upper arm, lower arm, upper leg, lower leg) x 2
    • No. of upper arms = 10 * 2 = 20
    • No. of lower arms = 20
    • No. of lower legs = No. of upper legs = 20
    • If the upper arm is detected correctly for 17 of the 20 upper arms (10 right arms and 7 left arms) → PCP = 17/20 = 85%
  • Higher the better
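
The calculation above can be sketched as follows. This assumes the strict reading of PCP, in which both endpoints of a limb must fall within the threshold, and measures limb length on the ground truth. The function names and toy coordinates are illustrative.

```python
import math


def limb_is_correct(pred_a, pred_b, gt_a, gt_b, alpha=0.5):
    """True if both predicted endpoints lie within alpha * limb length of the truth."""
    limb_length = math.dist(gt_a, gt_b)
    threshold = alpha * limb_length
    return math.dist(pred_a, gt_a) <= threshold and math.dist(pred_b, gt_b) <= threshold


def pcp(limbs, alpha=0.5):
    """limbs: list of (pred_a, pred_b, gt_a, gt_b) tuples for one part type."""
    correct = sum(limb_is_correct(*limb, alpha=alpha) for limb in limbs)
    return correct / len(limbs)


# Ground-truth limb of length 10, so the PCP@0.5 threshold is 5 pixels.
gt_a, gt_b = (0.0, 0.0), (0.0, 10.0)
good = ((1.0, 0.0), (0.0, 12.0), gt_a, gt_b)  # endpoint errors 1 and 2
bad = ((0.0, 6.0), (0.0, 10.0), gt_a, gt_b)   # endpoint errors 6 and 0

# 17 correct upper arms out of 20, as in the worked example.
print(pcp([good] * 17 + [bad] * 3))  # 0.85
```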

Percentage of Correct Key-points - PCK

  • Detected joint is considered correct if the distance between the predicted and the true joint is within a certain threshold (threshold varies)
  • PCKh@0.5 is when the threshold = 50% of the head bone link
  • PCK@0.2 == Distance between predicted and true joint < 0.2 * torso diameter
  • Sometimes a fixed distance of 150 mm is taken as the threshold
  • Typical keypoints: head, shoulder, elbow, wrist, hip, knee, ankle
  • PCK is used for 2D and 3D (PCK3D)
  • Higher the better
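
A minimal sketch of PCK with a configurable reference length, so the same function covers PCKh@0.5 (head bone link), PCK@0.2 (torso diameter), and a fixed 150 mm threshold. The function name, argument names, and sample coordinates are assumptions for illustration.

```python
import math


def pck(pred, gt, ref_length, alpha):
    """Fraction of joints whose error is below alpha * ref_length."""
    threshold = alpha * ref_length
    correct = sum(math.dist(p, g) < threshold for p, g in zip(pred, gt))
    return correct / len(gt)


# 2D example: PCKh@0.5 with a head bone link of 20 pixels (threshold 10 pixels).
gt_2d = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0), (50.0, 50.0)]
pred_2d = [(3.0, 0.0), (50.0, 8.0), (0.0, 62.0), (50.0, 50.0)]  # errors 3, 8, 12, 0
print(pck(pred_2d, gt_2d, ref_length=20.0, alpha=0.5))  # 0.75

# 3D example: fixed 150 mm threshold, expressed as ref_length=150 and alpha=1.
gt_3d = [(0.0, 0.0, 0.0), (0.0, 500.0, 0.0)]
pred_3d = [(100.0, 0.0, 0.0), (0.0, 500.0, 200.0)]  # errors 100 mm and 200 mm
print(pck(pred_3d, gt_3d, ref_length=150.0, alpha=1.0))  # 0.5
```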

Percentage of Detected Joints - PDJ

  • Detected joint is considered correct if the distance between the predicted and the true joint is within a certain fraction of the torso diameter
  • Alleviates the shorter limb problem, since people with shorter limbs tend to have smaller torsos and the threshold scales with the torso
  • PDJ at 0.2 → Distance between predicted and true joint < 0.2 * torso diameter
  • Typically used for 2D Pose Estimation
  • Higher the better
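
A short sketch of PDJ. It assumes the torso diameter is measured on the ground truth as the distance between the left shoulder and the right hip, which is one common convention. The joint indices and coordinates are hypothetical.

```python
import math


def torso_diameter(gt, left_shoulder, right_hip):
    """Distance between two opposing torso joints in the ground-truth pose."""
    return math.dist(gt[left_shoulder], gt[right_hip])


def pdj(pred, gt, left_shoulder, right_hip, fraction=0.2):
    """Fraction of joints whose error is below fraction * torso diameter."""
    threshold = fraction * torso_diameter(gt, left_shoulder, right_hip)
    detected = sum(math.dist(p, g) < threshold for p, g in zip(pred, gt))
    return detected / len(gt)


# Index 0 = left shoulder, index 1 = right hip (hypothetical ordering).
gt = [(0.0, 0.0), (30.0, 40.0), (60.0, 0.0), (0.0, 80.0)]      # torso diameter 50
pred = [(0.0, 5.0), (30.0, 52.0), (68.0, 8.0), (0.0, 80.0)]    # errors 5, 12, ~11.3, 0

# Threshold is 0.2 * 50 = 10 pixels, so two of the four joints are detected.
print(pdj(pred, gt, left_shoulder=0, right_hip=1))  # 0.5
```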

Mean Per Joint Position Error - MPJPE

  • Per joint position error = Euclidean distance between ground truth and prediction for a joint
  • Mean per joint position error = Mean of per joint position error for all k joints (Typically, k = 16)
  • Calculated after aligning the root joints (typically the pelvis) of the estimated and ground-truth 3D poses
  • PA-MPJPE
    • Procrustes analysis MPJPE
    • MPJPE calculated after the estimated 3D pose is aligned to the ground truth by the Procrustes method
    • The Procrustes method finds the similarity transformation (translation, rotation and uniform scale) that best aligns the two poses
  • Lower the better
  • Used for 3D Pose Estimation
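
The two variants can be sketched with NumPy as below. The root joint index, the function names, and the toy four-joint pose are assumptions. The alignment uses the standard SVD-based least-squares solution for a similarity transform and is meant as an illustration rather than a reference implementation.

```python
import numpy as np


def mpjpe(pred, gt, root=0):
    """Mean per joint position error after aligning the root joints."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred = pred - pred[root]
    gt = gt - gt[root]
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def procrustes_align(pred, gt):
    """Apply the least-squares similarity transform that maps pred onto gt."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance gives the optimal rotation.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(procrustes_align(pred, gt) - gt, axis=1).mean())


# Toy 4-joint pose in millimetres; joint 0 plays the role of the pelvis.
gt = np.array([[0, 0, 0], [0, 100, 0], [100, 0, 0], [0, 0, 100]], dtype=float)

# Global offset plus a (3, 4, 0) mm error on joint 1 only.
pred = gt + np.array([10.0, 10.0, 10.0])
pred[1] += np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, gt))  # (0 + 5 + 0 + 0) / 4 = 1.25

# A scaled, rotated and shifted copy of gt has a PA-MPJPE of about zero.
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
pred_sim = 2.0 * gt @ Rz.T + np.array([5.0, -3.0, 7.0])
print(pa_mpjpe(pred_sim, gt))
```

Root alignment removes only the global translation, while Procrustes alignment also removes rotation and scale, so PA-MPJPE is never meaningfully larger than what the best rigid fit allows and is usually reported alongside plain MPJPE.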

AUC

Important Applications

  • Activity Analysis
  • Human-Computer Interaction (HCI)
  • Virtual Reality
  • Augmented Reality
  • Amazon Go is an important application domain for Human Pose Estimation: cameras track and recognize people and their actions, and Pose Estimation is an important component of that. More generally, services that track and measure human activity rely heavily on Human Pose Estimation

Informative roadmap on 2D Human Pose Estimation research

Owner
Sudharshan Chandra Babu
Machine Learning Engineer