UDP++ (ECCVW 2020 Oral), (Winner of COCO 2020 Keypoint Challenge).

Last update: Jul 29, 2022

UDP-Pose

This is the PyTorch implementation of UDP++, which won first place in the COCO Keypoint Challenge at the ECCV 2020 Workshop.

Top-Down

Results on MPII val dataset

Method	Head	Sho.	Elb.	Wri.	Hip	Kne.	Ank.	Mean	Mean@0.1
HRNet32	97.1	95.9	90.3	86.5	89.1	87.1	83.3	90.3	37.7
+Dark	97.2	95.9	91.2	86.7	89.7	86.7	84.0	90.6	42.0
+UDP	97.4	96.0	91.0	86.5	89.1	86.6	83.3	90.4	42.1

Results on COCO val2017, using a person detector with 65.1 AP on COCO val2017

Arch	Input size	#Params	GFLOPs	AP	AP .5	AP .75	AP (M)	AP (L)	AR
pose_resnet_50	256x192	34.0M	8.90	71.3	89.9	78.9	68.3	77.4	76.9
+UDP	256x192	34.2M	8.96	72.9	90.0	80.2	69.7	79.3	78.2
pose_resnet_50	384x288	34.0M	20.0	73.2	90.7	79.9	69.4	80.1	78.2
+UDP	384x288	34.2M	20.1	74.0	90.3	80.0	70.2	81.0	79.0
pose_resnet_152	256x192	68.6M	15.7	72.9	90.6	80.8	69.9	79.0	78.3
+UDP	256x192	68.8M	15.8	74.3	90.9	81.6	71.2	80.6	79.6
pose_resnet_152	384x288	68.6M	35.6	75.3	91.0	82.3	71.9	82.0	80.4
+UDP	384x288	68.8M	35.7	76.2	90.8	83.0	72.8	82.9	81.2
pose_hrnet_w32	256x192	28.5M	7.10	75.6	91.9	83.0	72.2	81.6	80.5
+UDP	256x192	28.7M	7.16	76.8	91.9	83.7	73.1	83.3	81.6
+UDPv1	256x192	28.7M	7.16	77.2	91.6	84.2	73.7	83.7	82.5
+UDPv1+AID	256x192	28.7M	7.16	77.9	92.1	84.5	74.1	84.1	82.8
RSN18+UDP	256x192	-	2.5	74.7	-	-	-	-	-
pose_hrnet_w32	384x288	28.5M	16.0	76.7	91.9	83.6	73.2	83.2	81.6
+UDP	384x288	28.7M	16.1	77.8	91.7	84.5	74.2	84.3	82.4
pose_hrnet_w48	256x192	63.6M	14.6	75.9	91.9	83.5	72.6	82.1	80.9
+UDP	256x192	63.8M	14.7	77.2	91.8	83.7	73.8	83.7	82.0
pose_hrnet_w48	384x288	63.6M	32.9	77.1	91.8	83.8	73.5	83.5	81.8
+UDP	384x288	63.8M	33.0	77.8	92.0	84.3	74.2	84.5	82.5

Note:

Flip test is used.
Person detector has person AP of 65.1 on COCO val2017 dataset.
GFLOPs is for convolution and linear layers only.
UDPv1: the +UDP rows (v0) use LOSS.KPD=4.0; the +UDPv1 rows (v1) use LOSS.KPD=3.5.
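
The flip test mentioned in the note can be sketched generically as follows. This is a minimal illustration in plain Python, not the code used in this repository: the function names, the nested-list heatmap layout (keypoints x rows x columns), and the `flip_pairs` argument are assumptions made for the example. It averages the heatmaps predicted for an image with those predicted for its mirrored copy, after undoing the mirror and swapping left/right keypoint channels. How the repository aligns the two predictions is not shown here.

```python
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]  # one keypoint channel: rows x columns


def flip_horizontal(heatmap: Heatmap) -> Heatmap:
    """Mirror a single heatmap left-to-right."""
    return [row[::-1] for row in heatmap]


def flip_test_average(
    heatmaps: Sequence[Heatmap],
    flipped_heatmaps: Sequence[Heatmap],
    flip_pairs: Sequence[Tuple[int, int]],
) -> List[Heatmap]:
    """Average predictions from an image and its mirrored copy.

    heatmaps:          predictions for the original image, one per keypoint.
    flipped_heatmaps:  predictions for the horizontally mirrored image.
    flip_pairs:        (left, right) channel indices that trade places when
                       the image is mirrored (hypothetical argument name).
    """
    # Undo the mirror so both predictions share one coordinate frame.
    restored = [flip_horizontal(h) for h in flipped_heatmaps]
    # A left wrist in the mirrored image is predicted in the right-wrist
    # channel, so swap each left/right pair back.
    for left, right in flip_pairs:
        restored[left], restored[right] = restored[right], restored[left]
    # Element-wise mean of the two aligned predictions.
    return [
        [[0.5 * (a + b) for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(h_a, h_b)]
        for h_a, h_b in zip(heatmaps, restored)
    ]
```

For a perfectly consistent network the averaged heatmaps equal the original ones; in practice the two predictions differ slightly and the average reduces that variance.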

Results on COCO test-dev, using a person detector with 65.1 AP on COCO val2017

Arch	Input size	#Params	GFLOPs	AP	AP .5	AP .75	AP (M)	AP (L)	AR
pose_resnet_50	256x192	34.0M	8.90	70.2	90.9	78.3	67.1	75.9	75.8
+UDP	256x192	34.2M	8.96	71.7	91.1	79.6	68.6	77.5	77.2
pose_resnet_50	384x288	34.0M	20.0	71.3	91.0	78.5	67.3	77.9	76.6
+UDP	384x288	34.2M	20.1	72.5	91.1	79.7	68.8	79.1	77.9
pose_resnet_152	256x192	68.6M	15.7	71.9	91.4	80.1	68.9	77.4	77.5
+UDP	256x192	68.8M	15.8	72.9	91.6	80.9	70.0	78.5	78.4
pose_resnet_152	384x288	68.6M	35.6	73.8	91.7	81.2	70.3	80.0	79.1
+UDP	384x288	68.8M	35.7	74.7	91.8	82.1	71.5	80.8	80.0
pose_hrnet_w32	256x192	28.5M	7.10	73.5	92.2	82.0	70.4	79.0	79.0
+UDP	256x192	28.7M	7.16	75.2	92.4	82.9	72.0	80.8	80.4
pose_hrnet_w32	384x288	28.5M	16.0	74.9	92.5	82.8	71.3	80.9	80.1
+UDP	384x288	28.7M	16.1	76.1	92.5	83.5	72.8	82.0	81.3
pose_hrnet_w48	256x192	63.6M	14.6	74.3	92.4	82.6	71.2	79.6	79.7
+UDP	256x192	63.8M	14.7	75.7	92.4	83.3	72.5	81.4	80.9
pose_hrnet_w48	384x288	63.6M	32.9	75.5	92.5	83.3	71.9	81.5	80.5
+UDP	384x288	63.8M	33.0	76.5	92.7	84.0	73.0	82.4	81.6

Note:

Flip test is used.
Person detector has person AP of 65.1 on COCO val2017 dataset.
GFLOPs is for convolution and linear layers only.
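
As a rough illustration of what counting only convolution and linear layers involves, the sketch below computes multiply-accumulate operations (MACs) per layer in plain Python. The function names are hypothetical, the example layer is invented, and whether the GFLOPs column counts one multiply-accumulate as one operation or two is an assumption not confirmed by this page.

```python
def conv2d_macs(c_in: int, c_out: int, kernel_h: int, kernel_w: int,
                out_h: int, out_w: int, groups: int = 1) -> int:
    """Multiply-accumulates of one 2D convolution (bias ignored).

    Each output element needs (c_in / groups) * kernel_h * kernel_w
    multiply-accumulates, and there are c_out * out_h * out_w outputs.
    """
    return out_h * out_w * c_out * (c_in // groups) * kernel_h * kernel_w


def linear_macs(in_features: int, out_features: int) -> int:
    """Multiply-accumulates of one fully connected layer (bias ignored)."""
    return in_features * out_features


def to_giga(count: int) -> float:
    """Convert a raw operation count to units of 1e9."""
    return count / 1e9


# Hypothetical layer: a 3x3 convolution, 64 -> 64 channels, on the
# 64x48 feature map that a 256x192 input yields at stride 4.
example = conv2d_macs(c_in=64, c_out=64, kernel_h=3, kernel_w=3,
                      out_h=64, out_w=48)
print(f"{to_giga(example):.3f} G multiply-accumulates")
```

Summing such counts over every convolution and linear layer of a network, and skipping normalization, activation, and pooling layers, gives a figure comparable in spirit to the GFLOPs column.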

Bottom-Up

HRNet

Arch	P2I	Input size	Speed(task/s)	AP	AP .5	AP .75	AP (M)	AP (L)	AR
HRNet(ori)	T	512x512	-	64.4	-	-	57.1	75.6	-
HRNet(mmpose)	F	512x512	39.5	65.8	86.3	71.8	59.2	76.0	70.7
HRNet(mmpose)	T	512x512	6.8	65.3	86.2	71.5	58.6	75.7	70.9
HRNet+UDP	T	512x512	5.8	65.9	86.2	71.8	59.4	76.0	71.4
HRNet+UDP	F	512x512	37.2	67.0	86.2	72.0	60.7	76.7	71.6
HRNet+UDP+AID	F	512x512	37.2	68.4	88.1	74.9	62.7	77.1	73.0

HigherHRNet

Arch	P2I	Input size	Speed(task/s)	AP	AP .5	AP .75	AP (M)	AP (L)	AR
HigherHRNet(ori)	T	512x512	-	67.1	-	-	61.5	76.1	-
HigherHRNet	T	512x512	9.4	67.2	86.1	72.9	61.8	76.1	72.2
HigherHRNet+UDP	T	512x512	9.0	67.6	86.1	73.7	62.2	76.2	72.4
HigherHRNet	F	512x512	24.1	67.1	86.1	73.6	61.7	75.9	72.0
HigherHRNet+UDP	F	512x512	23.0	67.6	86.2	73.8	62.2	76.2	72.4
HigherHRNet+UDP+AID	F	512x512	23.0	69.0	88.0	74.9	64.0	76.9	73.8

Note:

ori: results from the original HigherHRNet.
mmpose: pretrained models from mmpose.
P2I: PROJECT2IMAGE.
We use mmpose as the codebase.
The baseline configuration is HRNet-W32-512x512-batch16-lr0.001.
Speed is measured with dist_test in the mmpose codebase, using 8 GPUs and a batch size of 16.
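
To read the accuracy/speed trade-off off the bottom-up HRNet table, the short sketch below compares the P2I=F rows against the mmpose baseline. The numbers are copied from the table above; the helper name and tuple layout are illustrative only.

```python
# (arch, speed in tasks/s, AP) for the P2I=F rows of the HRNet table.
rows = [
    ("HRNet(mmpose)", 39.5, 65.8),
    ("HRNet+UDP", 37.2, 67.0),
    ("HRNet+UDP+AID", 37.2, 68.4),
]


def compare(baseline, variant):
    """Return (AP gain in points, speed relative to the baseline)."""
    _, base_speed, base_ap = baseline
    _, var_speed, var_ap = variant
    return round(var_ap - base_ap, 1), round(var_speed / base_speed, 3)


baseline = rows[0]
for variant in rows[1:]:
    gain, rel_speed = compare(baseline, variant)
    print(f"{variant[0]}: {gain:+.1f} AP at {rel_speed:.1%} of baseline speed")
```

With these table values, UDP adds 1.2 AP and UDP+AID adds 2.6 AP, both at roughly 94% of the baseline throughput.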

Quick Start

(Recommended) For mmpose, please refer to MMPose

For hrnet, please refer to Hrnet

For RSN, please refer to RSN

Data preparation

For COCO, we provide the human detection results and pretrained models at BaiduDisk (dsa9).

Citation

If you use our code or models in your research, please cite with:

@inproceedings{cai2020learning,
  title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
  author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
  booktitle={ECCV},
  year={2020}
}
@article{huang2020joint,
  title={Joint coco and lvis workshop at eccv 2020: Coco keypoint challenge track technical report: Udp+},
  author={Huang, Junjie and Shan, Zengguang and Cai, Yuanhao and Guo, Feng and Ye, Yun and Chen, Xinze and Zhu, Zheng and Huang, Guan and Lu, Jiwen and Du, Dalong},
  year={2020}
}
