UDP++ (ECCVW 2020 Oral, winner of the COCO 2020 Keypoint Challenge)

Overview

UDP-Pose

This is the PyTorch implementation of UDP++, which won first place in the COCO Keypoint Challenge at the ECCV 2020 Workshop.

[Figure: performance of the proposed UDP]

Top-Down

Results on the MPII val set

Method Head Sho. Elb. Wri. Hip Kne. Ank. Mean Mean@0.1
HRNet32 97.1 95.9 90.3 86.5 89.1 87.1 83.3 90.3 37.7
+Dark 97.2 95.9 91.2 86.7 89.7 86.7 84.0 90.6 42.0
+UDP 97.4 96.0 91.0 86.5 89.1 86.6 83.3 90.4 42.1
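The last two MPII columns follow the usual PCKh convention: a predicted joint counts as correct when its distance to the ground truth is at most α times the head segment length, and the two means use α = 0.5 and α = 0.1 respectively. The sketch below scores a single person under that assumed definition. The function name `pckh` and its argument layout are hypothetical, and the official MPII evaluation aggregates per joint over the whole validation set rather than per person.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pckh(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[bool],
         head_size: float, alpha: float = 0.5) -> float:
    """Fraction of visible joints whose error is within alpha * head_size (sketch)."""
    threshold = alpha * head_size
    hits = 0
    total = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # unannotated joints are not scored
        total += 1
        if math.hypot(px - gx, py - gy) <= threshold:
            hits += 1
    # No scorable joints: the score is undefined.
    return hits / total if total else float("nan")


pred = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
gt = [(10.0, 12.0), (20.0, 30.0), (0.0, 0.0)]
visible = [True, True, False]
print(pckh(pred, gt, visible, head_size=10.0))             # 0.5
print(pckh(pred, gt, visible, head_size=10.0, alpha=0.1))  # 0.0
```

The stricter α = 0.1 threshold explains why the second mean is far lower than the first and why it separates methods more clearly.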

Results on COCO val2017, using a person detector with 65.1 human AP on COCO val2017

Arch Input size #Params GFLOPs AP AP .5 AP .75 AP (M) AP (L) AR
pose_resnet_50 256x192 34.0M 8.90 71.3 89.9 78.9 68.3 77.4 76.9
+UDP 256x192 34.2M 8.96 72.9 90.0 80.2 69.7 79.3 78.2
pose_resnet_50 384x288 34.0M 20.0 73.2 90.7 79.9 69.4 80.1 78.2
+UDP 384x288 34.2M 20.1 74.0 90.3 80.0 70.2 81.0 79.0
pose_resnet_152 256x192 68.6M 15.7 72.9 90.6 80.8 69.9 79.0 78.3
+UDP 256x192 68.8M 15.8 74.3 90.9 81.6 71.2 80.6 79.6
pose_resnet_152 384x288 68.6M 35.6 75.3 91.0 82.3 71.9 82.0 80.4
+UDP 384x288 68.8M 35.7 76.2 90.8 83.0 72.8 82.9 81.2
pose_hrnet_w32 256x192 28.5M 7.10 75.6 91.9 83.0 72.2 81.6 80.5
+UDP 256x192 28.7M 7.16 76.8 91.9 83.7 73.1 83.3 81.6
+UDPv1 256x192 28.7M 7.16 77.2 91.6 84.2 73.7 83.7 82.5
+UDPv1+AID 256x192 28.7M 7.16 77.9 92.1 84.5 74.1 84.1 82.8
RSN18+UDP 256x192 - 2.5 74.7 - - - - -
pose_hrnet_w32 384x288 28.5M 16.0 76.7 91.9 83.6 73.2 83.2 81.6
+UDP 384x288 28.7M 16.1 77.8 91.7 84.5 74.2 84.3 82.4
pose_hrnet_w48 256x192 63.6M 14.6 75.9 91.9 83.5 72.6 82.1 80.9
+UDP 256x192 63.8M 14.7 77.2 91.8 83.7 73.8 83.7 82.0
pose_hrnet_w48 384x288 63.6M 32.9 77.1 91.8 83.8 73.5 83.5 81.8
+UDP 384x288 63.8M 33.0 77.8 92.0 84.3 74.2 84.5 82.5

Note:

  • Flip test is used.
  • The person detector has a person AP of 65.1 on COCO val2017.
  • GFLOPs are counted for convolution and linear layers only.
  • UDPv1: UDP (v0) uses LOSS.KPD=4.0, while UDPv1 uses LOSS.KPD=3.5.
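The COCO AP columns are built on Object Keypoint Similarity (OKS): AP .5 and AP .75 count a detection as a true positive at OKS ≥ 0.5 and OKS ≥ 0.75, AP averages over the thresholds 0.50:0.05:0.95, and AP (M) and AP (L) restrict evaluation to medium and large persons. A minimal sketch of OKS for one person follows, assuming the pycocotools convention (per-keypoint constant k_i = 2σ_i, object area as the scale s²). The function name, argument layout and `OKS_THRESHOLDS` list are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def oks(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[bool],
        sigmas: Sequence[float], area: float) -> float:
    """Object Keypoint Similarity between one prediction and one ground-truth person.

    Sketch of OKS = mean over labeled keypoints of exp(-d_i^2 / (2 * s^2 * k_i^2)),
    with s^2 = object area and k_i = 2 * sigma_i.
    """
    total = 0.0
    count = 0
    for (px, py), (gx, gy), vis, sigma in zip(pred, gt, visible, sigmas):
        if not vis:
            continue  # unlabeled ground-truth keypoints do not contribute
        k = 2.0 * sigma
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    # pycocotools treats persons without labeled keypoints differently; this sketch returns 0.
    return total / count if count else 0.0


# The ten OKS thresholds averaged by COCO keypoint AP.
OKS_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

print(oks([(1.0, 1.0)], [(0.0, 0.0)], [True], sigmas=[0.5], area=1.0))  # exp(-1)
```

Because the error is normalized by object area, the same pixel error costs more on a small person than on a large one, which is why AP (M) and AP (L) are reported separately.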

Results on COCO test-dev, using a person detector with 65.1 human AP on COCO val2017

Arch Input size #Params GFLOPs AP AP .5 AP .75 AP (M) AP (L) AR
pose_resnet_50 256x192 34.0M 8.90 70.2 90.9 78.3 67.1 75.9 75.8
+UDP 256x192 34.2M 8.96 71.7 91.1 79.6 68.6 77.5 77.2
pose_resnet_50 384x288 34.0M 20.0 71.3 91.0 78.5 67.3 77.9 76.6
+UDP 384x288 34.2M 20.1 72.5 91.1 79.7 68.8 79.1 77.9
pose_resnet_152 256x192 68.6M 15.7 71.9 91.4 80.1 68.9 77.4 77.5
+UDP 256x192 68.8M 15.8 72.9 91.6 80.9 70.0 78.5 78.4
pose_resnet_152 384x288 68.6M 35.6 73.8 91.7 81.2 70.3 80.0 79.1
+UDP 384x288 68.8M 35.7 74.7 91.8 82.1 71.5 80.8 80.0
pose_hrnet_w32 256x192 28.5M 7.10 73.5 92.2 82.0 70.4 79.0 79.0
+UDP 256x192 28.7M 7.16 75.2 92.4 82.9 72.0 80.8 80.4
pose_hrnet_w32 384x288 28.5M 16.0 74.9 92.5 82.8 71.3 80.9 80.1
+UDP 384x288 28.7M 16.1 76.1 92.5 83.5 72.8 82.0 81.3
pose_hrnet_w48 256x192 63.6M 14.6 74.3 92.4 82.6 71.2 79.6 79.7
+UDP 256x192 63.8M 14.7 75.7 92.4 83.3 72.5 81.4 80.9
pose_hrnet_w48 384x288 63.6M 32.9 75.5 92.5 83.3 71.9 81.5 80.5
+UDP 384x288 63.8M 33.0 76.5 92.7 84.0 73.0 82.4 81.6

Note:

  • Flip test is used.
  • The person detector has a person AP of 65.1 on COCO val2017.
  • GFLOPs are counted for convolution and linear layers only.
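Flip test, used for both COCO tables, averages the heatmaps predicted for an image with those predicted for its horizontally mirrored copy after mapping the latter back. The sketch below shows that step on plain nested lists under stated assumptions: heatmaps are indexed [joint][row][column], `flip_pairs` holds hypothetical (left, right) channel indices, and no one-pixel shift is applied after flipping back. Some baselines add such a shift. With pixel-aligned coordinates, reversing the columns already maps x to W-1-x exactly, but treat the no-shift choice as an assumption rather than this repository's code.

```python
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]  # one joint's heatmap, indexed [row][column]


def flip_back(heatmaps: List[Heatmap], flip_pairs: Sequence[Tuple[int, int]]) -> List[Heatmap]:
    """Map heatmaps predicted on a mirrored image back to the original frame."""
    # Mirror every heatmap along the width axis (column x -> W - 1 - x).
    restored = [[row[::-1] for row in hm] for hm in heatmaps]
    # In the mirrored image the person's left and right are exchanged, so swap paired channels.
    for left, right in flip_pairs:
        restored[left], restored[right] = restored[right], restored[left]
    return restored


def flip_test(original: List[Heatmap], flipped: List[Heatmap],
              flip_pairs: Sequence[Tuple[int, int]]) -> List[Heatmap]:
    """Average original heatmaps with flipped-back ones (no extra pixel shift)."""
    restored = flip_back(flipped, flip_pairs)
    return [
        [[(a + b) / 2.0 for a, b in zip(row_o, row_r)] for row_o, row_r in zip(hm_o, hm_r)]
        for hm_o, hm_r in zip(original, restored)
    ]


# Toy example: joint 0 = left wrist, joint 1 = right wrist, 1x3 heatmaps.
original = [[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]]
# A consistent model on the mirrored image puts "left wrist" where the mirrored right wrist is.
flipped = [[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]]
print(flip_test(original, flipped, flip_pairs=[(0, 1)]))
```

In the toy case both passes agree, so the average reproduces the original heatmaps. In practice the two predictions differ slightly and averaging reduces their noise.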

Bottom-Up

HRNet

Arch P2I Input size Speed (tasks/s) AP AP .5 AP .75 AP (M) AP (L) AR
HRNet(ori) T 512x512 - 64.4 - - 57.1 75.6 -
HRNet(mmpose) F 512x512 39.5 65.8 86.3 71.8 59.2 76.0 70.7
HRNet(mmpose) T 512x512 6.8 65.3 86.2 71.5 58.6 75.7 70.9
HRNet+UDP T 512x512 5.8 65.9 86.2 71.8 59.4 76.0 71.4
HRNet+UDP F 512x512 37.2 67.0 86.2 72.0 60.7 76.7 71.6
HRNet+UDP+AID F 512x512 37.2 68.4 88.1 74.9 62.7 77.1 73.0

HigherHRNet

Arch P2I Input size Speed (tasks/s) AP AP .5 AP .75 AP (M) AP (L) AR
HigherHRNet(ori) T 512x512 - 67.1 - - 61.5 76.1 -
HigherHRNet T 512x512 9.4 67.2 86.1 72.9 61.8 76.1 72.2
HigherHRNet+UDP T 512x512 9.0 67.6 86.1 73.7 62.2 76.2 72.4
HigherHRNet F 512x512 24.1 67.1 86.1 73.6 61.7 75.9 72.0
HigherHRNet+UDP F 512x512 23.0 67.6 86.2 73.8 62.2 76.2 72.4
HigherHRNet+UDP+AID F 512x512 23.0 69.0 88.0 74.9 64.0 76.9 73.8

Note:

  • ori: results reported by the original HigherHRNet
  • mmpose: pretrained models from MMPose
  • P2I: PROJECT2IMAGE (T = True, F = False)
  • We use MMPose as the codebase.
  • The baseline configuration is HRNet-W32, 512x512 input, batch size 16, learning rate 0.001.
  • Speed is measured with dist_test in the MMPose codebase on 8 GPUs with a batch size of 16.
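Because the P2I setting shifts both AP and speed, each +UDP row is best read against the baseline evaluated with the same P2I value (the mmpose rows for HRNet). The snippet below performs that subtraction using AP values copied from the two bottom-up tables. The `AP_TABLE` dictionary and `udp_gain` helper are illustrative only.

```python
# AP values copied from the bottom-up tables, keyed by (Arch, P2I).
AP_TABLE = {
    ("HRNet(mmpose)", "T"): 65.3,
    ("HRNet(mmpose)", "F"): 65.8,
    ("HRNet+UDP", "T"): 65.9,
    ("HRNet+UDP", "F"): 67.0,
    ("HigherHRNet", "T"): 67.2,
    ("HigherHRNet", "F"): 67.1,
    ("HigherHRNet+UDP", "T"): 67.6,
    ("HigherHRNet+UDP", "F"): 67.6,
}


def udp_gain(baseline: str, udp: str, p2i: str) -> float:
    """AP gain of a +UDP row over the baseline evaluated with the same P2I setting."""
    # Round to one decimal to match the precision of the tables.
    return round(AP_TABLE[(udp, p2i)] - AP_TABLE[(baseline, p2i)], 1)


for p2i in ("T", "F"):
    print(p2i,
          udp_gain("HRNet(mmpose)", "HRNet+UDP", p2i),
          udp_gain("HigherHRNet", "HigherHRNet+UDP", p2i))
```

With P2I = T this gives +0.6 AP (HRNet) and +0.4 AP (HigherHRNet). With P2I = F it gives +1.2 AP and +0.5 AP.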

Quick Start

(Recommended) For MMPose, please refer to MMPose.

For HRNet, please refer to HRNet.

For RSN, please refer to RSN.

Data preparation: for COCO, we provide the human detection results and pretrained models on BaiduDisk (dsa9).

Citation

If you use our code or models in your research, please cite:

@inproceedings{cai2020learning,
  title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
  author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
  booktitle={ECCV},
  year={2020}
}
@article{huang2020joint,
  title={Joint {COCO} and {LVIS} Workshop at {ECCV} 2020: {COCO} Keypoint Challenge Track Technical Report: {UDP++}},
  author={Huang, Junjie and Shan, Zengguang and Cai, Yuanhao and Guo, Feng and Ye, Yun and Chen, Xinze and Zhu, Zheng and Huang, Guan and Lu, Jiwen and Du, Dalong},
  year={2020}
}
Owner: Tsinghua University, Megvii Inc