Fashion Landmark Estimation with HRNet

Overview

HRNet for Fashion Landmark Estimation

(Modified from deep-high-resolution-net.pytorch)

Introduction

This code applies the HRNet (Deep High-Resolution Representation Learning for Human Pose Estimation) onto fashion landmark estimation task using the DeepFashion2 dataset. HRNet maintains high-resolution representations throughout the forward path. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise.

Illustrating the architecture of the proposed HRNet

Please note that every image in DeepFashion2 contains multiple fashion items, while our model assumes that there exists only one item in each image. Therefore, what we feed into the HRNet is not the original image but the cropped ones provided by a detector. In experiments, one can either use the ground truth bounding box annotation to generate the input data or use the output of a detecter.

Main Results

Landmark Estimation Performance on DeepFashion2 Test set

We won the third place in the "DeepFashion2 Challenge 2020 - Track 1 Clothes Landmark Estimation" competition. DeepFashion2 Challenge 2020 - Track 1 Clothes Landmark Estimation

Landmark Estimation Performance on DeepFashion2 Validation Set

Arch BBox Source AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_hrnet Detector 0.579 0.793 0.658 0.460 0.581 0.706 0.939 0.784 0.548 0.708
pose_hrnet GT 0.702 0.956 0.801 0.579 0.703 0.740 0.965 0.827 0.592 0.741

Quick start

Installation

  1. Install pytorch >= v1.2 following official instruction. Note that if you use pytorch's version < v1.0.0, you should follow the instruction at https://github.com/Microsoft/human-pose-estimation.pytorch to disable cudnn's implementations of BatchNorm layer. We encourage you to use higher pytorch's version(>=v1.0.0)

  2. Clone this repo, and we'll call the directory that you cloned as ${POSE_ROOT}.

  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  5. Init output(training model output directory) and log(tensorboard log directory) directory:

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    |-- lib
    |-- tools 
    |-- experiments
    |-- models
    |-- data
    |-- log
    |-- output
    |-- README.md
    `-- requirements.txt
    
  6. Download pretrained models from our Onedrive Cloud Storage

Data preparation

Our experiments were conducted on DeepFashion2, clone this repo, and we'll call the directory that you cloned as ${DF2_ROOT}.

1) Download the dataset

Extract the dataset under ${POSE_ROOT}/data.

2) Convert annotations into coco-type

The above code repo provides a script to convert annotations into coco-type.

We uploaded our converted annotation file onto OneDrive named as train/val-coco_style.json. We also made truncated json files such as train-coco_style-32.json meaning the first 32 samples in the dataset to save the loading time during development period.

3) Install the deepfashion_api

Enter ${DF2_ROOT}/deepfashion2_api/PythonAPI and run

python setup.py install

Note that the deepfashion2_api is modified from the cocoapi without changing the package name. Therefore, conflicts occur if you try to install this package when you have installed the original cocoapi in your computer. We provide two feasible solutions: 1) run our code in a virtualenv 2) use the deepfashion2_api as a local pacakge. Also note that deepfashion2_api is different with cocoapi mainly in the number of classes and the values of standard variations for keypoints.

At last the directory should look like this:

${POSE_ROOT}
|-- data
`-- |-- deepfashion2
    `-- |-- train
        |   |-- image
        |   |-- annos                           (raw annotation)
        |   |-- train-coco_style.json           (converted annotation file)
        |   `-- train-coco_style-32.json      (truncated for fast debugging)
        |-- validation
        |   |-- image
        |   |-- annos                           (raw annotation)
        |   |-- val-coco_style.json             (converted annotation file)
        |   `-- val-coco_style-64.json        (truncated for fast debugging)
        `-- json_for_test
            `-- keypoints_test_information.json

Training and Testing

Note that the GPUS parameter in the yaml config file is deprecated. To select GPUs, use the environment varaible:

 export CUDA_VISIBLE_DEVICES=1

Testing on DeepFashion2 dataset with BBox from ground truth using trained models:

python tools/test.py \
    --cfg experiments/deepfashion2/hrnet/w48_384x288_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth \
    TEST.USE_GT_BBOX True

Testing on DeepFashion2 dataset with BBox from a detector using trained models:

python tools/test.py \
    --cfg experiments/deepfashion2/hrnet/w48_384x288_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth \
    TEST.DEEPFASHION2_BBOX_FILE data/bbox_result_val.pkl \

Training on DeepFashion2 dataset using pretrained models:

python tools/train.py \
    --cfg experiments/deepfashion2/hrnet/w48_384x288_adam_lr1e-3.yaml \
     MODEL.PRETRAINED models/pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth

Other options

python tools/test.py \
    ... \
    DATASET.MINI_DATASET True \ # use a subset of the annotation to save loading time
    TAG 'experiment description' \ # this info will appear in the output directory name
    WORKERS 4 \ # num_of_worker for the dataloader
    TEST.BATCH_SIZE_PER_GPU 8 \
    TRAIN.BATCH_SIZE_PER_GPU 8 \

OneDrive Cloud Storage

OneDrive

We provide the following files:

  • Model checkpoint files
  • Converted annotation files in coco-type
  • Bounding box results from our self-implemented detector in a pickle file.
hrnet-for-fashion-landmark-estimation.pytorch
|-- models
|   `-- pose_hrnet-w48_384x288-deepfashion2_mAP_0.7017.pth
|
|-- data
|   |-- bbox_result_val.pkl
|   |
`-- |-- deepfashion2
    `---|-- train
        |   |-- train-coco_style.json           (converted annotation file)
        |   `-- train-coco_style-32.json      (truncated for fast debugging)
        `-- validation
            |-- val-coco_style.json             (converted annotation file)
            `-- val-coco_style-64.json        (truncated for fast debugging)
        

Discussion

Experiment Configuration

  • For the regression target of keypoint heatmaps, we tuned the standard deviation value sigma and finally set it to 2.
  • During training, we found that the data augmentation from the original code was too intensive which makes the training process unstable. We weakened the augmentation parameters and observed performance gain.
  • Due to the imbalance of classes in DeepFashion2 dataset, the model's performance on different classes varies a lot. Therefore, we adopted a weighted sampling strategy rather than the naive random shuffling strategy, and observed performance gain.
  • We expermented with the value of weight decay, and found that either 1e-4 or 1e-5 harms the performance. Therefore, we simply set weight decay to 0.
Owner
SVIP Lab
ShanghaiTech Vision and Intelligent Perception Lab
SVIP Lab
LEAP: Learning Articulated Occupancy of People

LEAP: Learning Articulated Occupancy of People Paper | Video | Project Page This is the official implementation of the CVPR 2021 submission LEAP: Lear

Neural Bodies 60 Nov 18, 2022
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Cross-Speaker-Emotion-Transfer - PyTorch Implementation PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Conditio

Keon Lee 114 Jan 08, 2023
Repository sharing code and the model for the paper "Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes"

Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes Setup virtualenv -p python3 venv source venv/bin/activate pip instal

Planet AI GmbH 9 May 20, 2022
Functional deep learning

Pipeline abstractions for deep learning. Full documentation here: https://lf1-io.github.io/padl/ PADL: is a pipeline builder for PyTorch. may be used

LF1 101 Nov 09, 2022
Code release for NeRF (Neural Radiance Fields)

NeRF: Neural Radiance Fields Project Page | Video | Paper | Data Tensorflow implementation of optimizing a neural representation for a single scene an

6.5k Jan 01, 2023
Churn prediction

Churn-prediction Churn-prediction Data preprocessing:: Label encoder is used to normalize the categorical variable Data Transformation:: For each data

1 Sep 28, 2022
Convenient tool for speeding up the intern/officer review process.

icpc-app-screen Convenient tool for speeding up the intern/officer applicant review process. Eliminates the pain from reading application responses of

1 Oct 30, 2021
The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store development.

The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store dev

George Rocha 0 Feb 03, 2022
Pytorch Implementation of the paper "Cross-domain Correspondence Learning for Exemplar-based Image Translation"

CoCosNet Pytorch Implementation of the paper "Cross-domain Correspondence Learning for Exemplar-based Image Translation" (CVPR 2020 oral). Update: 202

Lingbo Yang 38 Sep 22, 2021
A series of Jupyter notebooks with Chinese comment that walk you through the fundamentals of Machine Learning and Deep Learning in python using Scikit-Learn and TensorFlow.

Hands-on-Machine-Learning 目的 这份笔记旨在帮助中文学习者以一种较快较系统的方式入门机器学习, 是在学习Hands-on Machine Learning with Scikit-Learn and TensorFlow这本书的 时候做的个人笔记: 此项目的可取之处 原书的

Baymax 1.5k Dec 21, 2022
An easy-to-use app to visualise attentions of various VQA models.

Ask Me Anything: A tool for visualising Visual Question Answering (AMA) An easy-to-use app to visualise attentions of various VQA models. Please click

Apoorve 37 Nov 13, 2022
RuleBERT: Teaching Soft Rules to Pre-Trained Language Models

RuleBERT: Teaching Soft Rules to Pre-Trained Language Models (Paper) (Slides) (Video) RuleBERT is a pre-trained language model that has been fine-tune

16 Aug 24, 2022
Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images

Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images [ICCV 2021] © Mahmood Lab - This code is made avail

Mahmood Lab @ Harvard/BWH 63 Dec 01, 2022
Captcha-tensorflow - Image Captcha Solving Using TensorFlow and CNN Model. Accuracy 90%+

Captcha Solving Using TensorFlow Introduction Solve captcha using TensorFlow. Learn CNN and TensorFlow by a practical project. Follow the steps, run t

Jackon Yang 869 Jan 06, 2023
Official Implementation of Domain-Aware Universal Style Transfer

Domain Aware Universal Style Transfer Official Pytorch Implementation of 'Domain Aware Universal Style Transfer' (ICCV 2021) Domain Aware Universal St

KibeomHong 80 Dec 30, 2022
CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces

CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces This is a repository for the following pape

17 Oct 13, 2022
3 Apr 20, 2022
Video2x - A lossless video/GIF/image upscaler achieved with waifu2x, Anime4K, SRMD and RealSR.

Official Discussion Group (Telegram): https://t.me/video2x A Discord server is also available. Please note that most developers are only on Telegram.

K4YT3X 5.9k Dec 31, 2022
This repository contains source code for the Situated Interactive Language Grounding (SILG) benchmark

SILG This repository contains source code for the Situated Interactive Language Grounding (SILG) benchmark. If you find this work helpful, please cons

Victor Zhong 17 Nov 27, 2022
[3DV 2020] PeeledHuman: Robust Shape Representation for Textured 3D Human Body Reconstruction

PeeledHuman: Robust Shape Representation for Textured 3D Human Body Reconstruction International Conference on 3D Vision, 2020 Sai Sagar Jinka1, Rohan

Rohan Chacko 39 Oct 12, 2022