bizarre-pose-estimator

Transfer Learning for Pose Estimation of Illustrated Characters
Shuhong Chen *, Matthias Zwicker *
WACV2022
[arxiv] [video] [poster] [github]

Human pose information is a critical component in many downstream image processing tasks, such as activity recognition and motion tracking. Likewise, a pose estimator for the illustrated character domain would provide a valuable prior for assistive content creation tasks, such as reference pose retrieval and automatic character animation. But while modern data-driven techniques have substantially improved pose estimation performance on natural images, little work has been done for illustrations. In our work, we bridge this domain gap by efficiently transfer-learning from both domain-specific and task-specific source models. Additionally, we upgrade and expand an existing illustrated pose estimation dataset, and introduce two new datasets for classification and segmentation subtasks. We then apply the resultant state-of-the-art character pose estimator to solve the novel task of pose-guided illustration retrieval. All data, models, and code will be made publicly available.

download

Downloads can be found in this drive folder: wacv2022_bizarre_pose_estimator_release

  • Download bizarre_pose_models.zip and extract to the root project directory; the extracted file structure should merge with the one already in this repo.
  • Download bizarre_pose_dataset.zip and extract to ./_data. The images and annotations should end up at ./_data/bizarre_pose_dataset/raw (see the layout sketch after this list).
  • Download character_bg_seg_data.zip and extract to ./_data. Under ./_data/character_bg_seg, there are bg and fg folders. All foregrounds come from danbooru, and are indexed by the provided csv. While some backgrounds come from danbooru, we use several from jerryli27/pixiv_dataset; these are somewhat hard to download, so we provide the raw pixiv images in the zip.
  • Please refer to Gwern's Danbooru dataset to download danbooru images by ID.
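
After extraction, ./_data should look roughly like the sketch below; this is only a rough layout based on the paths above, and the exact files inside each folder may differ:

_data/
    bizarre_pose_dataset/
        raw/                # images and annotations
    character_bg_seg/
        bg/                 # background images (danbooru + pixiv)
        fg/                 # foreground images from danbooru, indexed by the provided csv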

Warning: While NSFW art was filtered out from these data by tag, it was not possible to manually inspect all the data for mislabeled safety ratings. Please use this data at your own risk.

setup

Copy ./_env/machine_config.bashrc.template to ./_env/machine_config.bashrc, and set $PROJECT_DN to the absolute path of this repository folder. The other variables are optional.
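
For example (a minimal sketch; $PROJECT_DN comes from the template, but the exact assignment syntax in your copy may differ):

cp ./_env/machine_config.bashrc.template ./_env/machine_config.bashrc
# then edit the copy and point it at this repo, e.g.:
# export PROJECT_DN=/absolute/path/to/bizarre-pose-estimator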

This project requires docker with a GPU. Run these lines from the project directory to pull the image and enter a container; note these are bash scripts inside the ./make folder, not make commands. Alternatively, you can build the docker image yourself.

make/docker_pull
make/shell_docker
# OR
make/docker_build
make/shell_docker

danbooru tagging

The danbooru subset used to train the tagger, along with our custom tag rulebook, can be found under ./_data/danbooru/_filters. Run this line to tag a sample image:

python3 -m _scripts.danbooru_tagger ./_samples/megumin.png
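
To tag a whole folder of images, the same command can be looped over each file; a minimal sketch, assuming .png inputs under ./_samples:

for f in ./_samples/*.png; do
    python3 -m _scripts.danbooru_tagger "$f"
done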

character background segmentation

Run this line to segment a sample image and extract the bounding box:

python3 -m _scripts.character_segmenter ./_samples/megumin.png

pose estimation

There are several models available in ./_train/character_pose_estim/runs, corresponding to our models at the top of Table 1 in the paper. Run this line to estimate the pose of a sample image, using one of those models:

python3 -m _scripts.pose_estimator \
    ./_samples/megumin.png \
    ./_train/character_pose_estim/runs/feat_concat+data.ckpt
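
To compare the available models on the same image, you can loop over the checkpoints in the runs folder; a sketch assuming each model is a .ckpt file directly under ./_train/character_pose_estim/runs (adjust the glob if the checkpoints are nested in per-run subfolders):

for ckpt in ./_train/character_pose_estim/runs/*.ckpt; do
    echo "=== $ckpt ==="
    python3 -m _scripts.pose_estimator ./_samples/megumin.png "$ckpt"
done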

pose-based retrieval

Run this line to estimate the pose of a sample image, and get links to danbooru posts with similar poses:

python3 -m _scripts.pose_retrieval ./_samples/megumin.png
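
To keep the results, you can redirect the output to a file; a sketch assuming the script prints the matching links to stdout:

python3 -m _scripts.pose_retrieval ./_samples/megumin.png > retrieval_links.txt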

faq

  • Does this work for multiple characters in an image, or images that aren't full-body? Sorry, but no; this project focuses only on single full-body characters. However, we may release our instance-based models separately.
  • Can I do this without docker? Please use docker; it is very good. If you can't use docker, you can try to replicate the environment from ./_env/Dockerfile, but this is untested.
  • What does bn mean in the files/code? It's short for "basename", or an ID for a single data sample.
  • What is the sauce for the artwork in ./_samples? Full artist attributions are in the supplementary of our paper, Tables 2 and 3; the retrieval figure is the first two rows of Fig. 2, and Megumin is entry (1,0) of Fig. 3.
  • Which part is best? Part 4.