4D Human Body Capture from Egocentric Video via 3D Scene Grounding

Overview

4D Human Body Capture from Egocentric Video via 3D Scene Grounding

[Project] [Paper]

Installation:

Our method requires the same dependencies as SMPLify-X and OpenPose. We refer to the official implementation fo SMPLify-X and OpenPose for installation details.

Our method also needs the installation of Chamfer Pytorch to calculate the chamfer distnace for enforceing human-scene constraints

Data Preparation:

Step 1: Dump video frames with desired fps (30) with utils/dump_videos.py. Run utils/split_frames to segment videos into equally long subatom clips. Repack frames to videos with utils/pack_videos.py (This is for faster openpose I/O).

Step 2: Run openpose_call.py under openpose folder to get human body keypoints, then run utils/openpose_helper to rename keypoint.json and run utils/openpose_filter.py to keep the most confident human keypoints.

Step 3: Run Smplify-X model with specified focal length and data directory. This step may take up to several hours. For instance:

python3 smplifyx/main.py --config cfg_files/fit_smplx.yaml  --data_folder /home/miao/data/rylm/downsampled_frames/miao_mainbuilding_0-1 --output_folder /home/miao/data/rylm/downsampled_frames/miao_mainbuilding_0-1/body_gen --visualize="False" --model_folder ./models --vposer_ckpt ./vposer --part_segm_fn smplx_parts_segm.pkl --focal_length 694.0

Step 4: Run Colmap for to generate scene mesh and camera trajectory. This step make take up to several hours depneding on the complexity of the scene. Then Run utils/camerpose_helper and utils/pointscloud_helper.py to generate desired points cloud file and camera pose.

Joint Optimization with 3D Scene Context:

Run global_optimization.py to conduct temproal smoothing and enforce human-scene constraints:

python3 global_optimization.py '/home/miao/data/rylm/packed_data/miao_mainbuidling_0-1/body_gen' '/home/miao/data/rylm/packed_data/miao_mainbuidling_0-1/smoothed_body

The resulting data should be organized as following:

  • datafolder:
    • videoname:
      • images: folder that contains all video frames
      • keypoints: folder that contains all body keypoints
      • body_gen: folder that contains all body mesh files:
      • smoothed_boyd: folder that contains all jointly-optimized body mesh files:
      • camera_pose.txt: text file that contains camera pose at each temporal footprint
      • meshed-poisson.ply: scene mesh file from dense reconstruction
      • camera.txt: text file that contains camera parameters
      • xyz.ply point cloud file. (use meash lab to convert .xyz file to .ply file)

Visualization in the World Coordinate:

Run global_vis.py to transform the body mesh in pivot coordinate to world coordinate. By default the viewpoint of open3d is the initial position camera trajectory. Setting bool flag to 'True' will resulting into a open3d viewpoint moving the same way as camera viewer.

python3 global_vis.py '/home/miao/data/rylm/downsampled_frames/miao_mainbuilding_0-1/' False

Visualization in the Egocentric Coordinate:

Run vis.py to view recosntrcuted body mesh on image plane.

python3 vis.py '/home/miao/data/rylm/segmented_data/miao_mainbuilding_0-1/'

Citation

If you find our code useful in your research, please use the following BibTeX entry for citation.

@inproceedings{liu20204d,
  title={4D Human Body Capture from Egocentric Video via 3D Scene Grounding},
  author={Liu, Miao and Yang, Dexin and Zhang, Yan and Cui, Zhaopeng and Rehg, James M and Tang, Siyu},
  booktitle={3DV},
  year={2021}
}
Owner
Miao Liu
Miao Liu
SGoLAM - Simultaneous Goal Localization and Mapping

SGoLAM - Simultaneous Goal Localization and Mapping PyTorch implementation of the MultiON runner-up entry, SGoLAM: Simultaneous Goal Localization and

10 Jan 05, 2023
A PyTorch-based R-YOLOv4 implementation which combines YOLOv4 model and loss function from R3Det for arbitrary oriented object detection.

R-YOLOv4 This is a PyTorch-based R-YOLOv4 implementation which combines YOLOv4 model and loss function from R3Det for arbitrary oriented object detect

94 Dec 03, 2022
a delightful machine learning tool that allows you to train, test and use models without writing code

igel A delightful machine learning tool that allows you to train/fit, test and use models without writing code Note I'm also working on a GUI desktop

Nidhal Baccouri 3k Jan 05, 2023
pcnaDeep integrates cutting-edge detection techniques with tracking and cell cycle resolving models.

pcnaDeep: a deep-learning based single-cell cycle profiler with PCNA signal Welcome! pcnaDeep integrates cutting-edge detection techniques with tracki

ChanLab 8 Oct 18, 2022
Automatic Idiomatic Expression Detection

IDentifier of Idiomatic Expressions via Semantic Compatibility (DISC) An Idiomatic identifier that detects the presence and span of idiomatic expressi

5 Jun 09, 2022
Predict stock movement with Machine Learning and Deep Learning algorithms

Project Overview Stock market movement prediction using LSTM Deep Neural Networks and machine learning algorithms Software and Library Requirements Th

Naz Delam 46 Sep 13, 2022
Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)'

SCL Introduction Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)' We evaluated our approach using two baseline

34 Oct 08, 2022
A python software that can help blind people find things like laptops, phones, etc the same way a guide dog guides a blind person in finding his way.

GuidEye A python software that can help blind people find things like laptops, phones, etc the same way a guide dog guides a blind person in finding h

Munal Jain 0 Aug 09, 2022
Implementation of CVPR 2021 paper "Spatially-invariant Style-codes Controlled Makeup Transfer"

SCGAN Implementation of CVPR 2021 paper "Spatially-invariant Style-codes Controlled Makeup Transfer" Prepare The pre-trained model is avaiable at http

118 Dec 12, 2022
ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

Bats Research 94 Nov 21, 2022
RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real Time Video Interpolation arXiv | YouTube | Colab | Tutorial | Demo Table of Contents Introduction Collection Usage Evaluation Training and

hzwer 3k Jan 04, 2023
A Light in the Dark: Deep Learning Practices for Industrial Computer Vision

A Light in the Dark: Deep Learning Practices for Industrial Computer Vision This is the repository for our Paper/Contribution to the WI2022 in Nürnber

Maximilian Harl 6 Jan 17, 2022
Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection

LMFD-PAD Note This is the official repository of the paper: LMFD-PAD: Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechani

28 Dec 02, 2022
The official repo for OC-SORT: Observation-Centric SORT on video Multi-Object Tracking. OC-SORT is simple, online and robust to occlusion/non-linear motion.

OC-SORT Observation-Centric SORT (OC-SORT) is a pure motion-model-based multi-object tracker. It aims to improve tracking robustness in crowded scenes

Jinkun Cao 325 Jan 05, 2023
Projects of Andfun Yangon

AndFunYangon Projects of Andfun Yangon First Commit We can use gsearch.py to sea

Htin Aung Lu 1 Dec 28, 2021
Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

Spatio-Temporal Entropy Model A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression. More details can

16 Nov 28, 2022
Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network This repository is the official implementation of Speech Separati

Kai Li (李凯) 116 Nov 09, 2022
Domain Generalization with MixStyle, ICLR'21.

MixStyle This repo contains the code of our ICLR'21 paper, "Domain Generalization with MixStyle". The OpenReview link is https://openreview.net/forum?

Kaiyang 208 Dec 28, 2022
A Planar RGB-D SLAM which utilizes Manhattan World structure to provide optimal camera pose trajectory while also providing a sparse reconstruction containing points, lines and planes, and a dense surfel-based reconstruction.

ManhattanSLAM Authors: Raza Yunus, Yanyan Li and Federico Tombari ManhattanSLAM is a real-time SLAM library for RGB-D cameras that computes the camera

117 Dec 28, 2022
A small library for doing fluid simulation with neural networks.

Neural Fluid Fields This is a small library for doing fluid simulation with neural fields. Check out our review paper, Neural Fields in Visual Computi

Towaki 23 Jun 23, 2022