Real-time VIBE: Frame by Frame Inference of VIBE (Video Inference for Human Body Pose and Shape Estimation)

Last update: Jul 02, 2022

Real-time VIBE

Run VIBE inference frame by frame.

Overview

This is a frame-by-frame inference fork of VIBE at [https://github.com/mkocabas/VIBE].

Usage:

import cv2
from vibe.rt.rt_vibe import RtVibe

rt_vibe = RtVibe()
cap = cv2.VideoCapture('sample_video.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # End of video or read failure
        break
    rt_vibe(frame)  # This will open a cv2 window
cap.release()

SMPL rendering takes most of the processing time. It can be turned off by setting render = False on the RtVibe instance (rt_vibe.render = False with the variable name used above).

Getting Started

Installation:

# conda must be installed first
wget https://github.com/zc402/RT-VIBE/releases/download/v1.0.0/RT-VIBE.tar.gz
tar zxf RT-VIBE.tar.gz
cd RT-VIBE
# This will create a new conda env called vibe_env
source scripts/install_conda.sh
pip install .  # Install rt-vibe

Run on sample video:

python rt_demo.py  # (This runs sample_video.mp4)
# or
python rt_demo.py --vid_file=multiperson.mp4

Run on camera:

python rt_demo.py --camera

Try with Google Colab

The notebook below provides video and camera inference examples.

(Some dependency errors appear during pip install; they are safe to ignore. Remember to restart the runtime after installing PyTorch.)

https://colab.research.google.com/drive/1VKXGTfwIYT-ltbbEjhCpEczGpksb8I7o?usp=sharing

Features

Make VIBE an installable package
Fix GRU hidden states lost between batches in demo.py
Add realtime interface which processes the video stream frame-by-frame
Decrease GPU memory usage

Explanation

Pip installable.

This repo renames "lib" to "vibe" ("lib" is not a suitable package name), corrects the corresponding imports, and adds __init__.py files. It can be installed with:

pip install git+https://github.com/zc402/RT-VIBE

GRU hidden state lost:

The original vibe.py resets the GRU memory for each batch, which causes discontinuous predictions between batches.
The GRU hidden state is reset at:

# .../models/vibe.py
# class TemporalEncoder
# def forward()
y, _ = self.gru(x)

# The "_" is the final hidden state and should be preserved
# https://pytorch.org/docs/stable/generated/torch.nn.GRU.html

This repo preserves the GRU hidden state for the lifetime of the model instead of for a single batch.

# Fix:

# __init__()
self.gru_final_hidden = None

# forward()
y, self.gru_final_hidden = self.gru(x, self.gru_final_hidden)
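
The effect of this fix can be illustrated without PyTorch. The sketch below is a toy stand-in, not the VIBE code: ToyRecurrentCell is a hypothetical one-dimensional recurrence with fixed weights that only mimics the calling convention of torch.nn.GRU (it accepts an optional initial hidden state and returns the outputs together with the final hidden state). It shows that feeding a sequence in chunks matches a single pass over the whole sequence only when the hidden state is carried over.

```python
import math


class ToyRecurrentCell:
    """Hypothetical 1-D recurrence standing in for torch.nn.GRU (illustration only)."""

    def __init__(self, w_in=0.5, w_hidden=0.8):
        self.w_in = w_in
        self.w_hidden = w_hidden

    def __call__(self, xs, hidden=None):
        # Like nn.GRU, start from a zero state when no hidden state is given.
        h = 0.0 if hidden is None else hidden
        outputs = []
        for x in xs:
            h = math.tanh(self.w_in * x + self.w_hidden * h)
            outputs.append(h)
        return outputs, h  # (per-step outputs, final hidden state)


def run_in_chunks(cell, xs, chunk_size, keep_hidden):
    """Process xs chunk by chunk, optionally carrying the hidden state over."""
    hidden = None
    outputs = []
    for start in range(0, len(xs), chunk_size):
        chunk = xs[start:start + chunk_size]
        ys, final_hidden = cell(chunk, hidden if keep_hidden else None)
        if keep_hidden:
            hidden = final_hidden
        outputs.extend(ys)
    return outputs


cell = ToyRecurrentCell()
frames = [0.1, 0.4, -0.3, 0.9, 0.2, -0.7]

full_pass, _ = cell(frames)
carried = run_in_chunks(cell, frames, chunk_size=1, keep_hidden=True)
reset = run_in_chunks(cell, frames, chunk_size=1, keep_hidden=False)

print(carried == full_pass)  # True: chunking is harmless when state is carried
print(reset == full_pass)    # False: resetting the state changes the outputs
```

With chunk_size=1 and the state carried over, the chunked run performs exactly the same arithmetic as the full pass, which is why a batch size of 1 becomes safe once the hidden state is preserved.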

Real-time interface

This feature lets VIBE run on a live webcam stream.

Processing steps of the original VIBE:
- Use ffmpeg to split the video into images and save them to /tmp.
- Run human tracking on the whole video and keep the results in memory.
- Predict SMPL parameters with VIBE for the whole video, one person at a time.
- (Optional) Render and show the result frame by frame.
- Save the rendered result.

Processing steps of the real-time interface:
- Create the VIBE model.
- Read a frame with cv2.
- Run tracking on that single frame.
- Predict SMPL parameters for each person, keeping each person's hidden state separately.
- (Optional) Render and show the result.

Changes:
- Multi-person-tracker is modified to accept an image instead of an image folder.
- A dataset wrapper is added to convert a single image into a PyTorch dataset.
- rt_demo.py is added to demonstrate the usage.
- The ImageFolder dataset is modified.
- The ImgInference dataset is modified.
- The requirements are modified to pin the current tracker version, because a class in this repo inherits from the tracker and changes its behavior.
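
The per-person bookkeeping in the real-time steps above can be sketched as follows. This is an illustration under assumptions, not the code in this repo: HiddenStateStore, toy_model, and the shape of the tracker output (a dict mapping a person ID to that person's input) are all hypothetical. Dropping the state of people who have disappeared is also an added assumption, not something the original text describes.

```python
class HiddenStateStore:
    """Keeps one recurrent hidden state per tracked person (hypothetical helper)."""

    def __init__(self, max_missing_frames=30):
        self.max_missing_frames = max_missing_frames
        self._hidden = {}   # person_id -> hidden state
        self._missing = {}  # person_id -> consecutive frames without a detection

    def step(self, model, detections):
        """Run the model once per detected person, each with its own state.

        detections: dict mapping person_id to that person's input for this frame.
        model: callable (features, hidden) -> (output, new_hidden).
        """
        results = {}
        for person_id, features in detections.items():
            output, new_hidden = model(features, self._hidden.get(person_id))
            self._hidden[person_id] = new_hidden
            self._missing[person_id] = 0
            results[person_id] = output

        # Forget people who have not been seen for a while (assumed policy).
        for person_id in list(self._hidden):
            if person_id not in detections:
                self._missing[person_id] += 1
                if self._missing[person_id] > self.max_missing_frames:
                    del self._hidden[person_id]
                    del self._missing[person_id]
        return results

    def tracked_ids(self):
        return sorted(self._hidden)


def toy_model(features, hidden):
    """Stand-in for the pose model: a running sum plays the role of the hidden state."""
    new_hidden = (hidden or 0) + features
    return new_hidden, new_hidden


store = HiddenStateStore(max_missing_frames=1)
frame_1 = store.step(toy_model, {1: 10, 2: 100})
frame_2 = store.step(toy_model, {1: 5, 2: 50})
frame_3 = store.step(toy_model, {1: 1})  # person 2 missing for one frame, state kept
frame_4 = store.step(toy_model, {1: 1})  # person 2 missing for two frames, state dropped
print(frame_2, frame_4, store.tracked_ids())
```

Keying the state by person ID keeps one person's motion history from leaking into another's prediction when several people share the frame.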

Decrease inference memory usage

The default batch_size in demo.py needs ~10GB of GPU memory.
The original demo.py needs a large vibe_batch_size to keep GRU hidden states.
Now that the GRU hidden state is preserved, lowering the batch size no longer harms accuracy.
With the default settings in this repo, inference occupies ~1.3GB of memory, which makes it runnable on low-end GPUs.
This slows down inference a little. The current setting (batch size of 1) reflects the actual real-time processing speed.

# Large batches cause OOM on GPUs with little memory
tracker_batch_size = 1  # was 12
vibe_batch_size = 1  # was 450
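
To check what frame rate a given setting reaches, per-frame latency can be measured around the inference call. A minimal sketch, assuming a hypothetical FrameTimer helper that is not part of this repo; process_frame is a placeholder for whatever call handles one frame.

```python
import time


class FrameTimer:
    """Collects per-frame processing times and reports average FPS (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._durations = []

    def measure(self, func, *args, **kwargs):
        """Call func, record how long it took, and return its result."""
        start = self._clock()
        result = func(*args, **kwargs)
        self._durations.append(self._clock() - start)
        return result

    def frame_count(self):
        return len(self._durations)

    def average_fps(self):
        total = sum(self._durations)
        return len(self._durations) / total if total > 0 else 0.0


def process_frame(frame):
    """Placeholder for the real per-frame inference call."""
    return frame


timer = FrameTimer()
for frame in range(5):
    timer.measure(process_frame, frame)
print(f"{timer.average_fps():.1f} FPS over {timer.frame_count()} frames")
```

When the work runs on a GPU, CUDA kernels are launched asynchronously, so calling torch.cuda.synchronize() before reading the clock gives more meaningful timings.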

Other fixes

Remove seqlen. The seqlen in demo.py is unused (the GRU sequence length is determined at runtime and equals batch_size). With the fix in this repo, it is safe to set batch_size to 1.

Releases(v1.0.0)

v1.0.0(Nov 29, 2021)
