MoveNet Single Pose on DepthAI

Overview

Running Google MoveNet Single Pose models on DepthAI hardware (OAK-1, OAK-D,...).

A convolutional neural network that runs on RGB images and predicts the joint locations of a single person. Two variants are available: Lightning and Thunder, the latter being slower but more accurate. When the input is a sequence of frames, MoveNet uses smart cropping based on the detections from the previous frame. This lets the model devote its attention and resources to the main subject, which yields much better prediction quality without sacrificing speed.

Demo

For MoveNet on OpenVINO, please visit: openvino_movenet

Architecture: Host mode vs Edge mode

The cropping algorithm determines, from the body detected in frame N, the region of frame N+1 on which the inference will run. The mode (Host or Edge) describes where this algorithm runs:

  • in Host mode, the cropping algorithm runs on the host CPU. Only this mode allows images or video files as input. The flow of information between the host and the device is bi-directional: in particular, the host sends frames or cropping instructions to the device;
  • in Edge mode, the cropping algorithm runs on the MyriadX. In this mode, all the building blocks of MoveNet (inference, determination of the cropping region for the next frame, cropping) are executed on the device. The only information exchanged is the body keypoints and the camera video frame.
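
For illustration, here is a minimal sketch of how a cropping region could be derived from the previous frame's keypoints. This is not the exact algorithm used by MoveNet or by this repository; the function name, the margin and the fallback behavior are assumptions:

import numpy as np

def next_crop_region(keypoints, scores, frame_w, frame_h,
                     score_thresh=0.2, margin=0.15):
    # Keep only the keypoints considered reliable
    valid = keypoints[scores > score_thresh]
    if len(valid) == 0:
        # No reliable body detected: fall back to a full square region
        return 0, 0, max(frame_w, frame_h)
    x_min, y_min = valid.min(axis=0)
    x_max, y_max = valid.max(axis=0)
    # Square region centered on the body, enlarged by a margin
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    size = max(x_max - x_min, y_max - y_min) * (1 + 2 * margin)
    return int(cx - size / 2), int(cy - size / 2), int(size)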

Note: in either mode, when using the color camera, you can choose to disable the sending of the video frame to the host by specifying "rgb_laconic" instead of "rgb" as the input source.

(Architecture diagram)

Install

Currently, the scripting node capability is an alpha release. It is important to use the version specified in requirements.txt.

Install the python packages DepthAI and OpenCV with the following command:

python3 -m pip install -r requirements.txt

Run

Usage:

> python3 demo.py -h                                               
usage: demo.py [-h] [-e] [-m MODEL] [-i INPUT] [-s SCORE_THRESHOLD]
               [--internal_fps INTERNAL_FPS]
               [--internal_frame_size INTERNAL_FRAME_SIZE] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -e, --edge            Use Edge mode (the cropping algorithm runs on device)
  -m MODEL, --model MODEL
                        Model to use: 'thunder' or 'lightning' or path of a
                        blob file (default=thunder)
  -i INPUT, --input INPUT
                        'rgb' or 'rgb_laconic' or path to video/image file to
                        use as input (default: rgb)
  -s SCORE_THRESHOLD, --score_threshold SCORE_THRESHOLD
                        Confidence score to determine whether a keypoint
                        prediction is reliable (default=0.200000)
  --internal_fps INTERNAL_FPS
                        Fps of internal color camera. Too high a value lowers
                        the NN fps (default: depends on the model)
  --internal_frame_size INTERNAL_FRAME_SIZE
                        Internal color camera frame size (= width = height) in
                        pixels (default=640)
  -o OUTPUT, --output OUTPUT
                        Path to output video file

Examples:

  • To use the default internal color camera as input with the Thunder model (Host mode):

    python3 demo.py

  • To use the default internal color camera as input with the Thunder model (Edge mode):

    python3 demo.py -e

  • To use the default internal color camera as input with the Lightning model:

    python3 demo.py -m lightning

  • To use a file (video or image) as input with the Thunder model:

    python3 demo.py -i filename

  • When using the internal camera, to change its FPS to 15:

    python3 demo.py --internal_fps 15

    Note: by default, the internal camera FPS is set to 26 for Lightning and to 12 for Thunder. These values are based on my own observations. Please don't hesitate to play with this parameter to find the optimal value. If you observe that your FPS is well below the default value, lower the FPS with this option until the set FPS is just above the observed FPS.

  • When using the internal camera, you may not need to work with the full resolution. You can work with a lower resolution (and gain a bit of FPS) by using this option:

    python3 demo.py --internal_frame_size 450

    Note: currently, depthai supports only certain values for this argument. The value you specify is replaced by the closest supported value (here 432 instead of 450); the sketch below illustrates the idea.
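
For intuition, the sketch below approximates this rounding. The real constraints come from depthai's ISP scaler; the ones assumed here (sizes that are multiples of 16 and reachable from the 1080p sensor height by a small rational scale factor) are simplifications, not the library's documented rules:

def closest_supported_size(requested, sensor_h=1080, max_den=16):
    # Candidate sizes: sensor_h scaled by an exact fraction n/d,
    # restricted to multiples of 16
    candidates = set()
    for d in range(1, max_den + 1):
        for n in range(1, d + 1):
            if (sensor_h * n) % d == 0:
                size = sensor_h * n // d
                if size % 16 == 0:
                    candidates.add(size)
    return min(candidates, key=lambda s: abs(s - requested))

print(closest_supported_size(450))  # -> 432 under these assumptions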

Keypress   Function
space      Pause
c          Show/hide cropping region
f          Show/hide FPS
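
As a sketch of how such bindings might be wired around the renderer's waitKey() return value (the state names below are hypothetical, not the repository's actual renderer internals):

# Hypothetical display toggles mirroring the table above
state = {"paused": False, "show_crop": False, "show_fps": True}

def handle_key(key):
    if key == ord(' '):      # space: pause/resume
        state["paused"] = not state["paused"]
    elif key == ord('c'):    # show/hide cropping region
        state["show_crop"] = not state["show_crop"]
    elif key == ord('f'):    # show/hide FPS
        state["show_fps"] = not state["show_fps"]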

The models

They were generated by PINTO from the original models Thunder V3 and Lightning V3. Currently, they are a slight adaptation of the models available there: https://github.com/PINTO0309/PINTO_model_zoo/tree/main/115_MoveNet. This adaptation should be temporary; it is needed because the depthai ImageManip node does not support interleaved images.
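
For context, an interleaved frame stores pixels channel-by-channel per pixel (BGRBGR...), while a planar frame stores each channel contiguously. A minimal numpy illustration of the two layouts (the adapted models' exact input layout is an assumption here):

import numpy as np

# Interleaved layout, as produced by OpenCV: shape (height, width, channels)
frame_hwc = np.zeros((256, 256, 3), dtype=np.uint8)

# The same data in planar layout: shape (channels, height, width)
frame_chw = frame_hwc.transpose(2, 0, 1)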

Code

To facilitate reusability, the code is split into two classes:

  • MovenetDepthai, which is responsible for computing the body keypoints. The import of this class depends on the mode:
# For Host mode:
from MovenetDepthai import MovenetDepthai
# For Edge mode:
from MovenetDepthaiEdge import MovenetDepthai
  • MovenetRenderer, which is responsible for rendering the keypoints and the skeleton on the video frame.

This way, you can replace the renderer from this repository and write your own personalized renderer (for some projects, you may not even need a renderer).
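
As an illustration, a bare-bones replacement could look like the sketch below. The Body attribute names (keypoints, scores) and the score_thresh attribute are assumptions for the sketch, not a documented API:

import cv2

class MinimalRenderer:
    def __init__(self, pose, output=None):
        self.pose = pose
        self.frame = None

    def draw(self, frame, body):
        # Draw a dot for each keypoint whose confidence passes the threshold
        if body is not None:
            for (x, y), score in zip(body.keypoints, body.scores):
                if score > self.pose.score_thresh:
                    cv2.circle(frame, (int(x), int(y)), 4, (0, 255, 0), -1)
        self.frame = frame
        return frame

    def waitKey(self, delay=1):
        cv2.imshow("MoveNet", self.frame)
        return cv2.waitKey(delay)

    def exit(self):
        cv2.destroyAllWindows()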

The file demo.py is a representative example of how to use these classes:

from MovenetDepthai import MovenetDepthai
from MovenetRenderer import MovenetRenderer

# I have removed the argparse stuff to keep only the important code

pose = MovenetDepthai(input_src=args.input, 
            model=args.model,    
            score_thresh=args.score_threshold,           
            internal_fps=args.internal_fps,
            internal_frame_size=args.internal_frame_size
            )

renderer = MovenetRenderer(
                pose, 
                output=args.output)

while True:
    # Run movenet on next frame
    frame, body = pose.next_frame()
    if frame is None: break
    # Draw 2d skeleton
    frame = renderer.draw(frame, body)
    key = renderer.waitKey(delay=1)
    if key == 27 or key == ord('q'):
        break
renderer.exit()
pose.exit()
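
If your project does not need any rendering at all, you can consume the keypoints directly; a minimal sketch (the Body attribute name keypoints is again an assumption):

from MovenetDepthai import MovenetDepthai  # Host mode

pose = MovenetDepthai(input_src="rgb", model="lightning")

while True:
    frame, body = pose.next_frame()
    if frame is None:
        break
    if body is not None:
        print(body.keypoints)  # e.g. the detected (x, y) joint coordinates
pose.exit()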

Examples

  • Semaphore alphabet
  • Yoga Pose Classification

Credits

  • Google MoveNet
  • PINTO, for the model conversions: https://github.com/PINTO0309/PINTO_model_zoo