Gesture-Detection-and-Depth-Estimation

This is my graduation project.

(1) In this project, I use the YOLOv3 object detection model to detect gestures in RGB images. I trained the model on a self-made gesture dataset to obtain a deep-learning-based gesture detection model. Testing on the test dataset showed that the model meets the requirements of real-time gesture detection while maintaining high accuracy.
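As an illustration of the box post-processing that YOLO-style detectors typically rely on, here is a minimal sketch of intersection over union (IoU) and greedy non-maximum suppression. It is not taken from this repository; the function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    detections: list of (box, score) pairs.
    Returns the kept detections, highest score first.
    """
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in ordered:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    # Hypothetical gesture detections: two overlapping boxes and one separate box.
    dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
    for box, score in nms(dets):
        print(box, score)
```

The same `iou` function can also be used to match predictions against ground-truth boxes when measuring detection accuracy on a test set.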

(2) I then tried monocular depth estimation based on deep learning to estimate the depth of the gesture object from a single RGB image. I used two methods: the FastDepth algorithm and an improved detection model based on YOLOv3. FastDepth was trained and tested on a self-made gesture-depth dataset. For the second method, I added target depth estimation to the YOLOv3 model by adding a depth vector to the output dimensions and modifying the loss function, then trained and tested the modified model on the same gesture-depth dataset. The experimental results show that both methods can estimate the depth of the gesture object in an RGB image to a certain extent.
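The idea of extending the YOLOv3 output and loss with a depth term can be sketched as follows. This is only a guess at the general shape, not the project's actual implementation: it assumes one extra depth scalar per anchor, a squared-error depth loss applied only where an object is present, and a hypothetical weight `lambda_depth`. See the final paper for the formulation actually used.

```python
def output_channels(num_anchors, num_classes, with_depth=False):
    """Channels per detection scale.

    Standard YOLOv3 predicts (x, y, w, h, objectness) plus class scores per anchor.
    Assumption: the depth variant appends one depth value per anchor.
    """
    per_anchor = 5 + num_classes + (1 if with_depth else 0)
    return num_anchors * per_anchor


def depth_loss(pred_depths, true_depths, object_mask):
    """Mean squared depth error over predictions that are assigned an object."""
    total, count = 0.0, 0
    for pred, true, has_object in zip(pred_depths, true_depths, object_mask):
        if has_object:
            total += (pred - true) ** 2
            count += 1
    return total / count if count else 0.0


def total_loss(detection_loss, pred_depths, true_depths, object_mask, lambda_depth=1.0):
    """Detection loss plus a weighted depth term (lambda_depth is hypothetical)."""
    return detection_loss + lambda_depth * depth_loss(pred_depths, true_depths, object_mask)


if __name__ == "__main__":
    print(output_channels(3, 80))                   # standard YOLOv3 head
    print(output_channels(3, 80, with_depth=True))  # head with one depth value per anchor
    # Made-up values for illustration only.
    print(total_loss(2.0, [0.5, 0.9, 0.2], [0.4, 0.1, 0.2], [1, 0, 1], lambda_depth=2.0))
```

Masking the depth term by object presence means background cells, which have no meaningful depth label, do not contribute to the gradient.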

Gesture detection:

Depth data:

Estimate target depth:

(3) I also developed a simple program with PyOpenGL that uses gesture information to draw simple shapes in three-dimensional space.
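One way gesture information could be turned into 3D geometry is sketched below, assuming a pinhole camera model: a detected gesture's pixel position and estimated depth are back-projected to a 3D point, and two such points define opposite corners of an axis-aligned box. The camera intrinsics (`fx`, `fy`, `cx`, `cy`) and all function names are hypothetical and not taken from this repository.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map a pixel (u, v) with estimated depth to a 3D camera-space point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def box_from_corners(p_min, p_max):
    """Vertices and edges of the axis-aligned box spanned by two opposite corners."""
    xs = (p_min[0], p_max[0])
    ys = (p_min[1], p_max[1])
    zs = (p_min[2], p_max[2])
    # Vertex index i encodes the choice of corner per axis as three bits (x, y, z).
    vertices = [(x, y, z) for x in xs for y in ys for z in zs]
    # Two vertices share an edge when their indices differ in exactly one bit.
    edges = [(i, j) for i in range(8) for j in range(i + 1, 8)
             if bin(i ^ j).count("1") == 1]
    return vertices, edges


if __name__ == "__main__":
    # Hypothetical intrinsics for a 640x480 camera.
    fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
    corner_a = backproject(220, 140, 1.0, fx, fy, cx, cy)
    corner_b = backproject(420, 340, 1.4, fx, fy, cx, cy)
    vertices, edges = box_from_corners(corner_a, corner_b)
    print(len(vertices), "vertices,", len(edges), "edges")
```

In a PyOpenGL program, the resulting edges could be rendered as a wireframe, for example by emitting both endpoints of each edge with `glVertex3fv` between `glBegin(GL_LINES)` and `glEnd()`.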

Try to draw a cube:

For more information, you can check my final paper.

The YOLOv3 model is based on coldlarry's model: https://github.com/coldlarry/YOLOv3-complete-pruning


Owner

ChaosAT
