A learning-based data collection tool for human segmentation

Last update: Jun 24, 2022

Overview

FullBodyFilter

A Learning-Based Data Collection Tool For Human Segmentation

Overview

Human segmentation is a difficult machine learning task of identifying and extracting the human in a picture. Most of the time this is done by using a convolutional neural network. In order to achieve an accurate and robust model, large amounts of data with varying human poses need to be collected to train the model. Collecting and labeling train data by hand takes lots of time and resources. This project explores another option to use automtation to collect and label pre-existing data from internet videos.

The model that was focused on is the DTEN ME model used for Zoom meetings virtual background.

Openpose is used to filter the video for suitable frames, in particular single person full body frames. Mask R-CNN is the teacher model that generates training labels. To find which images perform poorly on ME model, a comparison is done between ME masks and Mask R-CNN masks. The result is a set of images and masks that can be used as training data.

Overview of Program

A full report of the system design and implemenation details can be found in doc

Sample Results

Examples of train data saved. In each image bottom left is Mask R-CNN mask and bottom right is ME mask.

Usage

This project relies on Openpose and Mask R-CNN and all their dependencies. Instructions on how to set up each are found in there respective directories here.

Documentation on how to use scripts are located in doc.

A learning-based data collection tool for human segmentation

Related tags

Overview

FullBodyFilter

Contents

Overview

Sample Results

Usage

Owner

Robert Jiang

DeceFL: A Principled Decentralized Federated Learning Framework

Monk is a low code Deep Learning tool and a unified wrapper for Computer Vision.

PyToch implementation of A Novel Self-supervised Learning Task Designed for Anomaly Segmentation

Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite and .pb from .tflite.

(ICCV 2021) PyTorch implementation of Paper "Progressive Correspondence Pruning by Consensus Learning"

The code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention.

3D AffordanceNet is a 3D point cloud benchmark consisting of 23k shapes from 23 semantic object categories, annotated with 56k affordance annotations and covering 18 visual affordance categories.

Classifying audio using Wavelet transform and deep learning

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

Indonesian Car License Plate Character Recognition using Tensorflow, Keras and OpenCV.

Localization Distillation for Object Detection

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis (CVPR2022)

Mengzi Pretrained Models

As-ViT: Auto-scaling Vision Transformers without Training

Pydantic models for pywttr and aiopywttr.

I-BERT: Integer-only BERT Quantization

A collection of differentiable SVD methods and also the official implementation of the ICCV21 paper "Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?"

Relative Positional Encoding for Transformers with Linear Complexity

CLIP: Connecting Text and Image (Learning Transferable Visual Models From Natural Language Supervision)

Official implementation for paper: Feature-Style Encoder for Style-Based GAN Inversion