Code/data of the paper "Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction" (BMVC2021)

Overview

Hand-Object Contact Prediction (BMVC2021)

This repository contains the code and data for the paper "Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction" by Takuma Yagi, Md. Tasnimul Hasan and Yoichi Sato.

Requirements

  • Python 3.6+
  • ffmpeg
  • numpy
  • opencv-python
  • pillow
  • scikit-learn
  • python-Levenshtein
  • pycocotools
  • torch (1.8.1; use 1.4.0 for flow generation)
  • torchvision (0.9.1)
  • mllogger
  • flownet2-pytorch

Caution: This repository requires ~100 GB of disk space for testing, ~200 GB for trusted-label training, and ~3 TB for full training.
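
Most of the Python dependencies can be installed with pip. The following is a minimal sketch, not a verified install recipe: package availability on PyPI is an assumption (flownet2-pytorch is installed as a submodule below, and mllogger may need to be installed from its author's repository instead):

pip install numpy opencv-python pillow scikit-learn python-Levenshtein pycocotools mllogger
pip install torch==1.8.1 torchvision==0.9.1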

Getting Started

Download the data

  1. Download the EPIC-KITCHENS-100 videos from the official site. Since this dataset uses 480p frames and optical flows for training and testing, you need to download the original videos. Place them as data/videos/PXX/PXX_XX.MP4.
  2. Download and extract the ground-truth labels and pseudo-labels (11 GB; only required for training) to data/.

Required videos are listed in configs/*_vids.txt.
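
As a sanity check, the sketch below verifies that every required test video is in place. It assumes each configs/*_vids.txt lists one video ID (e.g., PXX_XX) per line and the layout from step 1:

for vid in `cat configs/test_vids.txt`; do
    [ -f "data/videos/${vid%_*}/${vid}.MP4" ] || echo "missing: data/videos/${vid%_*}/${vid}.MP4"
done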

Clone repository

git clone --recursive https://github.com/takumayagi/hand_object_contact_prediction.git

Install FlowNet2 submodule

See the official repo to install the custom components.
Note that flownet2-pytorch does not work on the latest PyTorch versions (confirmed working with 1.4.0).

Download the FlowNet2 pretrained model and place it under pretrained/.
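
A minimal sketch of this step, assuming the submodule is checked out under flownet2-pytorch/ and uses the upstream install.sh to build the custom CUDA layers (the checkpoint file name is only an example):

cd flownet2-pytorch
bash install.sh  # builds the custom extensions (correlation, resample2d, channelnorm)
cd ..
mkdir -p pretrained
# place the downloaded checkpoint, e.g. FlowNet2_checkpoint.pth.tar, under pretrained/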

Extract RGB frames

The following commands extract 480p RGB frames to data/rgb_frames.
Note that frames are extracted at 60 fps for EK-55 videos and at 50 fps for the EK-100 extension.

Validation & test set

for vid in `cat configs/valid_vids.txt`; do bash preprocessing/extract_rgb_frames.bash $vid; done
for vid in `cat configs/test_vids.txt`; do bash preprocessing/extract_rgb_frames.bash $vid; done

Trusted training set

for vid in `cat configs/trusted_train_vids.txt`; do bash preprocessing/extract_rgb_frames.bash $vid; done

Noisy training set

# Caution: takes up a large amount of space (~400 GB)
for vid in `cat configs/noisy_train_vids.txt`; do bash preprocessing/extract_rgb_frames.bash $vid; done

Extract Flow frames

Similarly to the above, we extract flow images (as 16-bit PNGs). This requires the annotation files, since only the flows used in training/testing are extracted to save space.

# Same for test, trusted_train, and noisy_train
# For trusted labels (test, valid, trusted_train)
# Don't forget to add --gt
for vid in `cat configs/valid_vids.txt`; do python preprocessing/extract_flow_frames.py $vid --gt; done

# For pseudo-labels
# Extracting flows for noisy_train will take up a large amount of space
for vid in `cat configs/noisy_train_vids.txt`; do python preprocessing/extract_flow_frames.py $vid; done

Demo (WIP)

Currently, we only provide evaluation code that runs on pre-processed input sequences (and bounding boxes). We plan to release demo code with track generation.

Test

Download the pretrained models to pretrained/.

Evaluation by test set:

python train.py --model CrUnionLSTMHO --eval --resume pretrained/proposed_model.pth
python train.py --model CrUnionLSTMHORGB --eval --resume pretrained/rgb_model.pth  # RGB baseline
python train.py --model CrUnionLSTMHOFlow --eval --resume pretrained/flow_model.pth  # Flow baseline

Visualization

python train.py --model CrUnionLSTMHO --eval --resume pretrained/proposed_model.pth --vis

This will produce an mp4 file under <output_dir>/vis_predictions/.
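
For a quick look at the result you can play it with ffplay, which ships with the ffmpeg requirement above (the output file name pattern is an assumption):

ffplay <output_dir>/vis_predictions/<video_id>.mp4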

Training

Full training

Download the initial models and place them under pretrained/training/.

python train.py --model CrUnionLSTMHO --dir_name proposed --semisupervised --iter_supervision 5000 --iter_warmup 0 --plc --update_clean --init_delta 0.05  --asymp_labeled_flip --nb_iters 800000 --lr_step_list 40000 --save_model --finetune_noisy_net --delta_th 0.01 --iter_snapshot 20000 --iter_evaluation 20000 --min_clean_label_ratio 0.25

Trusted label training

You can train the "supervised" model as follows:

# Train
python train_v1.py --model UnionLSTMHO --dir_name supervised_trainval --train_vids configs/trusted_train_vids.txt --nb_iters 25000 --save_model --iter_warmup 5000 --supervised

# Trainval
python train_v1.py --model UnionLSTMHO --dir_name supervised_trainval --train_vids configs/trusted_trainval_vids.txt --nb_iters 25000 --save_model --iter_warmup 5000 --eval_vids configs/test_vids.txt --supervised

Optional: Training initial models

To train the proposed model (CrUnionLSTMHO), we first train the noisy and clean networks before applying gPLC:

python train.py --model UnionLSTMHO --dir_name noisy_pretrain --train_vids configs/noisy_train_vids_55.txt --nb_iters 40000 --save_model --only_boundary
python train.py --model UnionLSTMHO --dir_name clean_pretrain --train_vids configs/trusted_train_vids.txt --nb_iters 25000 --save_model --iter_warmup 2500 --supervised

Tips

  • Set larger --nb_workers and --nb_eval_workers if you have enough CPUs.
  • You can set --modality to either rgb or flow when training single-modality models (see the example after this list).
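
For example, a single-modality evaluation run with more data-loading workers might look like the following sketch (the worker counts are arbitrary):

python train.py --model CrUnionLSTMHORGB --eval --resume pretrained/rgb_model.pth --modality rgb --nb_workers 8 --nb_eval_workers 8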

Citation

Takuma Yagi, Md. Tasnimul Hasan, and Yoichi Sato, Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction. In Proceedings of the British Machine Vision Conference. 2021.

@inproceedings{yagi2021hand,
  title = {Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction},
  author = {Yagi, Takuma and Hasan, Md. Tasnimul and Sato, Yoichi},
  booktitle = {Proceedings of the British Machine Vision Conference},
  year = {2021}
}

When you use the data for training and evaluation, please also cite the original dataset (EPIC-KITCHENS Dataset).
