COIN the currently largest dataset for comprehensive instruction video analysis.

Overview

COIN Dataset

COIN is the currently largest dataset for comprehensive instruction video analysis. It contains 11,827 videos of 180 different tasks (i.e., car polishing, make French fries) related to 12 domains (i.e., vehicle, dish). All videos are collected from YouTube and annotated with an efficient toolbox.

Authors and Contributors

Yansong Tang*, Dajun Ding, Yongming Rao*, Yu Zheng*, Danyang Zhang*, Lili Zhao, Jiwen Lu*, Jie Zhou*, Yongxiang Lian*, Yao Li, Jiali Sun, Chang Liu, Dongge You, Zirun Yang, Jiaojiao Ge, Jiayun Wang*

  • *Tsinghua University
  • Meitu Inc.

Contact: [email protected]

License

You may use the codes and files for research only, including sharing and modifying the material. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Dataset and Annotation

Taxonomy

The COIN is organized in a hierarchical structure, which contains three levels: domain, task and step. The corresponding relationship can be found at taxonomy [link]. We provide the taxonomy file of COIN in csv format. Below, we show a small part of the texonomy stored in taxonomy.xlsx:

domain_target_mapping target_action_mapping
Domains Targets
... ...
Vehicle ChangeCarTire
Vehicle InstallLicensePlateFrame
... ...
Gadgets ReplaceCDDriveWithSSD
Target Id Target Label Action Id Action Label
... ... ... ...
13 ChangeCarTire 259 unscrew the screw
13 ChangeCarTire 260 jack up the car
13 ChangeCarTire 261 remove the tire
13 ChangeCarTire 262 put on the tire
13 ChangeCarTire 263 tighten the screws
... ... ... ...

We store the url of video and their annotation in JSON format, which can be accessed with the link [COIN](Project link page). The json file is similar to that of ActivityNet. Below, we show an example entry from the key field "database":

"LtRSn-ntcLY": {
			"duration": 131.0309,
			"class": "ReplaceCDDriveWithSSD",
			"video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
			"start": 56.640895694775196,
			"annotation": [
				{
					"id": "212",
					"segment": [
						60.0,
						69.0
					],
					"label": "take out the laptop CD drive"
				},
				{
					"id": "216",
					"segment": [
						71.0,
						82.0
					],
					"label": "insert the hard disk tray into the position of the CD drive"
				}
			],
			"subset": "training",
			"end": 85.714362947023,
			"recipe_type": 131
		}

From the entry, we can easily retrieve the Youtube ID, duration, ROI and procedure information of the video. The field "annotation" comprises of a list of all annotated procedures within the video. The field "class" and sub-field "id" correspond to "task" and "step" of the taxonomy respectively.

File Structure

The annotation information is saved in COIN.json.

Field Name Type Example Description
database string - Key filed of the annotation file.
- string LtRSn-ntcLY Youtube ID of the video.
duration float 56.640895694775196 Duration of the video in seconds.
class string ReplaceCDDriveWithSSD Name of the task in the video.
video_url string https://www.youtube.com/embed/LtRSn-ntcLY Url of the video.
start float 56.640895694775196 Start time of the ROI of the video.
end float 85.714362947023 End time of the ROI of the video.
subset string training or validation Subset of the video.
recipe_type int 131 ID number of the task.
annotation string - Annotation information of the video.
annotation:id int 212 ID number of the procedure.
annotation:label string take out the laptop CD drive Name of the procedure.
annotation:segment list of float (len=2) [60.0,69.0] Start and end time of the procedure.
Kalidokit is a blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models

Blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models.

Rich 4.5k Jan 07, 2023
This repository contains the needed resources to build the HIRID-ICU-Benchmark dataset

HiRID-ICU-Benchmark This repository contains the needed resources to build the HIRID-ICU-Benchmark dataset for which the manuscript can be found here.

Biomedical Informatics at ETH Zurich 30 Dec 16, 2022
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 2022)

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR2022)[paper] Authors: Chenhang He, Ruihuang Li, Shuai Li, L

Billy HE 141 Dec 30, 2022
RL algorithm PPO and IRL algorithm AIRL written with Tensorflow.

RL algorithm PPO and IRL algorithm AIRL written with Tensorflow. They have a parallel sampling feature in order to increase computation speed (especially in high-performance computing (HPC)).

Fangjian Li 3 Dec 28, 2021
An all-in-one application to visualize multiple different local path planning algorithms

Table of Contents Table of Contents Local Planner Visualization Project (LPVP) Features Installation/Usage Local Planners Probabilistic Roadmap (PRM)

Abdur Javaid 47 Dec 30, 2022
A demo of how to use JAX to create a simple gravity simulation

JAX Gravity This repo contains a demo of how to use JAX to create a simple gravity simulation. It uses JAX's experimental ode package to solve the dif

Cristian Garcia 16 Sep 22, 2022
UFPR-ADMR-v2 Dataset

UFPR-ADMR-v2 Dataset The UFPR-ADMRv2 dataset contains 5,000 dial meter images obtained on-site by employees of the Energy Company of Paraná (Copel), w

Gabriel Salomon 8 Sep 29, 2022
The pytorch implementation of the paper "text-guided neural image inpainting" at MM'2020

TDANet: Text-Guided Neural Image Inpainting, MM'2020 (Oral) MM | ArXiv This repository implements the paper "Text-Guided Neural Image Inpainting" by L

LisaiZhang 75 Dec 22, 2022
PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

MoCo v3 for Self-supervised ResNet and ViT Introduction This is a PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT. The original M

Facebook Research 887 Jan 08, 2023
Repositorio oficial del curso IIC2233 Programación Avanzada 🚀✨

IIC2233 - Programación Avanzada Evaluación Las evaluaciones serán efectuadas por medio de actividades prácticas en clases y tareas. Se calculará la no

IIC2233 @ UC 0 Dec 15, 2022
PartImageNet is a large, high-quality dataset with part segmentation annotations

PartImageNet: A Large, High-Quality Dataset of Parts We will release our dataset and scripts soon after cleaning and approval. Introduction PartImageN

Ju He 77 Nov 30, 2022
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
Sign Language Transformers (CVPR'20)

Sign Language Transformers (CVPR'20) This repo contains the training and evaluation code for the paper Sign Language Transformers: Sign Language Trans

Necati Cihan Camgoz 164 Dec 30, 2022
Official Implementation of "Designing an Encoder for StyleGAN Image Manipulation"

Designing an Encoder for StyleGAN Image Manipulation (SIGGRAPH 2021) Recently, there has been a surge of diverse methods for performing image editing

749 Jan 09, 2023
😮The official implementation of "CoNeRF: Controllable Neural Radiance Fields" 😮

CoNeRF: Controllable Neural Radiance Fields This is the official implementation for "CoNeRF: Controllable Neural Radiance Fields" Project Page Paper V

Kacper Kania 61 Dec 24, 2022
Optimized code based on M2 for faster image captioning training

Transformer Captioning This repository contains the code for Transformer-based image captioning. Based on meshed-memory-transformer, we further optimi

lyricpoem 16 Dec 16, 2022
Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention

cosFormer Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention Update log 2022/2/28 Add core code License This

120 Dec 15, 2022
A robust pointcloud registration pipeline based on correlation.

PHASER: A Robust and Correspondence-Free Global Pointcloud Registration Ubuntu 18.04+ROS Melodic: Overview Pointcloud registration using correspondenc

ETHZ ASL 101 Dec 01, 2022
RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection

RODD Official Implementation of 2022 CVPRW Paper RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection Introduction: Recent studie

Umar Khalid 17 Oct 11, 2022
Yolox-bytetrack-sample - Python sample of MOT (Multiple Object Tracking) using YOLOX and ByteTrack

yolox-bytetrack-sample YOLOXとByteTrackを用いたMOT(Multiple Object Tracking)のPythonサン

KazuhitoTakahashi 12 Nov 09, 2022