This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in the Wild"

Overview

Visual Attributes in the Wild (VAW)

This repository provides data for the VAW dataset as described in the CVPR 2021 Paper:

Learning to Predict Visual Attributes in the Wild

Khoi Pham, Kushal Kafle, Zhihong Ding, Zhe Lin, Quan Tran, Scott Cohen, Abhinav Shrivastava

VAW Main Image

Dataset Setup

Our VAW dataset is partly based on the annotations in the GQA and the VG-PhraseCut datasets.
Therefore, the images in the VAW dataset come from the Visual Genome dataset which is also the source of the images in the GQA and the VG-Phrasecut datasets. This section outlines the annotation format and basic statistics of our dataset.

Annotation Format

The annotations are found in data/train_part1.json, data/train_part2.json , data/val.json and data/test.json for train (split into two parts to circumvent github file-size limit) , validation and test splits in the VAW dataset respectively. The files consist of the following fields:

image_id: int (Image ids correspond to respective Visual Genome image ids)
instance_id: int (Unique instance ID)
instance_bbox: [x, y, width, height] (Bounding box co-ordinates for the instance)
instance_polygon: list of [x y] (List of vertices for segmentation polygon if exists else None)
object_name: str (Name of the object for the instance)
positive_attributes: list of str (Explicitly labeled positive attributes for the instance)
negative_attributes: list of str (Explicitly labeled negative attributes for the instance)

Download Images

The images can be downloaded from the Visual Genome website. The image_id field in our dataset corresponds to respective image ids in the v1.4 in the Visual Genome dataset.

Explore Data and View Live Demo

Head over to our accompanying website to explore the dataset. The website allows exploration of the VAW dataset by filtering our annotations by objects, positive attributes, or negative attributes in the train/val set. The website also shows interactive demo for our SCoNE algorithm as described in our paper.

Dataset Statistics

Basic Stats

Detail Stat
Number of Instances 260,895
Number of Total Images 72,274
Number of Unique Attributes 620
Number of Object Categories 2260
Average Annotation per Instance (Overall) 3.56
Average Annotation per Instance (Train) 3.02
Average Annotation per Instance (Val) 7.03

Evaluation

The evaluation script is provided in eval/evaluator.py. We also provide eval/eval.py as an example to show how to use the evaluation script. In particular, eval.py expects as input the followings:

  1. fpath_pred: path to the numpy array pred of your model prediction (shape (n_instances, n_class)). pred[i,j] is the predicted probability for attribute class j of instance i. We provide eval/pred.npy as a sample for this, which is the output of our best model (last row of table 2) in the paper.
  2. fpath_label: path to the numpy array gt_label that contains the groundtruth label of all instances in the test set (shape (n_instances, n_class)). gt_label[i,j] equals 1 if instance i is labeled positive with attribute j, equals 0 if it is labeled negative with attribute j, and equals 2 if it is unlabeled for attribute j. We provide eval/gt_label.npy as a sample for this, which we have created from data/test.json.
  3. Other files in folder data which have been set with default values in eval/eval.py.

From the eval folder, run the evaluation script as follows:

python eval.py --fpath_pred pred.npy --fpath_label gt_label.npy

We recently updated the grouping of attributes, So, there is a small discrepancy between the scores of our eval/pred.npy versus the numbers reported in the paper on each attribute group. A detailed attribute-wise breakdown will also be saved in a format shown in eval/output_detailed.txt.

Citation

Please cite our CVPR 2021 paper if you use the VAW dataset or the SCoNE algorithm in your work.

@InProceedings{Pham_2021_CVPR,
    author    = {Pham, Khoi and Kafle, Kushal and Lin, Zhe and Ding, Zhihong and Cohen, Scott and Tran, Quan and Shrivastava, Abhinav},
    title     = {Learning To Predict Visual Attributes in the Wild},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {13018-13028}
}

Disclaimer and Contact

This dataset contains objects labeled with a variety of attributes, including those applied to people. Datasets and their use are the subject of important ongoing discussions in the AI community, especially datasets that include people, and we hope to play an active role in those discussions. If you have any feedback regarding this dataset, we welcome your input at [email protected]

You might also like...
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.
Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

git git《Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking》(CVPR 2021) GitHub:git2] 《Masksembles for Uncertainty Estimation》(CVPR 2021) GitHub:git3]
git git《Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking》(CVPR 2021) GitHub:git2] 《Masksembles for Uncertainty Estimation》(CVPR 2021) GitHub:git3]

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking Ning Wang, Wengang Zhou, Jie Wang, and Houqiang Li Accepted by CVPR

This is the official repo for TransFill:  Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations at CVPR'21. According to some product reasons, we are not planning to release the training/testing codes and models. However, we will release the dataset and the scripts to prepare the dataset. Generative Query Network (GQN) in PyTorch as described in
Generative Query Network (GQN) in PyTorch as described in "Neural Scene Representation and Rendering"

Update 2019/06/24: A model trained on 10% of the Shepard-Metzler dataset has been added, the following notebook explains the main features of this mod

Implementation of the method described in the Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
Implementation of the method described in the Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Implementation of the method described in the Speech Resynthesis from Di

A pure PyTorch implementation of the loss described in "Online Segment to Segment Neural Transduction"

ssnt-loss ℹ️ This is a WIP project. the implementation is still being tested. A pure PyTorch implementation of the loss described in "Online Segment t

Repository for the paper
Repository for the paper "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR 2021.

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation Code repository for the paper: PoseAug: A Differentiable Pose Augme

Repository of our paper 'Refer-it-in-RGBD' in CVPR 2021
Repository of our paper 'Refer-it-in-RGBD' in CVPR 2021

Refer-it-in-RGBD This is the repository of our paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images' in CVPR 2021 Pape

Comments
  • Attribute super-class

    Attribute super-class

    Hi, Thank you for releasing the attribute annotations. A am very interested in the dataset. Are you also planning to release the superclass list of attributes from the paper (the Class imbalance and Attribute types)? And could you provide your evaluation code to reproduce your results and use the dataset?

    Best, Maria

    question 
    opened by mabravo641 1
  • Inference details

    Inference details

    Hi @kushalkafle, thanks for your great works of VAW and LSA. And I have some questions about the inference details of the SCoNE and TAP. During inference, For SCoNE, did you crop out the object region first and then evaluate the precision of the method for each bounding box? For TAP and OpenTAP, did you just input the test image and multi objects with bounding boxes, then the model will output the attributes of each object? I wonder if the above conjectures match the real experimental design. Looking forward to your reply and thanks in advance!

    opened by waveboo 0
  • object name embedding

    object name embedding

    Hi, I am a little confused about the object embedding procedure. As mentioned in the paper, GloVe 100-d word embeddings are used as the object name embedding. However, some of the object names are not contained in the Glove embeddings. How to tackle these names? For example, 'american flag', "boy's arm", 'two suitcases', 'computer keyboard', 'larger horse', 'living room wall', 'navy blue shirt', 'of the aisle', 'hotdog bun', 'train station', 'skull picture', 'disney princess', 'neck tie'.

    Thanks.

    opened by GriffinLiang 0
Releases(v1.0)
Build Graph Nets in Tensorflow

Graph Nets library Graph Nets is DeepMind's library for building graph networks in Tensorflow and Sonnet. Contact DeepMind 5.2k Jan 05, 2023

Trajectory Extraction of road users via Traffic Camera

Traffic Monitoring Citation The associated paper for this project will be published here as soon as possible. When using this software, please cite th

Julian Strosahl 14 Dec 17, 2022
Deep Inside Convolutional Networks - This is a caffe implementation to visualize the learnt model

Deep Inside Convolutional Networks This is a caffe implementation to visualize the learnt model. Part of a class project at Georgia Tech Problem State

Jigar 61 Apr 15, 2022
Trainable PyTorch reproduction of AlphaFold 2

OpenFold A faithful PyTorch reproduction of DeepMind's AlphaFold 2. Features OpenFold carefully reproduces (almost) all of the features of the origina

AQ Laboratory 1.7k Dec 29, 2022
SPT_LSA_ViT - Implementation for Visual Transformer for Small-size Datasets

Vision Transformer for Small-Size Datasets Seung Hoon Lee and Seunghyun Lee and Byung Cheol Song | Paper Inha University Abstract Recently, the Vision

Lee SeungHoon 87 Jan 01, 2023
ML course - EPFL Machine Learning Course, Fall 2021

EPFL Machine Learning Course CS-433 Machine Learning Course, Fall 2021 Repository for all lecture notes, labs and projects - resources, code templates

EPFL Machine Learning and Optimization Laboratory 1k Jan 04, 2023
这是一个利用facenet和retinaface实现人脸识别的库,可以进行在线的人脸识别。

Facenet+Retinaface:人脸识别模型在Pytorch当中的实现 目录 注意事项 Attention 所需环境 Environment 文件下载 Download 预测步骤 How2predict 参考资料 Reference 注意事项 该库中包含了两个网络,分别是retinaface和

Bubbliiiing 102 Dec 30, 2022
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 141 Dec 30, 2022
Airbus Ship Detection Challenge

Airbus Ship Detection Challenge This is an open solution to the Airbus Ship Detection Challenge. Our goals We are building entirely open solution to t

minerva.ml 55 Nov 29, 2022
Yolov5 deepsort inference,使用YOLOv5+Deepsort实现车辆行人追踪和计数,代码封装成一个Detector类,更容易嵌入到自己的项目中

使用YOLOv5+Deepsort实现车辆行人追踪和计数,代码封装成一个Detector类,更容易嵌入到自己的项目中。

813 Dec 31, 2022
Accelerated Multi-Modal MR Imaging with Transformers

Accelerated Multi-Modal MR Imaging with Transformers Dependencies numpy==1.18.5 scikit_image==0.16.2 torchvision==0.8.1 torch==1.7.0 runstats==1.8.0 p

54 Dec 16, 2022
DISTIL: Deep dIverSified inTeractIve Learning.

DISTIL: Deep dIverSified inTeractIve Learning. An active/inter-active learning library built on py-torch for reducing labeling costs.

decile-team 110 Dec 06, 2022
A list of multi-task learning papers and projects.

This page contains a list of papers on multi-task learning for computer vision. Please create a pull request if you wish to add anything. If you are interested, consider reading our recent survey pap

svandenh 297 Dec 17, 2022
Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)

Blockwise Sequential Model Learning Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022) For ins

2 Jun 17, 2022
Codes for "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation"

CSDI This is the github repository for the NeurIPS 2021 paper "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

106 Jan 04, 2023
This is the repo for the paper `SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization'. (published in Bioinformatics'21)

SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization This is the code for our paper ``SumGNN: Multi-typed Drug

Yue Yu 58 Dec 21, 2022
Implementations for the ICLR-2021 paper: SEED: Self-supervised Distillation For Visual Representation.

Implementations for the ICLR-2021 paper: SEED: Self-supervised Distillation For Visual Representation.

Jacob 27 Oct 23, 2022
用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本和PARL(paddle)版本

用强化学习玩合成大西瓜 代码地址:https://github.com/Sharpiless/play-daxigua-using-Reinforcement-Learning 用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本、PARL(paddle)版本和pytorch版本

72 Dec 17, 2022
MMRazor: a model compression toolkit for model slimming and AutoML

Documentation: https://mmrazor.readthedocs.io/ English | 简体中文 Introduction MMRazor is a model compression toolkit for model slimming and AutoML, which

OpenMMLab 899 Jan 02, 2023
Unofficial Tensorflow 2 implementation of the paper Implicit Neural Representations with Periodic Activation Functions

Siren: Implicit Neural Representations with Periodic Activation Functions The unofficial Tensorflow 2 implementation of the paper Implicit Neural Repr

Seyma Yucer 2 Jun 27, 2022