More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

Last update: Aug 27, 2022

Overview

More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval, CVPR 2021.

Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yongxin Yang, Tao Xiang, Yi-Zhe Song, “More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

SketchX_ShoeV2/ChairV2 Dataset: Download

Abstract

A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is data scarcity: model performance is largely bottlenecked by the lack of sketch-photo pairs. Whilst the number of photos can be easily scaled, each corresponding sketch still needs to be individually produced. In this paper, we aim to mitigate such an upper bound on sketch data, and study whether unlabelled photos alone (of which there are many) can be cultivated for performance gain. In particular, we introduce a novel semi-supervised framework for cross-modal retrieval that can additionally leverage large-scale unlabelled photos to account for data scarcity. At the centre of our semi-supervision design is a sequential photo-to-sketch generation model that aims to generate paired sketches for unlabelled photos. Importantly, we further introduce a discriminator-guided mechanism to guard against unfaithful generation, together with a distillation-loss-based regularizer to provide tolerance against noisy training samples. Last but not least, we treat generation and retrieval as two conjugate problems, where a joint learning procedure is devised for each module to mutually benefit from the other. Extensive experiments show that our semi-supervised model yields a significant performance boost over state-of-the-art supervised alternatives, as well as over existing methods that can exploit unlabelled photos for FG-SBIR.

Outline

Figure: Our proposed method additionally leverages large-scale photos without any manually labelled paired sketches to improve FG-SBIR performance. Moreover, we show that the two conjugate processes, photo-to-sketch generation and fine-grained SBIR, can improve each other through joint training.

Joint Architecture

Figure: Our framework. An FG-SBIR model leverages large-scale unlabelled photos, using a sequential photo-to-sketch generation model, along with labelled pairs. Discriminator-guided instance-wise weighting and a distillation loss are used to guard against noisy generated data. Simultaneously, the photo-to-sketch generation model learns by taking rewards from the FG-SBIR model and the discriminator via policy gradient (over both labelled and unlabelled data), together with a supervised VAE loss over labelled data. Note that rasterization (vector to raster format) is a non-differentiable operation.
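The instance-wise weighting mentioned in the caption above can be illustrated with a minimal sketch. It is not the repository's implementation: it assumes a hinge triplet loss as the retrieval objective and assumes each generated sketch-photo pair is simply scaled by a discriminator score in [0, 1]. The function names (`l2_distance`, `triplet_loss`, `weighted_batch_loss`) and the margin value are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(sketch, pos_photo, neg_photo, margin=0.2):
    """Hinge triplet loss: the matching photo should be closer than the non-matching one."""
    d_pos = l2_distance(sketch, pos_photo)
    d_neg = l2_distance(sketch, neg_photo)
    return max(0.0, margin + d_pos - d_neg)


def weighted_batch_loss(triplets, disc_scores, margin=0.2):
    """Mean triplet loss with one weight per instance.

    Assumption: disc_scores[i] in [0, 1] is the discriminator's estimate of
    how faithful the i-th generated sketch is, so unfaithful sketches
    contribute less to the retrieval loss.
    """
    assert len(triplets) == len(disc_scores)
    total = 0.0
    for (sketch, pos, neg), weight in zip(triplets, disc_scores):
        total += weight * triplet_loss(sketch, pos, neg, margin)
    return total / len(triplets)


# Two toy triplets of 2-D embeddings: (sketch, matching photo, non-matching photo).
triplets = [
    ([0.0, 0.0], [0.0, 1.0], [0.0, 1.0]),
    ([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]),
]
# A low score on the second (badly generated) sketch suppresses its large loss.
print(weighted_batch_loss(triplets, [1.0, 0.0]))
print(weighted_batch_loss(triplets, [1.0, 1.0]))
```

With a weight of zero, the second triplet is ignored entirely. Intermediate scores would down-weight noisy pairs rather than discard them.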

Examples

Figure: Qualitative results of our photo-to-sketch generation process, where each sketch is shown with its attention map at progressive stages of generation.
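To make the idea of viewing a sequentially generated sketch at progressive stages concrete, here is a small illustrative sketch that rasterizes every prefix of a vector sketch onto a binary grid. It is not the code used to produce the figure. The point format `(x, y, pen_up)` with coordinates in [0, 1] and the helpers `rasterize` and `progressive_rasters` are assumptions made for this example. The rounding to integer pixel indices also shows why rasterization gives no useful gradient.

```python
def rasterize(points, size=8):
    """Draw a vector sketch onto a size x size binary grid.

    Assumed format: each point is (x, y, pen_up) with x, y in [0, 1];
    pen_up = 1 means the pen lifts after this point, so no line is drawn
    to the next point.
    """
    grid = [[0] * size for _ in range(size)]

    def plot(x, y):
        # Rounding to discrete pixel indices is the non-differentiable step.
        col = min(size - 1, max(0, int(round(x * (size - 1)))))
        row = min(size - 1, max(0, int(round(y * (size - 1)))))
        grid[row][col] = 1

    for i, (x, y, pen_up) in enumerate(points):
        plot(x, y)
        if pen_up or i + 1 == len(points):
            continue
        next_x, next_y, _ = points[i + 1]
        steps = 2 * size  # sample densely enough for this grid size
        for t in range(1, steps):
            a = t / steps
            plot(x + a * (next_x - x), y + a * (next_y - y))
    return grid


def progressive_rasters(points, size=8):
    """Rasterize each prefix of the sketch, mimicking step-by-step generation."""
    return [rasterize(points[:k], size) for k in range(1, len(points) + 1)]


# Two horizontal strokes: top edge, pen lift, then bottom edge.
sketch = [(0.0, 0.0, 0), (1.0, 0.0, 1), (0.0, 1.0, 0), (1.0, 1.0, 0)]
for frame in progressive_rasters(sketch):
    print("\n".join("".join("#" if v else "." for v in row) for row in frame))
    print()
```

Each printed frame adds one more point of the sequence, so the drawing can be inspected as it grows.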

Citation

If you find this article useful in your research, please consider citing:

@InProceedings{semi-fgsbir,
author = {Ayan Kumar Bhunia and Pinaki Nath Chowdhury and Aneeshan Sain and Yongxin Yang and Tao Xiang and Yi-Zhe Song},
title = {More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}
}

Work done at SketchX Lab, CVSSP, University of Surrey.

Owner

Ayan Kumar Bhunia
