Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

Last update: Nov 17, 2022

CMIC-Retrieval

Introduction

In this work, we tackle the problem of single image-based 3D shape retrieval (IBSR), where we seek to find the best-matching shape for a given single 2D image in a shape repository. Most existing works learn to embed 2D images and 3D shapes into a common feature space and perform metric learning using a triplet loss. Inspired by the success of recent contrastive learning work on self-supervised representation learning, we propose a novel IBSR pipeline leveraging contrastive learning. We note that adopting such cross-modal contrastive learning between 2D images and 3D shapes in IBSR tasks is non-trivial and challenging: contrastive learning requires very strong data augmentation in constructed positive pairs to learn feature invariance, whereas traditional metric learning works do not have this requirement. However, object shape and appearance are entangled in 2D query images, making the learning task more difficult than contrasting single-modal data. To mitigate these challenges, we propose to use multi-view grayscale rendered images of the 3D shapes as a shape representation. We then introduce a strong data augmentation technique based on color transfer, which can significantly but naturally change the appearance of the query image, effectively satisfying the needs of contrastive learning. Finally, we propose to incorporate a novel category-level contrastive loss that helps distinguish similar objects from different categories, in addition to the classic instance-level contrastive loss. Our experiments demonstrate that our approach achieves the best performance on all three popular IBSR benchmarks (Pix3D, Stanford Cars, and Comp Cars), outperforming the previous state of the art by 4% to 15% in retrieval accuracy.
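To make the two loss terms mentioned above more concrete, here is a minimal NumPy sketch of a cross-modal instance-level contrastive loss (InfoNCE style) and one possible category-level variant. It is an illustration under stated assumptions, not the code in this repository: the function names, the temperature value, and the supervised-contrastive form of the category term are all hypothetical, and the paper's exact formulation may differ. The inputs are assumed to be a batch of image embeddings and a batch of shape embeddings (for example, aggregated from multi-view grayscale renderings), where row `i` of each forms a positive pair.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def instance_contrastive_loss(img_emb, shape_emb, temperature=0.1):
    """InfoNCE-style loss: image i is pulled toward shape i and pushed
    away from every other shape in the batch (assumed formulation)."""
    logits = l2_normalize(img_emb) @ l2_normalize(shape_emb).T / temperature
    log_prob = log_softmax(logits)
    return -np.mean(np.diag(log_prob))


def category_contrastive_loss(img_emb, shape_emb, img_labels, shape_labels,
                              temperature=0.1):
    """Category-level variant (assumed, supervised-contrastive style):
    every shape sharing the image's category label counts as a positive."""
    logits = l2_normalize(img_emb) @ l2_normalize(shape_emb).T / temperature
    log_prob = log_softmax(logits)
    positives = (img_labels[:, None] == shape_labels[None, :]).astype(float)
    # Average the log-probability over each image's positives; the clamp
    # avoids division by zero for an image with no same-category shape.
    per_image = -(positives * log_prob).sum(axis=1) / np.maximum(
        positives.sum(axis=1), 1.0)
    return per_image.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 16))                  # stand-in image features
    shapes = images + 0.1 * rng.normal(size=(8, 16))   # stand-in shape features
    labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])        # stand-in category ids
    print("instance loss:", instance_contrastive_loss(images, shapes))
    print("category loss:",
          category_contrastive_loss(images, shapes, labels, labels))
```

In a training setup, the two terms would typically be combined as a weighted sum and computed on embeddings produced by the image and shape encoders; the weighting is a design choice not specified here.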

About this repository

This repository provides data, pre-trained models and code.

Citations

@inProceedings{lin2021cmic,
	title={Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning},
	author={Lin, Ming-Xian and Yang, Jie and Wang, He and Lai, Yu-Kun and Jia, Rongfei and Zhao, Binqiang and Gao, Lin},
	year={2021},
	booktitle={International Conference on Computer Vision (ICCV)}
}

Updates

[Oct 1, 2021] Preliminary version of the data and code released. More code and data are coming soon; please follow our updates.
