Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

Last update: Nov 17, 2022

Related tags

Overview

CMIC-Retrieval

Code for Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning. ICCV 2021.

Introduction

In this work, we tackle the problem of single image-based 3D shape retrieval (IBSR), where we seek to find the most matched shape of a given single 2D image from a shape repository. Most of the existing works learn to embed 2D images and 3D shapes into a common feature space and perform metric learning using a triplet loss. Inspired by the great success in recent contrastive learning works on self-supervised representation learning, we propose a novel IBSR pipeline leveraging contrastive learning. We note that adopting such cross-modal contrastive learning between 2D images and 3D shapes into IBSR tasks is non-trivial and challenging: contrastive learning requires very strong data augmentation in constructed positive pairs to learn the feature invariance, whereas traditional metric learning works do not have this requirement. However, object shape and appearance are entangled in 2D query images, thus making the learning task more difficult than contrasting single-modal data. To mitigate the challenges, we propose to use multi-view grayscale rendered images from the 3D shapes as a shape representation. We then introduce a strong data augmentation technique based on color transfer, which can significantly but naturally change the appearance of the query image, effectively satisfying the need for contrastive learning. Finally, we propose to incorporate a novel category-level contrastive loss that helps distinguish similar objects from different categories, in addition to classic instance-level contrastive loss. Our experiments demonstrate that our approach achieves the best performance on all the three popular IBSR benchmarks, including Pix3D, Stanford Cars, and Comp Cars, outperforming the previous state-of-the-art from 4% - 15% on retrieval accuracy.

About this repository

This repository provides data, pre-trained models and code.

Citations

@inProceedings{lin2021cmic,
	title={Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning},
	author={Lin, Ming-Xian and Yang, Jie and Wang, He and Lai, Yu-Kun and Jia, Rongfei and Zhao, Binqiang and Gao, Lin},
	year={2021},
	booktitle={International Conference on Computer Vision (ICCV)}
}

Updates

[Oct 1, 2021] Preliminary version of Data and Code released. For more code and data, coming soon. Please follow our updates.

Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

Related tags

Overview

CMIC-Retrieval

Introduction

About this repository

Citations

Updates

Owner

An experiment to bait a generalized frontrunning MEV bot

Deep Image Matting implementation in PyTorch

AlphaNet Improved Training of Supernet with Alpha-Divergence

Clockwork Convnets for Video Semantic Segmentation

Code for EMNLP2021 paper "Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training"

Script utilizando OpenCV e modelo Machine Learning para detectar o uso de máscaras.

A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️

This is a virtual picture dragging application. Users may virtually slide photos across the screen. The distance between the index and middle fingers determines the movement. Smaller distances indicate click and motion, whereas bigger distances indicate only hand movement.

Analysis of Smiles through reservoir sampling & RDkit

GLNet for Memory-Efficient Segmentation of Ultra-High Resolution Images

[NeurIPS 2021] Garment4D: Garment Reconstruction from Point Cloud Sequences

The official implementation of the Hybrid Self-Attention NEAT algorithm

Minimisation of a negative log likelihood fit to extract the lifetime of the D^0 meson (MNLL2ELDM)

A JAX-based research framework for writing differentiable numerical simulators with arbitrary discretizations

A data-driven maritime port simulator

ThunderSVM: A Fast SVM Library on GPUs and CPUs

Out-of-Town Recommendation with Travel Intention Modeling (AAAI2021)

African language Speech Recognition - Speech-to-Text

Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for LiDAR-Based Place Recognition.