Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Related tags

Deep Learningpcme
Overview

Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Official Pytorch implementation of PCME | Paper

Sanghyuk Chun1 Seong Joon Oh1 Rafael Sampaio de Rezende2 Yannis Kalantidis2 Diane Larlus2

1NAVER AI LAB
2NAVER LABS Europe

VIDEO

Updates

  • 23 Jun, 2021: Initial upload.

Installation

Install dependencies using the following command.

pip install cython && pip install -r requirements.txt
python -c 'import nltk; nltk.download("punkt", download_dir="/opt/conda/nltk_data")'
git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dockerfile

You can use my docker image as well

docker pull sanghyukchun/pcme:torch1.2-apex-dali

Please Add --model__cache_dir /vector_cache when you run the code

Configuration

All experiments are based on configuration files (see config/coco and config/cub). If you want to change only a few options, instead of re-writing a new configuration file, you can override the configuration as the follows:

python .py --dataloader__batch_size 32 --dataloader__eval_batch_size 8 --model__eval_method matching_prob

See config/parser.py for details

Dataset preparation

COCO Caption

We followed the same split provided by VSE++. Dataset splits can be found in datasets/annotations.

Note that we also need instances_2014.json for computing PMRP score.

CUB Caption

Download images from this link, and download caption from reedscot/cvpr2016. You can use the image path and the caption path separately in the code.

Evaluate pretrained models

NOTE: the current implementation of plausible match R-Precision (PMRP) is not efficient:
It first dumps all ranked items for each item to a local file, and compute R-precision.
We are planning to re-implement efficient PMRP as soon as possible.

COCO Caption

# Compute recall metrics
python evaluate_recall_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root  \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image
# Compute plausible match R-Precision (PMRP) metric
python extract_rankings_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root  \
    --model_path model_last.pth \
    --dump_to  \
    # --model__cache_dir /vector_cache # if you use my docker image

python evaluate_pmrp_coco.py --ranking_file 
Method I2T PMRP I2T [email protected] T2I PMRP T2I [email protected] Model file
PCME 45.0 68.8 46.0 54.6 link
PVSE K=1 40.3 66.7 41.8 53.5 -
PVSE K=2 42.8 69.2 43.6 55.2 -
VSRN 41.2 76.2 42.4 62.8 -
VSRN + AOQ 44.7 77.5 45.6 63.5 -

CUB Caption

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image

NOTE: If you just download file from reedscot/cvpr2016, then caption_root will be cvpr2016_cub/text_c10

If you want to test other probabilistic distances, such as Wasserstein distance or KL-divergence, try the following command:

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --model_path model_last.pth \
    --model__eval_method  \
    # --model__cache_dir /vector_cache # if you use my docker image

You can choose distance_method in ['elk', 'l2', 'min', 'max', 'wasserstein', 'kl', 'reverse_kl', 'js', 'bhattacharyya', 'matmul', 'matching_prob']

How to train

NOTE: we train each model with mixed-precision training (O2) on a single V100.
Since, the current code does not support multi-gpu training, if you use different hardware, the batchsize should be reduced.
Please note that, hence, the results couldn't be reproduced if you use smaller hardware than V100.

COCO Caption

python train_coco.py ./config/coco/pcme_coco.yaml --dataset_root  \
    # --model__cache_dir /vector_cache # if you use my docker image

It takes about 46 hours in a single V100 with mixed precision training.

CUB Caption

We use CUB Caption dataset (Reed, et al. 2016) as a new cross-modal retrieval benchmark. Here, instead of matching the sparse paired image-caption pairs, we treat all image-caption pairs in the same class as positive. Since our split is based on the zero-shot learning benchmark (Xian, et al. 2017), we leave out 50 classes from 200 bird classes for the evaluation.

  • Reed, Scott, et al. "Learning deep representations of fine-grained visual descriptions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • Xian, Yongqin, Bernt Schiele, and Zeynep Akata. "Zero-shot learning-the good, the bad and the ugly." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

hyperparameter search

We additionally use cross-validation splits by (Xian, et el. 2017), namely using 100 classes for training and 50 classes for validation.

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --dataset_name cub_trainval1 \
    # --model__cache_dir /vector_cache # if you use my docker image

Similarly, you can use cub_trainval2 and cub_trainval3 as well.

training with full training classes

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    # --model__cache_dir /vector_cache # if you use my docker image

It takes about 4 hours in a single V100 with mixed precision training.

How to cite

@inproceedings{chun2021pcme,
    title={Probabilistic Embeddings for Cross-Modal Retrieval},
    author={Chun, Sanghyuk and Oh, Seong Joon and De Rezende, Rafael Sampaio and Kalantidis, Yannis and Larlus, Diane},
    year={2021},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
}

License

MIT License

Copyright (c) 2021-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Owner
NAVER AI
Official account of NAVER AI, Korea No.1 Industrial AI Research Group
NAVER AI
Official PyTorch implementation of RobustNet (CVPR 2021 Oral)

RobustNet (CVPR 2021 Oral): Official Project Webpage Codes and pretrained models will be released soon. This repository provides the official PyTorch

Sungha Choi 173 Dec 21, 2022
Watch faces morph into each other with StyleGAN 2, StyleGAN, and DCGAN!

FaceMorpher FaceMorpher is an innovative project to get a unique face morph (or interpolation for geeks) on a website. Yes, this means you can see fac

Anish 9 Jun 24, 2022
Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques"

THESIS_CAIRONE_FIORENTINO Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques" GENERATE TOKE

cairone_fiorentino97 1 Dec 10, 2021
Official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION.

IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION This is the official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSU

电线杆 14 Dec 15, 2022
Predict multi paths to a moving person depending on his trajectory history.

Multi-future Trajectory Prediction The project is about using the Multiverse model to make possible multible-future trajectory prediction for a seen p

Said Gamal 1 Jan 18, 2022
Continual learning with sketched Jacobian approximations

Continual learning with sketched Jacobian approximations This repository contains the code for reproducing figures and results in the paper ``Provable

Machine Learning and Information Processing Laboratory 1 Jun 30, 2022
Dataloader tools for language modelling

Installation: pip install lm_dataloader Design Philosophy A library to unify lm dataloading at large scale Simple interface, any tokenizer can be inte

5 Mar 25, 2022
Motion and Shape Capture from Sparse Markers

MoSh++ This repository contains the official chumpy implementation of mocap body solver used for AMASS: AMASS: Archive of Motion Capture as Surface Sh

Nima Ghorbani 135 Dec 23, 2022
Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

Jungsoo Lee 16 Jun 30, 2022
Official implementation of "Learning Not to Reconstruct" (BMVC 2021)

Official PyTorch implementation of "Learning Not to Reconstruct Anomalies" This is the implementation of the paper "Learning Not to Reconstruct Anomal

Marcella Astrid 13 Dec 04, 2022
Physics-Aware Training (PAT) is a method to train real physical systems with backpropagation.

Physics-Aware Training (PAT) is a method to train real physical systems with backpropagation. It was introduced in Wright, Logan G. & Onodera, Tatsuhiro et al. (2021)1 to train Physical Neural Networ

McMahon Lab 230 Jan 05, 2023
This repo is a PyTorch implementation for Paper "Unsupervised Learning for Cuboid Shape Abstraction via Joint Segmentation from Point Clouds"

Unsupervised Learning for Cuboid Shape Abstraction via Joint Segmentation from Point Clouds This repository is a PyTorch implementation for paper: Uns

Kaizhi Yang 42 Dec 09, 2022
Pocsploit is a lightweight, flexible and novel open source poc verification framework

Pocsploit is a lightweight, flexible and novel open source poc verification framework

cckuailong 208 Dec 24, 2022
PyTorch implementation of MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning This is a PyTorch implementation of the MoCo paper: @Article{he2019moco, aut

Meta Research 3.7k Jan 02, 2023
Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 130+ Indicators

Pandas TA - A Technical Analysis Library in Python 3 Pandas Technical Analysis (Pandas TA) is an easy to use library that leverages the Pandas package

Kevin Johnson 3.2k Jan 09, 2023
A simple version for graphfpn

GraphFPN: Graph Feature Pyramid Network for Object Detection Download graph-FPN-main.zip For training , run: python train.py For test with Graph_fpn

WorldGame 67 Dec 25, 2022
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools

Hugging Face Optimum 🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to t

Hugging Face 842 Dec 30, 2022
TinyML Cookbook, published by Packt

TinyML Cookbook This is the code repository for TinyML Cookbook, published by Packt. Author: Gian Marco Iodice Publisher: Packt About the book This bo

Packt 93 Dec 29, 2022
Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides

Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides Project | This repo is the officia

CVSM Group - email: <a href=[email protected]"> 33 Dec 28, 2022
Deep Learning agent of Starcraft2, similar to AlphaStar of DeepMind except size of network.

Introduction This repository is for Deep Learning agent of Starcraft2. It is very similar to AlphaStar of DeepMind except size of network. I only test

Dohyeong Kim 136 Jan 04, 2023