PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"

Last update: Dec 05, 2022

Overview

Improving Visual-Semantic Embeddings with Hard Negatives

Code for the image-caption retrieval methods from VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, F. Faghri, D. J. Fleet, J. R. Kiros, S. Fidler, Proceedings of the British Machine Vision Conference (BMVC), 2018. (BMVC Spotlight)

Dependencies

We recommend using Anaconda to install the following packages.

Python 2.7 (check out branch python3 for Python 3)
PyTorch (>0.2) (check out branch pytorch4.1 for PyTorch 0.4.1)
NumPy (>1.12.1)
TensorBoard
pycocotools
torchvision
matplotlib
Punkt Sentence Tokenizer:

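import nltk
# Fetch the Punkt models directly; equivalent to running nltk.download() and entering "d punkt" at the prompt.
nltk.download('punkt')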

Download data

Download the dataset files and pre-trained models. We use splits produced by Andrej Karpathy. The precomputed image features are from here and here. To use full image encoders, download the images from their original sources here, here and here.

wget http://www.cs.toronto.edu/~faghri/vsepp/vocab.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/data.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/runs.tar

We refer to the path of the files extracted from data.tar as $DATA_PATH and to the path of the files extracted from runs.tar as $RUN_PATH. Extract vocab.tar to the ./vocab directory.

Update: The vocabulary was originally built using all splits (including the test set captions). See issue #29 for details. If you build on this project, please consider not using the test set captions.
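If you rebuild the vocabulary from training captions only, as the note above recommends, the logic might look like the following minimal sketch. The helper names build_vocab and encode, the frequency threshold, and the whitespace tokenization are all hypothetical; the repository's own vocab code tokenizes with NLTK and may use different special tokens and defaults.

```python
from collections import Counter


def build_vocab(train_captions, threshold=4):
    """Hypothetical helper: map words seen >= threshold times in TRAINING captions to ids.

    Assumption: plain lowercase whitespace tokenization, not the NLTK tokenizer
    the repository uses. Test captions are never passed in, so no test words leak.
    """
    counts = Counter()
    for caption in train_captions:
        counts.update(caption.lower().split())

    # Reserved tokens first so their ids are stable (token names are an assumption).
    itos = ['<pad>', '<start>', '<end>', '<unk>']
    itos += sorted(w for w, c in counts.items() if c >= threshold)
    return {w: i for i, w in enumerate(itos)}


def encode(caption, vocab):
    """Hypothetical helper: convert a caption to ids; unseen words map to <unk>."""
    ids = [vocab['<start>']]
    ids += [vocab.get(w, vocab['<unk>']) for w in caption.lower().split()]
    ids.append(vocab['<end>'])
    return ids
```

For example, build_vocab(["a dog runs", "a dog sleeps", "a cat sleeps"], threshold=2) keeps only "a", "dog" and "sleeps", so a test caption such as "a zebra sleeps" encodes "zebra" as <unk> instead of relying on a word that only the test set contains.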

Evaluate pre-trained models

python -c "\
from vocab import Vocabulary
import evaluation
evaluation.evalrank('$RUN_PATH/coco_vse++/model_best.pth.tar', data_path='$DATA_PATH', split='test')"
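Image-caption retrieval is commonly scored with Recall@K: the percentage of queries whose ground truth appears among the top K results. The NumPy sketch below computes image-to-caption Recall@K from a similarity matrix. It assumes five captions per image stored consecutively (captions 5i to 5i+4 belong to image i); the function name i2t_recall is hypothetical and this is not the repository's evaluation code.

```python
import numpy as np


def i2t_recall(sims, ks=(1, 5, 10), captions_per_image=5):
    """Image-to-caption Recall@K (in percent) from sims[image, caption].

    Assumption: captions captions_per_image*i .. captions_per_image*(i+1)-1
    belong to image i. An image is a hit at K if ANY of its captions ranks in the top K.
    """
    n_images = sims.shape[0]
    ranks = np.empty(n_images, dtype=int)
    for i in range(n_images):
        order = np.argsort(-sims[i])  # caption indices, most similar first
        gt = np.arange(captions_per_image * i, captions_per_image * (i + 1))
        # 0-based position of the best-ranked ground-truth caption
        ranks[i] = np.where(np.isin(order, gt))[0].min()
    return {k: 100.0 * np.mean(ranks < k) for k in ks}
```

Caption-to-image recall follows the same pattern with the roles of rows and columns swapped, each caption having a single ground-truth image.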

To do cross-validation on MSCOCO, pass fold5=True to evalrank, using a model trained with --data_name coco.
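A sketch of the usual MSCOCO "1K test" protocol: the 5K test images are split into five disjoint folds of 1K images, each fold is evaluated only against its own captions, and the per-fold scores are averaged. The helper fold5_average and its metric argument are hypothetical; evalrank's internal implementation may index embeddings differently.

```python
import numpy as np


def fold5_average(sims, metric, captions_per_image=5, n_folds=5):
    """Average metric(sub_matrix) over n_folds disjoint image/caption blocks.

    Assumption: sims is [n_images, n_images * captions_per_image], with each
    image's captions stored consecutively, so fold f owns a contiguous block.
    """
    n_images = sims.shape[0]
    fold = n_images // n_folds
    results = []
    for f in range(n_folds):
        rows = slice(f * fold, (f + 1) * fold)
        cols = slice(f * fold * captions_per_image,
                     (f + 1) * fold * captions_per_image)
        # Only this fold's images and captions compete with each other.
        results.append(metric(sims[rows, cols]))
    return float(np.mean(results))
```

Because each fold has five times fewer distractors than the full 5K set, 1K scores are typically higher than 5K scores for the same model, so the two settings should not be compared directly.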

Training new models

Run train.py:

python train.py --data_path "$DATA_PATH" --data_name coco_precomp --logger_name runs/coco_vse++ --max_violation
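The --max_violation flag selects the VSE++ hard-negative variant of the hinge-based triplet loss: instead of summing hinges over every negative in the mini-batch, only the hardest negative per positive pair contributes. The NumPy sketch below illustrates both variants on a batch similarity matrix whose diagonal holds the matching image-caption pairs. It is an illustration of the formula, not the repository's PyTorch loss, and the margin default of 0.2 is an assumption.

```python
import numpy as np


def contrastive_loss(scores, margin=0.2, max_violation=False):
    """Hinge triplet loss; scores[i, j] = s(image i, caption j), positives on the diagonal.

    max_violation=False: sum of hinges over all in-batch negatives (VSE0-style).
    max_violation=True: only the hardest negative per positive pair (VSE++-style).
    margin=0.2 is an illustrative default (assumption).
    """
    diag = np.diag(scores)
    # Caption retrieval: for image i (row), captions j != i are negatives.
    cost_s = np.maximum(0.0, margin + scores - diag[:, None])
    # Image retrieval: for caption j (column), images i != j are negatives.
    cost_im = np.maximum(0.0, margin + scores - diag[None, :])
    # The positive pair itself is not a negative.
    np.fill_diagonal(cost_s, 0.0)
    np.fill_diagonal(cost_im, 0.0)
    if max_violation:
        return cost_s.max(axis=1).sum() + cost_im.max(axis=0).sum()
    return cost_s.sum() + cost_im.sum()
```

With many easy negatives, the summed loss is dominated by many small violations, while the max-violation loss concentrates the gradient on the single most confusing example for each query.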

Arguments used to train the pre-trained models:

Method	Arguments
VSE0	`--no_imgnorm`
VSE++	`--max_violation`
Order0	`--measure order --use_abs --margin .05 --learning_rate .001`
Order++	`--measure order --max_violation`
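The rows above differ mainly in the similarity measure and in how embeddings are normalized. As a rough sketch under our own reading of the flags (an assumption, not taken from this README): the default measure is a dot product over L2-normalized embeddings (cosine similarity), --no_imgnorm skips normalizing the image embeddings, and --measure order scores a pair by its order-embedding violation, with --use_abs keeping embeddings non-negative. The helper names below are hypothetical.

```python
import numpy as np


def l2norm(x):
    """Row-wise L2 normalization (assumed to be skipped for images under --no_imgnorm)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cosine_sim(im, s):
    """scores[i, j] = <im_i, s_j>; equals cosine similarity for unit-norm rows."""
    return im @ s.T


def order_sim(im, s):
    """Order-embedding score: scores[i, j] = -|| max(0, s_j - im_i) ||_2.

    Zero when s_j <= im_i in every coordinate (no order violation);
    increasingly negative as the violation grows.
    """
    diff = s[None, :, :] - im[:, None, :]  # shape (n_images, n_captions, dim)
    return -np.sqrt((np.maximum(diff, 0.0) ** 2).sum(axis=2))
```

Either score matrix can be passed to a hinge loss of the kind selected by --max_violation; only the notion of "similar" changes.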

Reference

If you find this code useful, please cite the following paper:

@inproceedings{faghri2018vse++,
  title={VSE++: Improving Visual-Semantic Embeddings with Hard Negatives},
  author={Faghri, Fartash and Fleet, David J and Kiros, Jamie Ryan and Fidler, Sanja},
  booktitle={Proceedings of the British Machine Vision Conference ({BMVC})},
  url={https://github.com/fartashf/vsepp},
  year={2018}
}

License

Apache License 2.0

Owner

Fartash Faghri