Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval

Last update: Nov 07, 2022

Overview

NSGDC

Some of the code in this repo is copied or modified from open-source implementations made available by UNITER, PyTorch, HuggingFace, OpenNMT, and Nvidia. The image features are extracted using BUTD.

Requirements

The requirements follow UNITER. We provide a Docker image for easier reproduction. Please install the following:

Our scripts require the user to belong to the docker group so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards. We use mixed-precision training, so GPUs with Tensor Cores are recommended.
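A quick pre-flight check for the docker group requirement can be sketched as follows. The helper name has_group is hypothetical and not part of this repo; it only inspects a space-separated group list such as the output of `id -nG`.

```shell
# Hypothetical helper (not part of this repo): succeeds if the group named in
# $2 appears in the space-separated group list given in $1.
has_group() {
    groups_list="$1"
    wanted="$2"
    for g in $groups_list; do
        if [ "$g" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Check the current user's groups as reported by `id -nG`.
if has_group "$(id -nG)" docker; then
    echo "docker group membership: ok"
else
    echo "current user is not in the docker group; docker commands may need sudo"
fi
```

The comparison is an exact match on each group name, so a group such as "dockerx" does not count as docker.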

Image-Text Retrieval

Download Data

bash scripts/download_itm.sh $PATH_TO_STORAGE

Launch the Docker Container

# docker image should be automatically pulled
source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
$PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

In case you would like to reproduce the whole preprocessing pipeline.

The launch script respects the $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of being built into the image, so that user modifications are reflected without rebuilding the image. (Data folders are mounted into the container separately for flexibility in folder structure.)
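Before launching, it can help to confirm that the four folders passed to launch_container.sh exist. The sketch below assumes the layout implied by the launch command (txt_db, img_db, finetune, and pretrained under $PATH_TO_STORAGE); the function name check_storage is hypothetical and not part of this repo, and the GPU indices in the usage comment are illustrative only.

```shell
# Hypothetical helper (not part of this repo): report any of the four folders
# that the launch command expects under the storage root given in $1.
# Returns 0 if all folders exist, 1 otherwise.
check_storage() {
    root="$1"
    missing=0
    for d in txt_db img_db finetune pretrained; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (kept as comments so the sketch has no side effects):
#   check_storage "$PATH_TO_STORAGE" || exit 1
#   export CUDA_VISIBLE_DEVICES=0,1   # expose only GPUs 0 and 1 (illustrative)
#   source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
#       $PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained
```

Exporting the variable before sourcing the launch script keeps the GPU selection visible to everything the script starts.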

Image-Text Retrieval (Flickr30k)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_flickr.sh
bash run_cmds/tran_pnsgd2_base_flickr.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_flickr.sh
bash run_cmds/tran_pnsgd2_large_flickr.sh

Image-Text Retrieval (COCO)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_coco.sh
bash run_cmds/tran_pnsgd2_base_coco.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_coco.sh
bash run_cmds/tran_pnsgd2_large_coco.sh

Run Inference

bash run_cmds/inf_nsgd.sh

Results

Our models achieve the following performance.

MS-COCO

Model	Image-to-Text R@1	Image-to-Text R@5	Image-to-Text R@10	Text-to-Image R@1	Text-to-Image R@5	Text-to-Image R@10
NSGDC-Base	66.6	88.6	94.0	51.6	79.1	87.5
NSGDC-Large	67.8	89.6	94.2	53.3	80.0	88.0

Flickr30K

Model	Image-to-Text R@1	Image-to-Text R@5	Image-to-Text R@10	Text-to-Image R@1	Text-to-Image R@5	Text-to-Image R@10
NSGDC-Base	87.9	98.1	99.3	74.5	93.3	96.3
NSGDC-Large	90.6	98.8	99.1	77.3	94.3	97.3


Owner

Zhihao Fan
