The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

Last update: Dec 22, 2022

Overview

SGRAF

PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.

It is built on top of SCAN and Cross-modal_Retrieval_Tutorial.

We have released two versions of SGRAF: branch main for Python 2.7 and branch python3.6 for Python 3.6.

Introduction

The framework of SGRAF:

Updated results (better than those reported in the original paper)

Dataset	Module	Sentence retrieval			Image retrieval
		R@1	R@5	R@10	R@1	R@5	R@10
Flickr30K	SAF	75.6	92.7	96.9	56.5	82.0	88.4
	SGR	76.6	93.7	96.6	56.1	80.9	87.0
	SGRAF	78.4	94.6	97.5	58.2	83.0	89.1
MSCOCO1k	SAF	78.0	95.9	98.5	62.2	89.5	95.4
	SGR	77.3	96.0	98.6	62.1	89.6	95.3
	SGRAF	79.2	96.5	98.6	63.5	90.2	95.8
MSCOCO5k	SAF	55.5	83.8	91.8	40.1	69.7	80.4
	SGR	57.3	83.2	90.6	40.5	69.6	80.3
	SGRAF	58.8	84.8	92.1	41.6	70.9	81.5

Requirements

We recommend the following dependencies for branch main.

Python 2.7
PyTorch (>=0.4.1)
NumPy (>=1.12.1)
TensorBoard
Punkt Sentence Tokenizer:

import nltk
# Non-interactive equivalent of running nltk.download() and entering "d punkt"
nltk.download('punkt')

Download data and vocab

We follow SCAN to obtain image features and vocabularies, which can be downloaded by using:

wget https://scanproject.blob.core.windows.net/scan-data/data.zip
wget https://scanproject.blob.core.windows.net/scan-data/vocab.zip
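After downloading, both archives have to be unpacked to the location that data_path and vocab_path will point to. The following is a minimal Python 3 sketch of that step, assuming the two zip files sit in the current directory. The helper name extract_archives and the destination folder are hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, dest="."):
    """Unpack every zip archive in `archives` into `dest`.

    Returns the names of all extracted members, in archive order.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted


# Example call (assumes data.zip and vocab.zip were fetched with wget):
# extract_archives(["data.zip", "vocab.zip"], dest=".")
```

The extracted folders can then be used when setting data_path and vocab_path.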

Pre-trained models and evaluation

Modify the model_path, data_path, vocab_path in the evaluation.py file. Then run evaluation.py:

python evaluation.py

Note that fold5=True is only for evaluation on MSCOCO 1K (results averaged over 5 folds), while fold5=False is for MSCOCO 5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
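To illustrate what the 5-fold averaging amounts to, here is a small sketch. It assumes the protocol splits the test set into five 1K subsets, ranks retrieval candidates within each subset, and averages the resulting recall values. The function names and toy ranks are hypothetical; this is not the code in evaluation.py.

```python
def recall_at_k(ranks, k):
    """Percentage of queries whose best correct match has 0-based rank < k."""
    return 100.0 * sum(1 for r in ranks if r < k) / len(ranks)


def average_over_folds(fold_ranks, ks=(1, 5, 10)):
    """Compute R@K inside each fold, then average each R@K across folds.

    `fold_ranks` holds one list of ranks per fold; the ranks in a fold are
    assumed to be computed against that fold's candidates only.
    """
    per_fold = [[recall_at_k(ranks, k) for k in ks] for ranks in fold_ranks]
    return [sum(column) / len(per_fold) for column in zip(*per_fold)]


# Toy example with two folds of four queries each (made-up ranks).
toy = [[0, 3, 12, 0], [1, 0, 7, 20]]
print(average_over_folds(toy))  # [37.5, 62.5, 75.0]
```

With fold5=False there is a single pool of candidates, so the same recall computation is applied once without averaging.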

Training new models from scratch

Modify the data_path, vocab_path, model_name, logger_name in the opts.py file. Then run train.py:

For MSCOCO:

(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF

For Flickr30K:

(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF
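The four training configurations above can be kept in one place and turned into command lines programmatically. This is a hedged sketch: the flag names and values are copied from the commands listed here, while TRAIN_SETTINGS and build_command are hypothetical helpers that do not exist in the repository.

```python
# Epoch settings copied from the training commands in this README.
TRAIN_SETTINGS = {
    ("coco_precomp", "SGR"): {"num_epochs": 20, "lr_update": 10},
    ("coco_precomp", "SAF"): {"num_epochs": 20, "lr_update": 10},
    ("f30k_precomp", "SGR"): {"num_epochs": 40, "lr_update": 30},
    ("f30k_precomp", "SAF"): {"num_epochs": 30, "lr_update": 20},
}


def build_command(data_name, module_name):
    """Return the train.py argument list for one dataset/module pair."""
    cfg = TRAIN_SETTINGS[(data_name, module_name)]
    return [
        "python", "train.py",
        "--data_name", data_name,
        "--num_epochs", str(cfg["num_epochs"]),
        "--lr_update", str(cfg["lr_update"]),
        "--module_name", module_name,
    ]


# Print the command line for SAF on Flickr30K.
print(" ".join(build_command("f30k_precomp", "SAF")))
```

The returned list can be passed to subprocess.run(..., check=True) to launch a run from a script.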

Reference

If SGRAF is useful for your research, please cite the following paper:

@inproceedings{Diao2021SGRAF,
  title={Similarity Reasoning and Filtration for Image-Text Matching},
  author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
  booktitle={AAAI},
  year={2021}
}

License

Apache License 2.0.
If you encounter any problems, please contact me at ([email protected]) or ([email protected]).

Owner

Ronnie_IIAU
