PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Last update: Dec 23, 2022

Overview

Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M²HSE)

PyTorch code for M²HSE. The local-level subnetwork of M²HSE is built on top of VSESC.

Xinlei Pei, Zheng Liu, Shaojing Yuan, Shanshan Gao, Huijian Han and Caiming Zhang. "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Introduction

We provide demo code for the Corel 5K dataset, including the details of the training process for both the global-level subnetwork and the local-level subnetwork.

Requirements

We recommend the following dependencies.

Python 3.6
PyTorch (1.3.1)
NumPy (1.19.2)
Punkt Sentence Tokenizer:

import nltk
nltk.download('punkt')

Download data

The raw images and the corresponding texts can be downloaded from here. Note that we performed data cleaning on this dataset; the specific operations are described in the paper.

In addition: 1) to extract the fine-grained visual features, each raw image is divided uniformly into 3×3 blocks; 2) we adopt AlexNet, pre-trained on ImageNet, to extract the CNN features; 3) the text data are uploaded to ./data/coarse-grained-data/ and ./data/fine-grained-data/. Therefore, for data preparation you have the following two options:

Download the above raw data and extract the corresponding features according to the strategy we introduced in the paper.
Contact us for relevant data. (Email: [email protected])
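If you extract the features yourself, the uniform 3×3 block division mentioned above can be sketched as follows. This is a minimal sketch, assuming each image is already loaded as a NumPy array of shape H × W (× C); the function name split_into_blocks is hypothetical, and the paper's exact handling of image sizes not divisible by 3 may differ. Feeding each block to AlexNet is not shown.

```python
import numpy as np


def split_into_blocks(image: np.ndarray, grid: int = 3) -> list:
    """Hypothetical helper: split an H x W (x C) image uniformly into grid x grid blocks.

    Blocks are returned in row-major order (top-left block first). If H or W is
    not divisible by `grid`, np.array_split gives the extra pixels to the first
    strips, so block sizes differ by at most one pixel along each axis.
    """
    blocks = []
    # First cut the image into `grid` horizontal strips (along the height axis).
    for row_strip in np.array_split(image, grid, axis=0):
        # Then cut each strip into `grid` blocks along the width axis.
        for block in np.array_split(row_strip, grid, axis=1):
            blocks.append(block)
    return blocks


if __name__ == "__main__":
    img = np.arange(6 * 6 * 3).reshape(6, 6, 3)
    blocks = split_into_blocks(img)
    print(len(blocks), blocks[0].shape)  # 9 (2, 2, 3)
```

Each of the nine blocks would then be passed through the CNN to obtain one fine-grained visual feature per block; np.array_split is used instead of np.split so that images whose height or width is not a multiple of 3 do not raise an error.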

Training models

For training the global-level subnetwork:

Run train_global.py:

python train_global.py \
    --data_path ./data/coarse-grained-data \
    --data_name corel5k_precomp \
    --vocab_path ./vocab \
    --logger_name ./checkpoint/M2HSE/Global/Corel5K \
    --model_name ./checkpoint/M2HSE/Global/Corel5K \
    --num_epochs 100 \
    --lr_updata 50 \
    --batchsize 100 \
    --gamma_1 1 \
    --gamma_2 .5 \
    --alpha_1 .8 \
    --alpha_2 .8

For training the local-level subnetwork:

Run train_local.py:

python train_local.py \
    --data_path ./data/fine-grained-data \
    --data_name corel5k_precomp \
    --vocab_path ./vocab \
    --logger_name ./checkpoint/M2HSE/Local/Corel5K \
    --model_name ./checkpoint/M2HSE/Local/Corel5K \
    --num_epochs 100 \
    --lr_updata 50 \
    --batchsize 100 \
    --gamma_1 1 \
    --gamma_2 .5 \
    --beta_1 .4 \
    --beta_2 .4

Reference

Stay tuned. :)

License

Apache License 2.0
