Towards Making a Trojan-horse Attack on Text-to-Image Retrieval

Source code for our ICASSP 2023 paper, Towards Making a Trojan-horse Attack on Text-to-Image Retrieval. This project implements the Trojan-horse attack against CLIP and CLIP-flickr on Flickr30k.

Environment

We used Anaconda to set up a deep learning workspace that supports PyTorch. Run the following commands to install all the required packages.

conda create -n tth python==3.8 -y
conda activate tth
git clone https://github.com/fly-dragon211/tth.git
cd tth
pip install -r requirements.txt
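
After installation, a quick sanity check can confirm that the interpreter version and PyTorch are visible inside the `tth` environment. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only checks for `torch` because the exact contents of requirements.txt are not listed here.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), packages=("torch",)):
    """Return a dict saying whether the interpreter and packages look usable."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Run it with the `tth` environment activated; any line marked MISSING suggests the install step did not complete.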

Data preparation

Dataset

We put the dataset files in ~/VisualSearch.

mkdir ~/VisualSearch
unzip -q "TTH_VisualSearch.zip" -d "${HOME}/VisualSearch/"

Download the Flickr30k dataset and move the image files to ~/VisualSearch/flickr30k/flickr30k-images/. Flickr30k is available from the official website or from Baidu Yun (https://pan.baidu.com/s/1r0RVUwctJsI0iNuVXHQ6kA, extraction code: hrf3).

CLIP-flickr and CLIP-coco models

We provide CLIP models fine-tuned on Flickr30k and MSCOCO:

Baidu Yun: https://pan.baidu.com/s/1n8Sa7Fr9-G9KbZ3-FxS1_g?pwd=sbsv (extraction code: sbsv)

Move the model files to ~/VisualSearch/flickr30k.
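
Before launching an attack, it may help to verify that the files ended up where the commands expect them. The sketch below is an illustration only: `missing_paths` is a hypothetical helper, and the checkpoint name `CLIP-flickr.tar` is taken from the CLIP-flickr command further down in this README.

```python
from pathlib import Path


def missing_paths(rootpath, checkpoint="CLIP-flickr.tar"):
    """List expected files or directories that are absent under rootpath."""
    root = Path(rootpath).expanduser()
    collection = root / "flickr30k"
    expected = [
        collection / "flickr30k-images",  # Flickr30k image files
        collection / checkpoint,          # fine-tuned CLIP checkpoint
    ]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_paths("~/VisualSearch")
    print("layout looks complete" if not missing else f"missing: {missing}")
```

The check only looks at the two locations named in this README; it does not validate the contents of TTH_VisualSearch.zip.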

Trojan-horse Attack

CLIP

 python TTH_attack.py \
 --device 0 flickr30ktest_add_ad None flickr30ktrain/flickr30kval/test \
 --attack_trainData flickr30ktrain --config_name TTH.CLIPEnd2End_adjust \
 --parm_adjust_config 0_1_1 --rootpath ~/VisualSearch \
 --batch_size 256 --query_sets flickr30ktest_add_ad.caption.txt

R10 of the LBIR system without/with Trojan-horse images w.r.t. specific queries. LBIR setup: CLIP + Flickr30ktest. Adversarial patches are learned with Flickr30ktrain as training data. The clear drop in R10 for truly relevant images and the clear increase in R10 for novel images show that the proposed method succeeds in making Trojan-horse attacks.

Query set	Truly relevant images (w/o TTH)	Truly relevant images (w/ TTH)	Benign or TTH images (w/o TTH)	Benign or TTH images (w/ TTH)
waiter	100.0	20.0	0.0	100.0
motorcycle	90.5	28.6	0.0	100.0
run	92.3	30.8	0.0	100.0
dress	92.4	42.4	0.0	100.0
floating	90.0	40.0	0.0	100.0
smiling	94.6	48.2	0.0	100.0
policeman	100.0	58.3	0.0	100.0
feeding	100.0	60.0	0.0	100.0
maroon	100.0	60.0	0.0	100.0
navy	100.0	66.7	0.0	100.0
cow	100.0	73.3	0.0	100.0
little	91.9	29.0	0.0	98.9
swimming	97.8	43.5	0.0	97.8
climbing	95.5	11.4	0.0	97.7
blue	95.4	61.4	0.0	97.3
dancing	80.0	33.3	0.0	96.7
yellow	93.2	68.9	0.0	96.3
floor	97.7	70.5	0.0	95.5
reading	94.7	52.6	0.0	94.7
jacket	91.4	69.9	0.0	94.6
pink	94.3	52.9	0.0	94.3
green	94.9	76.0	0.0	92.0
female	100.0	73.9	0.0	89.1
front	92.0	78.0	0.0	88.6
MEAN	94.9	52.1	0.0	97.2
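
To make the R10 columns concrete, the sketch below treats R10 as the percentage of queries in a query set whose top-10 results contain at least one image from the group of interest (truly relevant images, or benign/TTH images). This reading is an assumption made for illustration; the repository's evaluation.py may compute it differently. The function `recall_at_k`, the image ids, and the toy queries are all hypothetical.

```python
def recall_at_k(rankings, targets, k=10):
    """Percentage of queries with at least one target image in the top k.

    rankings: dict mapping query -> list of image ids, best match first.
    targets:  dict mapping query -> set of image ids counted as hits.
    """
    if not rankings:
        return 0.0
    hits = sum(
        1 for query, ranked in rankings.items()
        if targets.get(query, set()) & set(ranked[:k])
    )
    return 100.0 * hits / len(rankings)


# Toy data: two queries, only one of which retrieves the TTH image.
rankings = {
    "a waiter serving food": ["tth_01", "img_07", "img_03"],
    "a waiter at a table": ["img_03", "img_09", "img_07"],
}
tth_images = {query: {"tth_01"} for query in rankings}
print(recall_at_k(rankings, tth_images))  # 50.0
```

Under this reading, a value of 100.0 in the last column means every query in that query set retrieved a TTH image within its top 10.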

CLIP-flickr

 CLIP_flickr="${HOME}/VisualSearch/flickr30k/CLIP-flickr.tar"  # "~" is not expanded inside double quotes
 
 python TTH_attack.py \
 --device 0 flickr30ktest_add_ad ${CLIP_flickr} flickr30ktrain/flickr30kval/test \
 --attack_trainData flickr30ktrain --config_name TTH.CLIPEnd2End_adjust \
 --parm_adjust_config 0_1_0 --rootpath ~/VisualSearch \
 --batch_size 256 --query_sets flickr30ktest_add_ad.caption.txt
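
The CLIP and CLIP-flickr invocations in this README differ only in the checkpoint argument (None versus a checkpoint path) and in --parm_adjust_config (0_1_1 versus 0_1_0). A small wrapper can keep the shared arguments in one place. This is a sketch: `build_attack_command` is a hypothetical helper that only assembles the argument list shown in this README and does not run the script.

```python
import os
import shlex


def build_attack_command(checkpoint="None", parm_adjust_config="0_1_1",
                         rootpath=None, device=0, batch_size=256):
    """Assemble the TTH_attack.py argument list used in this README."""
    if rootpath is None:
        # Expand "~" here because no shell will do it for an argv list.
        rootpath = os.path.expanduser("~/VisualSearch")
    return [
        "python", "TTH_attack.py",
        "--device", str(device),
        "flickr30ktest_add_ad", checkpoint, "flickr30ktrain/flickr30kval/test",
        "--attack_trainData", "flickr30ktrain",
        "--config_name", "TTH.CLIPEnd2End_adjust",
        "--parm_adjust_config", parm_adjust_config,
        "--rootpath", rootpath,
        "--batch_size", str(batch_size),
        "--query_sets", "flickr30ktest_add_ad.caption.txt",
    ]


clip_cmd = build_attack_command()
clip_flickr_cmd = build_attack_command(
    checkpoint=os.path.expanduser("~/VisualSearch/flickr30k/CLIP-flickr.tar"),
    parm_adjust_config="0_1_0",
)
print(shlex.join(clip_cmd))
print(shlex.join(clip_flickr_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the data and checkpoints are in place.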

R10 of the LBIR system without/with TTH images w.r.t. specific queries. LBIR setup: CLIP-flickr + Flickr30ktest.

Query set	Truly relevant images (w/o TTH)	Truly relevant images (w/ TTH)	Benign or TTH images (w/o TTH)	Benign or TTH images (w/ TTH)
cow	100.0	86.7	0.0	100.0
motorcycle	100.0	95.2	0.0	100.0
policeman	100.0	100.0	0.0	100.0
waiter	100.0	100.0	0.0	100.0
feeding	100.0	100.0	0.0	100.0
reading	94.7	86.8	0.0	97.4
swimming	100.0	100.0	0.0	91.3
floor	100.0	100.0	2.3	86.4
dress	100.0	95.5	1.5	86.4
pink	97.7	96.6	0.0	86.2
climbing	95.5	84.1	0.0	84.1
smiling	100.0	98.2	3.6	83.9
dancing	90.0	83.3	0.0	83.3
yellow	97.5	93.8	3.1	77.6
green	98.9	97.1	0.6	73.1
floating	100.0	90.0	0.0	70.0
run	100.0	92.3	0.0	69.2
navy	100.0	100.0	0.0	66.7
little	98.9	98.4	1.1	65.6
female	100.0	100.0	2.2	60.9
jacket	96.8	95.7	0.0	57.0
blue	98.2	97.9	1.2	41.6
maroon	100.0	100.0	0.0	40.0
front	97.3	96.6	4.2	29.9
MEAN	98.6	95.3	0.8	77.1
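
The MEAN rows of the two tables can be compared directly. The short calculation below uses only numbers copied from this README; `r10_change` is a hypothetical helper name.

```python
# MEAN rows copied from the two tables in this README (R10, in percent).
# Each tuple is (w/o TTH, w/ TTH).
mean_r10 = {
    "CLIP":        {"relevant": (94.9, 52.1), "tth": (0.0, 97.2)},
    "CLIP-flickr": {"relevant": (98.6, 95.3), "tth": (0.8, 77.1)},
}


def r10_change(without_tth, with_tth):
    """Change in R10, in percentage points, once TTH images are added."""
    return round(with_tth - without_tth, 1)


for model, rows in mean_r10.items():
    print(model,
          "| truly relevant:", r10_change(*rows["relevant"]),
          "| benign -> TTH:", r10_change(*rows["tth"]))
```

On average, R10 for truly relevant images falls by 42.8 points for CLIP but by only 3.3 points for CLIP-flickr, while R10 for the TTH images rises by 97.2 and 76.3 points respectively.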

References

@inproceedings{hu2022targeted,
  title={Towards Making a Trojan-horse Attack on Text-to-Image Retrieval},
  author={Hu, Fan and Chen, Aozhu and Li, Xirong},
  booktitle={ICASSP},
  year={2023}
}

Contact

If you encounter any issue when running the code, please feel free to reach us either by creating a new issue on GitHub or by emailing:

Fan Hu (hufan_hf@ruc.edu.cn)
Aozhu Chen (caz@ruc.edu.cn)
