Towards Making a Trojan-horse Attack on Text-to-Image Retrieval

Source code of our ICASSP 2023 paper: Towards Making a Trojan-horse Attack on Text-to-Image Retrieval. This project implements the Trojan-horse attack against CLIP and CLIP-flickr (CLIP fine-tuned on Flickr30k), evaluated on Flickr30k.

Environment

We used Anaconda to set up a deep learning workspace that supports PyTorch. Run the following commands to install all the required packages.

conda create -n tth python==3.8 -y
conda activate tth
git clone https://github.com/fly-dragon211/tth.git
cd tth
pip install -r requirements.txt

Data preparation

Dataset

We put the dataset files in ~/VisualSearch.

mkdir -p ~/VisualSearch
unzip -q "TTH_VisualSearch.zip" -d "${HOME}/VisualSearch/"

Readers need to download the Flickr30k dataset and move the image files to ~/VisualSearch/flickr30k/flickr30k-images/. Flickr30k is available on the official website or on Baidu Yun (https://pan.baidu.com/s/1r0RVUwctJsI0iNuVXHQ6kA, extraction code: hrf3).

CLIP-flickr and CLIP-coco models

We provide CLIP models fine-tuned on Flickr30k and MSCOCO:

Baidu Yun: https://pan.baidu.com/s/1n8Sa7Fr9-G9KbZ3-FxS1_g?pwd=sbsv (extraction code: sbsv)

Readers can move the model files to ~/VisualSearch/flickr30k.
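
Before launching an attack, it can help to confirm that the files ended up where the commands in this README expect them. The following is a minimal sketch and not part of the repository: `check_layout` and `EXPECTED` are hypothetical names, and the list of paths is an assumption assembled from the locations mentioned in this README (the image folder, and the `CLIP-flickr.tar` checkpoint used by the CLIP-flickr command).

```python
from pathlib import Path

# Assumed layout, taken from the paths mentioned in this README.
EXPECTED = [
    "flickr30k/flickr30k-images",  # Flickr30k image files
    "flickr30k/CLIP-flickr.tar",   # fine-tuned checkpoint (CLIP-flickr runs only)
]


def check_layout(root):
    """Return the expected entries that are missing under `root`."""
    root = Path(root).expanduser()
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = check_layout("~/VisualSearch")
    if missing:
        print("Missing under ~/VisualSearch:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty result only means the paths exist; it does not verify the contents of the archive or the checkpoint.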

Trojan-horse Attack

CLIP

 python TTH_attack.py \
 --device 0 flickr30ktest_add_ad None flickr30ktrain/flickr30kval/test \
 --attack_trainData flickr30ktrain --config_name TTH.CLIPEnd2End_adjust \
 --parm_adjust_config 0_1_1 --rootpath ~/VisualSearch \
 --batch_size 256 --query_sets flickr30ktest_add_ad.caption.txt

R10 of the LBIR system without/with Trojan-horse images w.r.t. specific queries. LBIR setup: CLIP + Flickr30ktest. Adversarial patches are learned with Flickr30ktrain as training data. The clear drop in R10 for truly relevant images and the clear increase in R10 for novel images show that the proposed method succeeds in making Trojan-horse attacks.

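| Query set | Truly relevant images (w/o TTH) | Truly relevant images (w/ TTH) | Benign or TTH images (w/o TTH) | Benign or TTH images (w/ TTH) |
|---|---|---|---|---|
| waiter | 100.0 | 20.0 | 0.0 | 100.0 |
| motorcycle | 90.5 | 28.6 | 0.0 | 100.0 |
| run | 92.3 | 30.8 | 0.0 | 100.0 |
| dress | 92.4 | 42.4 | 0.0 | 100.0 |
| floating | 90.0 | 40.0 | 0.0 | 100.0 |
| smiling | 94.6 | 48.2 | 0.0 | 100.0 |
| policeman | 100.0 | 58.3 | 0.0 | 100.0 |
| feeding | 100.0 | 60.0 | 0.0 | 100.0 |
| maroon | 100.0 | 60.0 | 0.0 | 100.0 |
| navy | 100.0 | 66.7 | 0.0 | 100.0 |
| cow | 100.0 | 73.3 | 0.0 | 100.0 |
| little | 91.9 | 29.0 | 0.0 | 98.9 |
| swimming | 97.8 | 43.5 | 0.0 | 97.8 |
| climbing | 95.5 | 11.4 | 0.0 | 97.7 |
| blue | 95.4 | 61.4 | 0.0 | 97.3 |
| dancing | 80.0 | 33.3 | 0.0 | 96.7 |
| yellow | 93.2 | 68.9 | 0.0 | 96.3 |
| floor | 97.7 | 70.5 | 0.0 | 95.5 |
| reading | 94.7 | 52.6 | 0.0 | 94.7 |
| jacket | 91.4 | 69.9 | 0.0 | 94.6 |
| pink | 94.3 | 52.9 | 0.0 | 94.3 |
| green | 94.9 | 76.0 | 0.0 | 92.0 |
| female | 100.0 | 73.9 | 0.0 | 89.1 |
| front | 92.0 | 78.0 | 0.0 | 88.6 |
| MEAN | 94.9 | 52.1 | 0.0 | 97.2 |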
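
For readers unfamiliar with the metric, the following sketch shows one common way to compute a recall-at-10 score of this kind. It assumes that R10 is the percentage of queries for which at least one target image (a truly relevant image, or a TTH image) is ranked within the top 10; the exact definition used in the paper and in `TTH_attack.py` may differ. The function name `recall_at_k` and the toy data are hypothetical.

```python
def recall_at_k(rankings, targets, k=10):
    """Percentage of queries with at least one target image in the top-k.

    rankings: dict mapping query -> list of image ids, best match first
    targets:  dict mapping query -> set of target image ids for that query
    """
    if not rankings:
        return 0.0
    hits = sum(
        1 for query, ranked in rankings.items()
        if targets.get(query, set()) & set(ranked[:k])
    )
    return 100.0 * hits / len(rankings)


# Toy example (made-up ids): only the first query ranks its target in the top 2.
rankings = {
    "a waiter serving food": ["img3", "img7", "img1"],
    "a waiter at a table": ["img5", "img2", "img9"],
}
relevant = {
    "a waiter serving food": {"img7"},
    "a waiter at a table": {"img9"},
}
print(recall_at_k(rankings, relevant, k=2))  # 50.0
```

Under this reading, the "w/ TTH" columns would be obtained by running the same computation once with the truly relevant images as targets and once with the TTH images as targets, after the TTH images have been added to the gallery.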

CLIP-flickr

 CLIP_flickr="${HOME}/VisualSearch/flickr30k/CLIP-flickr.tar"
 
 python TTH_attack.py \
 --device 0 flickr30ktest_add_ad ${CLIP_flickr} flickr30ktrain/flickr30kval/test \
 --attack_trainData flickr30ktrain --config_name TTH.CLIPEnd2End_adjust \
 --parm_adjust_config 0_1_0 --rootpath ~/VisualSearch \
 --batch_size 256 --query_sets flickr30ktest_add_ad.caption.txt

R10 of LBIR system without/with TTH images w.r.t. specific queries. LBIR setup: CLIP-flickr + Flickr30ktest.

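| Query set | Truly relevant images (w/o TTH) | Truly relevant images (w/ TTH) | Benign or TTH images (w/o TTH) | Benign or TTH images (w/ TTH) |
|---|---|---|---|---|
| cow | 100.0 | 86.7 | 0.0 | 100.0 |
| motorcycle | 100.0 | 95.2 | 0.0 | 100.0 |
| policeman | 100.0 | 100.0 | 0.0 | 100.0 |
| waiter | 100.0 | 100.0 | 0.0 | 100.0 |
| feeding | 100.0 | 100.0 | 0.0 | 100.0 |
| reading | 94.7 | 86.8 | 0.0 | 97.4 |
| swimming | 100.0 | 100.0 | 0.0 | 91.3 |
| floor | 100.0 | 100.0 | 2.3 | 86.4 |
| dress | 100.0 | 95.5 | 1.5 | 86.4 |
| pink | 97.7 | 96.6 | 0.0 | 86.2 |
| climbing | 95.5 | 84.1 | 0.0 | 84.1 |
| smiling | 100.0 | 98.2 | 3.6 | 83.9 |
| dancing | 90.0 | 83.3 | 0.0 | 83.3 |
| yellow | 97.5 | 93.8 | 3.1 | 77.6 |
| green | 98.9 | 97.1 | 0.6 | 73.1 |
| floating | 100.0 | 90.0 | 0.0 | 70.0 |
| run | 100.0 | 92.3 | 0.0 | 69.2 |
| navy | 100.0 | 100.0 | 0.0 | 66.7 |
| little | 98.9 | 98.4 | 1.1 | 65.6 |
| female | 100.0 | 100.0 | 2.2 | 60.9 |
| jacket | 96.8 | 95.7 | 0.0 | 57.0 |
| blue | 98.2 | 97.9 | 1.2 | 41.6 |
| maroon | 100.0 | 100.0 | 0.0 | 40.0 |
| front | 97.3 | 96.6 | 4.2 | 29.9 |
| MEAN | 98.6 | 95.3 | 0.8 | 77.1 |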
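
To compare the two settings at a glance, the MEAN rows of the two tables can be turned into differences. The snippet below is only an illustrative calculation on the numbers reported above; `mean_r10` and `r10_changes` are hypothetical names and not part of the repository.

```python
# MEAN rows copied from the two tables above:
# (relevant w/o TTH, relevant w/ TTH, benign w/o TTH, TTH w/ TTH)
mean_r10 = {
    "CLIP": (94.9, 52.1, 0.0, 97.2),
    "CLIP-flickr": (98.6, 95.3, 0.8, 77.1),
}


def r10_changes(row):
    """Return (drop for truly relevant images, gain for TTH images) in points."""
    rel_before, rel_after, novel_before, novel_after = row
    return round(rel_before - rel_after, 1), round(novel_after - novel_before, 1)


for model, row in mean_r10.items():
    drop, gain = r10_changes(row)
    print(f"{model}: relevant -{drop} points, TTH +{gain} points")
```

With these numbers, the mean R10 of truly relevant images falls by 42.8 points for CLIP but only by 3.3 points for CLIP-flickr, while the mean R10 of TTH images rises by 97.2 and 76.3 points respectively.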

References

@inproceedings{hu2022targeted,
  title={Towards Making a Trojan-horse Attack on Text-to-Image Retrieval},
  author={Hu, Fan and Chen, Aozhu and Li, Xirong},
  booktitle = {ICASSP},
  year={2023}
}

Contact

If you encounter any issue when running the code, please feel free to reach us either by creating a new issue on GitHub or by email.
