Official implementation of the paper "Backdoor Attacks on Self-Supervised Learning".

Overview

SSL-Backdoor

Abstract

Large-scale unlabeled data has allowed recent progress in self-supervised learning methods that learn rich visual representations. State-of-the-art self-supervised methods for learning representations from images (MoCo and BYOL) use an inductive bias that different augmentations (e.g. random crops) of an image should produce similar embeddings. We show that such methods are vulnerable to backdoor attacks where an attacker poisons a part of the unlabeled data by adding a small trigger (known to the attacker) to the images. The model performance is good on clean test images but the attacker can manipulate the decision of the model by showing the trigger at test time. Backdoor attacks have been studied extensively in supervised learning and to the best of our knowledge, we are the first to study them for self-supervised learning. Backdoor attacks are more practical in self-supervised learning since the unlabeled data is large and as a result, an inspection of the data to avoid the presence of poisoned data is prohibitive. We show that in our targeted attack, the attacker can produce many false positives for the target category by using the trigger at test time. We also develop a knowledge distillation based defense algorithm that succeeds in neutralizing the attack. Our code is available here: https://github.com/UMBCvision/SSL-Backdoor.

Paper

Backdoor Attacks on Self-Supervised Learning

Updates

  • 04/07/2021 - Poison generation code added.
  • 04/08/2021 - MoCo v2, BYOL code added.
  • 04/14/2021 - Jigsaw, RotNet code added.

Requirements

All experiments were run using the following dependencies.

  • python=3.7
  • pytorch=1.6.0
  • torchvision=0.7.0
  • wandb=0.10.21 (for BYOL)
  • torchnet=0.0.4 (for RotNet)

Optional

  • faiss=1.6.3 (for k-NN evaluation)

Create ImageNet-100 dataset

The ImageNet-100 dataset (random 100-class subset of ImageNet), commonly used in self-supervision benchmarks, was introduced in [1].

To create ImageNet-100 from ImageNet, use the provided script.

cd scripts
python create_imagenet_subset.py --subset imagenet100_classes.txt --full_imagenet_path <path> --subset_imagenet_path <path>

Poison Generation

To generate poisoned ImageNet-100 images, create your own configuration file. Some examples, which we use for our targeted attack experiments, are in the cfg directory.

  • You can choose the poisoning to be Targeted (poison only one category) or Untargeted
  • The trigger can be text or an image (We used triggers introduced in [2]).
  • The parameters of the trigger (e.g. location, size, alpha etc.) can be modified according to the experiment.
  • The poison injection rate for the training set can be modified.
  • You can choose which split to generate. "train" generates poisoned training data, "val_poisoned" poisons all the validation images for evaluation purpose. Note: The poisoned validation images are all resized and cropped to 224x224 before trigger pasting so that all poisoned images have uniform trigger size.
cd poison-generation
python generate_poison.py <configuration-file>

SSL Methods

Pytorch Custom Dataset

All images are loaded from filelists of the form given below.

<dir-name-1>/xxx.ext <target-class-index>
<dir-name-1>/xxy.ext <target-class-index>
<dir-name-1>/xxz.ext <target-class-index>

<dir-name-2>/123.ext <target-class-index>
<dir-name-2>/nsdf3.ext <target-class-index>
<dir-name-2>/asd932_.ext <target-class-index>

Evaluation

All evaluation scripts return confusion matrices for clean validation data and a csv file enumerating the TP and FP for each category.

MoCo v2 [3]

The implementation for MoCo is from https://github.com/SsnL/moco_align_uniform modified slightly to suit our experimental setup.

To train a ResNet-18 MoCo v2 model on ImageNet-100 on 2 NVIDIA GEFORCE RTX 2080 Ti GPUs:

cd moco
CUDA_VISIBLE_DEVICES=0,1 python main_moco.py \
                        -a resnet18 \
                        --lr 0.06 --batch-size 256 --multiprocessing-distributed \
                        --world-size 1 --rank 0 --aug-plus --mlp --cos --moco-align-w 0 \
                        --moco-unif-w 0 --moco-contr-w 1 --moco-contr-tau 0.2 \
                        --dist-url tcp://localhost:10005 \ 
                        --save-folder-root <path> \
                        --experiment-id <ID> <train-txt-file>

To train linear classifier on frozen MoCo v2 embeddings on ImageNet-100:

CUDA_VISIBLE_DEVICES=0 python eval_linear.py \
                        --arch moco_resnet18 \
                        --weights <SSL-model-checkpoint-path>\
                        --train_file <path> \
                        --val_file <path>

We use the linear classifier normalization from CompRess: Self-Supervised Learning by Compressing Representations which says "To reduce the computational overhead of tuning the hyperparameters per experiment, we standardize the Linear evaluation as following. We first normalize the features by L2 norm, then shift and scale each dimension to have zero mean and unit variance."

To evaluate linear classifier on clean and poisoned validation set: (This script loads the cached mean and variance from previous step.)

CUDA_VISIBLE_DEVICES=0 python eval_linear.py \
                        --arch moco_resnet18 \
                        --weights <SSL-model-checkpoint-path> \
                        --val_file <path> \
                        --val_poisoned_file <path> \
                        --resume <linear-classifier-checkpoint> \
                        --evaluate --eval_data <evaluation-ID> \
                        --load_cache

To run k-NN evaluation of frozen MoCo v2 embeddings on ImageNet-100 (faiss library needed):

CUDA_VISIBLE_DEVICES=0 python eval_knn.py \
                        -a moco_resnet18 \
                        --weights <SSL-model-checkpoint-path> \
                        --train_file <path> \
                        --val_file <path> \
                        --val_poisoned_file <path> \
                        --eval_data <evaluation-ID>

BYOL [4]

The implementation for BYOL is from https://github.com/htdt/self-supervised modified slightly to suit our experimental setup.

To train a ResNet-18 BYOL model on ImageNet-100 on 4 NVIDIA GEFORCE RTX 2080 Ti GPUs: (This scripts monitors the k-NN accuracy on clean ImageNet-100 dataset at regular intervals.)

cd byol
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m train \
                                    --exp_id <ID> \
                                    --dataset imagenet --lr 2e-3 --emb 128 --method byol \
                                    --arch resnet18 --epoch 200 \
                                    --train_file_path <path> \
                                    --train_clean_file_path <path> 
                                    --val_file_path <path>
                                    --save_folder_root <path>

To train linear classifier on frozen BYOL embeddings on ImageNet-100:

CUDA_VISIBLE_DEVICES=0 python -m test --dataset imagenet \
                            --train_clean_file_path <path> \
                            --val_file_path <path> \
                            --emb 128 --method byol --arch resnet18 \
                            --fname <SSL-model-checkpoint-path>

To evaluate linear classifier on clean and poisoned validation set:

CUDA_VISIBLE_DEVICES=0 python -m test --dataset imagenet \
                            --val_file_path <path> \
                            --val_poisoned_file_path <path> \
                            --emb 128 --method byol --arch resnet18 \
                            --fname <SSL-model-checkpoint-path> \
                            --clf_chkpt <linear-classifier-checkpoint-path> \
                            --eval_data <evaluation-ID> --evaluate

Jigsaw [5]

The implementation for Jigsaw is our own Pytorch reimplementation based on the authors’ Caffe code https://github.com/MehdiNoroozi/JigsawPuzzleSolver modified slightly to suit our experimental setup. There might be some legacy Pytorch code, but that doesn't affect the correctness of training or evaluation. If you are looking for a recent Pytorch implementation of Jigsaw, https://github.com/facebookresearch/vissl is a good place to start.

To train a ResNet-18 Jigsaw model on ImageNet-100 on 1 NVIDIA GEFORCE RTX 2080 Ti GPU: (The code doesn't support Pytorch distributed training.)

cd jigsaw
CUDA_VISIBLE_DEVICES=0 python train_jigsaw.py \
                                --train_file <path> \
                                --val_file <path> \
                                --save <path>

To train linear classifier on frozen Jigsaw embeddings on ImageNet-100:

CUDA_VISIBLE_DEVICES=0 python eval_conv_linear.py \
                        -a resnet18 --train_file <path> \
                        --val_file <path> \
                        --save <path> \
                        --weights <SSL-model-checkpoint-path>

To evaluate linear classifier on clean and poisoned validation set:

CUDA_VISIBLE_DEVICES=0 python eval_conv_linear.py -a resnet18 \
                            --val_file <path> \
                            --val_poisoned_file <path> \
                            --weights <SSL-model-checkpoint-path> \
                            --resume <linear-classifier-checkpoint-path> \
                            --evaluate --eval_data <evaluation-ID>

RotNet [6]

The implementation for RotNet is from https://github.com/gidariss/FeatureLearningRotNet modified slightly to suit our experimental setup. There might be some legacy Pytorch code, but that doesn't affect the correctness of training or evaluation. If you are looking for a recent Pytorch implementation of RotNet, https://github.com/facebookresearch/vissl is a good place to start.

To train a ResNet-18 Jigsaw model on ImageNet-100 on 1 NVIDIA TITAN RTX GPU: (The code doesn't support Pytorch distributed training. Choose the experiment ID config file as required.)

cd rotnet
CUDA_VISIBLE_DEVICES=0 python main.py --exp <ImageNet100_RotNet_*> --save_folder <path>

To train linear classifier on frozen RotNet embeddings on ImageNet-100:

CUDA_VISIBLE_DEVICES=0 python main.py --exp <ImageNet100_LinearClassifiers_*> --save_folder <path>

To evaluate linear classifier on clean and poisoned validation set:

CUDA_VISIBLE_DEVICES=0 python main.py --exp <ImageNet100_LinearClassifiers_*> \
                            --save_folder <path> \
                            --evaluate --checkpoint=<epoch_num> --eval_data <evaluation-ID>

Acknowledgement

This material is based upon work partially supported by the United States Air Force under Contract No. FA8750‐19‐C‐0098, funding from SAP SE, NSF grant 1845216, and also financial assistance award number 60NANB18D279 from U.S. Department of Commerce, National Institute of Standards and Technology. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the United States Air Force, DARPA, or other funding agencies.

References

[1] Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849,2019.

[2] Aniruddha Saha, Akshayvarun Subramanya, and Hamed Pirsiavash. Hidden trigger backdoor attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 11957–11965, 2020.

[3] Chen, Xinlei, et al. "Improved baselines with momentum contrastive learning." arXiv preprint arXiv:2003.04297 (2020).

[4] Jean-Bastien Grill, Florian Strub, Florent Altch́e, and et al. Bootstrap your own latent - a new approach to self-supervised learning. In Advances in Neural Information Processing Systems, volume 33, pages 21271–21284, 2020.

[5] Noroozi, Mehdi, and Paolo Favaro. "Unsupervised learning of visual representations by solving jigsaw puzzles." European conference on computer vision. Springer, Cham, 2016.

[6] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In International Conference on Learning Representations, 2018.

Citation

If you find our paper, code or models useful, please cite us using

@article{saha2021backdoor,
  title={Backdoor Attacks on Self-Supervised Learning},
  author={Saha, Aniruddha and Tejankar, Ajinkya and Koohpayegani, Soroush Abbasi and Pirsiavash, Hamed},
  journal={arXiv preprint arXiv:2105.10123},
  year={2021}
}

Questions/Issues

Please create an issue on the Github Repo directly or contact [email protected] for any questions about the code.

Owner
UMBC Vision
The Computer Vision Lab at the University of Maryland, Baltimore County (UMBC)
UMBC Vision
Exploiting CVE-2021-44228 in Unifi Network Application for remote code execution and more

Log4jUnifi Exploiting CVE-2021-44228 in Unifi Network Application for remote cod

96 Jan 02, 2023
Provides script to download and format public IP lists related to the Log4j exploit.

Provides script to download and format public IP lists related to the Log4j exploit. Current format includes: plain list, Cisco ASA Network Group.

Gianluca Ulivi 1 Jan 02, 2022
Tenssens framework focused on gathering information from free tools or resources. The intention is to help people find free OSINT resources.

Tenssens framework focused on gathering information from free tools or resources. The intention is to help people find free OSINT resources.

Md. Nur habib 31 Oct 21, 2022
威胁情报播报

Threat-Broadcast 威胁情报播报 运行环境 项目介绍 从以下公开的威胁情报来源爬取并整合最新信息: 360:https://cert.360.cn/warning 奇安信:https://ti.qianxin.com/advisory/ 红后:https://redqueen.tj-u

东方有鱼名为咸 148 Nov 09, 2022
Tools for investigating Log4j CVE-2021-44228

Log4jTools Tools for investigating Log4j CVE-2021-44228 FetchPayload.py (Get java payload from ldap path provided in JNDI lookup). Example command: Re

MalwareTech 91 Dec 29, 2022
Facebook account cloning/hacking advanced tool + dictionary attack added | Facebook automation tool

loggef Facebook automation tool, Facebook account hacking and cloning advanced tool + dictionary attack added Warning Use this tool for educational pu

Md Josif Khan 149 Aug 10, 2022
Encrypted Python Password Manager

PyPassKeep Encrypted Python Password Manager About PyPassKeep (PPK for short) is an encrypted python password manager used to secure your passwords fr

KrisIsHere 1 Nov 17, 2021
NEW FACEBOOK CLONER WITH NEW PASSWORD, TERMUX FB CLONE, FB CLONING COMMAND. M

NEW FACEBOOK CLONER WITH NEW PASSWORD, TERMUX FB CLONE, FB CLONING COMMAND. M

Mr. Error 81 Jan 08, 2023
Binary check tool to identify command injection and format string vulnerabilities in blackbox binaries

Binary check tool to identify command injection and format string vulnerabilities in blackbox binaries. Using xrefs to commonly injected and format string'd files, it will scan binaries faster than F

Christopher Roberts 3 Nov 16, 2021
Genpyteal - Experiment to rewrite Python into PyTeal using RedBaron

genpyteal Converts Python to PyTeal. Your mileage will vary depending on how muc

Jason Livesay 9 Oct 19, 2022
labsecurity is a framework and its use is for ethical hacking and computer security

labsecurity labsecurity is a framework and its use is for ethical hacking and computer security. Warning This tool is only for educational purpose. If

Dylan Meca 16 Dec 08, 2022
Hikvision 流媒体管理服务器敏感信息泄漏

Hikvisioninformation Hikvision 流媒体管理服务器敏感信息泄漏 Options optional arguments: -h, --help show this help message and exit -u url, --url url

Henry4E36 13 Nov 09, 2022
AutoScan 有多个目标时,调用xray+rad进行自动扫描

Usage: 在高级版Xray和rad同目录下运行 python3 X-AutoXray.py xxxx.txt 写的蛮人性化的哦,os,linux,windows通用 生成的xray报告会在当前目录的/result下面 Ctrl+c 打断脚本运行时还可以结算扫描进度,生成已扫描和未扫描的进度文件,

斯文 73 Jan 01, 2023
Cisco RV110w UPnP stack overflow

Cisco RV110W UPnP 0day 分析 前言 最近UPnP比较火,恰好手里有一台Cisco RV110W,在2021年8月份思科官方公布了一个Cisco RV系列关于UPnP的0day,但是具体的细节并没有公布出来。于是想要用手中的设备调试挖掘一下这个漏洞,漏洞的公告可以在官网看到。 准

badmonkey 25 Nov 09, 2022
带回显版本的漏洞利用脚本

CVE-2021-21978 带回显版本的漏洞利用脚本,更简单的方式 0. 漏洞信息 VMware View Planner Web管理界面存在一个上传日志功能文件的入口,没有进行认证且写入的日志文件路径用户可控,通过覆盖上传日志功能文件log_upload_wsgi.py,即可实现RCE 漏洞代码

3ky7in4 24 Nov 09, 2022
SubFind - Subdomain Finder Tools

SubFind (Subdomain Finder Tools) Info Tools Result Of Subdomain Command In Termi

LangMurpY 2 Jan 25, 2022
S2-061 的payload,以及对应简单的PoC/Exp

S2-061 脚本皆根据vulhub的struts2-059/061漏洞测试环境来写的,不具普遍性,还望大佬多多指教 struts2-061-poc.py(可执行简单系统命令) 用法:python struts2-061-poc.py http://ip:port command 例子:python

dreamer 46 Oct 20, 2022
Phishing Campaign Toolkit

King Phisher Phishing Campaign Toolkit Installation For instructions on how to install, please see the INSTALL.md file. After installing, for instruct

RSM US LLP 1.9k Jan 01, 2023
LeLeLe: A tool to simplify the application of Lattice attacks.

LeLeLe is a very simple library (300 lines) to help you more easily implement lattice attacks, the library is inspired by Z3Py (python interfa

Mathias Hall-Andersen 4 Dec 14, 2021
FOSSLight Scanner performs open source analysis after downloading the source by passing a link that can be cloned by wget or git.

FOSSLight Scanner Analyze at once for Open Source Compliance. FOSSLight Scanner performs open source analysis after downloading the source by passing

FOSSLight 8 Nov 03, 2022