On Generating Transferable Targeted Perturbations (ICCV'21)

Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Fatih Porikli

Paper: https://arxiv.org/abs/2103.14641

Abstract: While the untargeted black-box transferability of adversarial perturbations has been extensively studied before, changing an unseen model's decisions to a specific 'targeted' class remains a challenging feat. In this paper, we propose a new generative approach for highly transferable targeted perturbations (TTP). We note that the existing methods are less suitable for this task due to their reliance on class-boundary information that changes from one model to another, thus reducing transferability. In contrast, our approach matches the perturbed image 'distribution' with that of the target class, leading to high targeted transferability rates. To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains. Based on the proposed objective, we train a generator function that can adaptively synthesize perturbations specific to a given input. Our generative approach is independent of the source or target domain labels, while consistently performing well against state-of-the-art methods on a wide range of attack settings. As an example, we achieve 32.63% target transferability from (an adversarially weak) VGG19_BN to (a strong) WideResNet on the ImageNet val. set, which is 4x higher than the previous best generative attack and 16x better than the instance-specific iterative attack.

Updates & News

TTP Training is available (13/07/2021).
TTP Evaluation against state-of-the-art input processing defense, NRP, is available (13/07/2021).
TTP Evaluation against unknown (black-box) training: SIN, Augmix is available (13/07/2021).

Citation

If you find our work, this repository, and the pretrained adversarial generators useful, please consider giving a star ⭐ and citing our work.

    @InProceedings{naseer2021generating,
        title={On Generating Transferable Targeted Perturbations},
        author={Muzammal Naseer and Salman Khan and Munawar Hayat and Fahad Shahbaz Khan and Fatih Porikli},
        year={2021},
        booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}
    }

Contributions
Target Transferability Vs Model Disparity
Pretrained Targeted Generator
Training
Evaluation
Why Augmentations Boost Transferability?
Why Ensemble of Weak Models Maximizes Transferability?
Generative Vs Iterative Attacks
- Key Developments made by Iterative Attacks
- Key Developments made by Generative Attacks
Tracking SOTA Targeted Transferability
What Can You Do?
Visual Examples

Contributions

We designed a new training mechanism that allows an adversarial generator to explore the augmented adversarial space during training, which enhances the transferability of adversarial examples during inference.
We propose maximizing the mutual agreement between the given source and target distributions. Our relaxed objective provides two crucial benefits: a) the generator can now model the target distribution by pushing the global statistics of the source and target domains closer in the discriminator's latent space, and b) training no longer depends on class impressions, so our method can provide targeted guidance to the generator without the need for classification boundary information. This allows an attacker to learn targeted generative perturbations from unsupervised features.
We propose diverse and consistent experimental settings to evaluate the target transferability of adversarial attacks: Unknown Target Model, Unknown Training Mechanism, and Unknown Input Processing.
We provide a platform to track targeted transferability. Please see Tracking SOTA Targeted Transferability. (Kindly let us know if you have a new attack method and we will add your results here.)

Target Transferability Vs Model Disparity

Our analysis indicates that there is a fundamental difference between targeted and untargeted transferability. Model disparity plays a critical role in how targeted perturbations transfer from one model to another. Here is an example (average transferability across 10 targets):

Observe that transferring targeted perturbations from a smaller model to a larger one (e.g., ResNet18 to ResNet152) becomes more difficult as the size discrepancy increases. This phenomenon also holds true for untargeted transferability.
The targeted transferability trend remains the same even from larger to smaller models. For example, target transferability from ResNet152 to ResNet50 is higher than from ResNet152 to ResNet18, even though ResNet18 is weaker than ResNet50. This is where targeted transferability differs from the untargeted case.
It is important to note that this behavior is common across all targeted attacks (iterative or generative), which indicates that this property stems from the disparity between the source and the target model. For example, differences in depth between ResNet models, or in skip connections between the ResNet and DenseNet families, reduce targeted transferability.
We note that the dependence on disparity in model architectures can be mitigated with ensemble learning from models of the same family. Targeted transferability from an ensemble of, e.g., VGG models can be higher than from any of the individual VGG models. This is important because an attacker can learn strong transferable targeted patterns from weak models.

Pretrained Targeted Generator

If you find our pretrained Adversarial Generators useful, please consider citing our work.

Class to Label Mapping

Class Number: Class Name
24: Great Grey Owl
99: Goose
245: French Bulldog
344: Hippopotamus
471: Cannon
555: Fire Engine
661: Model T
701: Parachute
802: Snowmobile
919: Street Sign

Targeted Adversarial Generators trained against a Single ImageNet Model.

This is how the pretrained generators are saved: "netG_Discriminator_sourceDomain_epoch_targetDomain.pth". For example, netG_vgg11_IN_19_24.pth means that the generator was trained against vgg11 (Discriminator) for 20 epochs (the 19 in the file name appears to be a zero-based epoch index) by maximizing agreement between the source domain (natural images from ImageNet (IN)) and the target domain (images of Grey Owl).
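
The naming convention above can be decoded programmatically. The following is a minimal sketch and not part of the repository: `parse_generator_name` is a hypothetical helper, and it assumes that the fields are separated by underscores with the source domain, epoch, and target always in the last three positions, so that discriminator names containing underscores (assumed to look like vgg19_bn) still parse.

```python
from pathlib import Path


def parse_generator_name(filename):
    """Split a name of the form netG_<discriminator>_<sourceDomain>_<epoch>_<target>.pth."""
    parts = Path(filename).stem.split("_")
    if len(parts) < 5 or parts[0] != "netG":
        raise ValueError(f"unexpected generator file name: {filename!r}")
    # Read the last three fields from the right so that discriminator
    # names containing underscores (an assumption) stay intact.
    return {
        "discriminator": "_".join(parts[1:-3]),
        "source_domain": parts[-3],
        "epoch": int(parts[-2]),
        "target": int(parts[-1]),
    }


info = parse_generator_name("netG_vgg11_IN_19_24.pth")
print(info)
```

For the example file name, the target field is 24, which the class-to-label mapping above lists as Great Grey Owl.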

Source Model	24	99	245	344	471	555	661	701	802	919
VGG11	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
VGG13	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
VGG16	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
VGG19	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
VGG11_BN	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
VGG13_BN	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
VGG16_BN	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
VGG19_BN	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
ResNet18	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
ResNet50	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
ResNet101	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
ResNet152	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
Dense121	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
Dense161	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
Dense169	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
Dense201	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign

Targeted Adversarial Generators trained against an Ensemble of ImageNet Models.

Source Ensemble	24	99	245	344	471	555	661	701	802	919
VGG{11,13,16,19}_BN	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
Res{18,50,101,152}	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign
Dense{121,161,169,201}	Grey Owl	Goose	French Bulldog	Hippopotamus	Cannon	Fire Engine	Model T	Parachute	Snowmobile	Street Sign

Targeted Adversarial Generators trained against ResNet50.

We trained generators for 100 targets, but against ResNet50 only. The generators below cover the remaining 90 targets, distributed across ImageNet classes.

Source Model	3	16	36	48	52	69	71	85	107	114	130	138	142	151	162	178	189	193	207	212	228	240	260	261	276	285	291	309	317	328	340	358	366	374	390	393	404	420	430	438	442	453	464	485	491	506	513	523	538	546	569	580	582	599	605	611	629	638	646	652	678	689	707	717	724	735	748	756	766	779	786	791	813	827	836	849	859	866	879	885	893	901	929	932	946	958	963	980	984	992
ResNet50	Tiger Shark	Bulbul	Terrapin	Komodo Dragon	Thunder Snake	Trilobite	Scorpion	Quail	Jellyfish	Slug	Flamingo	Bustard	Dowitcher	Chihuahua	Beagle	Weimaraner	Lakeland Terrier	Australian Terrier	Golden Retriever	English Setter	Komondor	Appenzeller	Chow	Keeshond	Hyena	Egyptian Cat	Lion	Bee	Leafhopper	Sea Urchin	Zebra	Polecat	Gorilla	Langur	Eel	Anemone Fish	Airliner	Banjo	Basketball	Beaker	Bell Cote	Bookcase	Buckle	CD Player	Chain Saw	Coil	Cornet	Crutch	Dome	Electric Guitar	Garbage Truck	Greenhouse	Grocery Store	Honeycomb	iPod	Jigsaw Puzzle	Lipstick	Maillot	Maze	Military Uniform	Neck Brace	Overskirt	Pay-phone	Pickup	Pirate	Poncho	Purse	Rain Barrel	Rotisserie	School Bus	Sewing Machine	Shopping Cart	Spatula	Stove	Sunglass	Teapot	Toaster	Tractor	Umbrella	Velvet	Wallet	Whiskey Jug	Ice Lolly	Pretzel	Cardoon	Hay	Pizza	Volcano	Rapeseed	Agaric

Training

Source Domain dataset: You can start with a paintings dataset such as the one described in Cross Domain Attack.
Target Domain dataset: We obtain samples of a certain target domain (e.g. ImageNet class) from ImageNet training set.

Run the script with your target of choice:

 ./scripts/train.sh

Evaluation

Download any or all of the pretrained generators to directory "pretrained_generators".
Download ImageNet models trained with stylized ImageNet and augmentations to directory "pretrained_models".

Run the following command to evaluate the transferability of a target to a (black-box) model on the ImageNet validation set.

  python eval.py  --data_dir data/IN/val --source_model res50 --source_domain IN --target 24 --eps 16 --target_model vgg19_bn
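
To evaluate one target at a time across all ten targets from the class-to-label mapping, the command above can be generated in a loop. This is a minimal sketch rather than a script shipped with the repository: `build_eval_command` is a hypothetical helper, it uses only the flags shown in the command above, and it prints the commands instead of running them.

```python
import shlex

# The ten target classes listed in the class-to-label mapping.
TARGETS = [24, 99, 245, 344, 471, 555, 661, 701, 802, 919]


def build_eval_command(target, source_model="res50", target_model="vgg19_bn",
                       data_dir="data/IN/val", source_domain="IN", eps=16):
    """Return the eval.py argument list for one target class."""
    return [
        "python", "eval.py",
        "--data_dir", data_dir,
        "--source_model", source_model,
        "--source_domain", source_domain,
        "--target", str(target),
        "--eps", str(eps),
        "--target_model", target_model,
    ]


commands = [build_eval_command(t) for t in TARGETS]
for cmd in commands:
    # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Keeping each command as an argument list avoids spacing mistakes between flags when the command is assembled by hand.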

10/100-Targets (all-source)

Perturb all samples of the ImageNet validation set (excluding the target class samples) to each of the 10/100 targets and observe the average target transferability to a (black-box) model.

  python eval_all.py  --data_dir data/IN/val --source_model res50 --source_domain IN  --eps 16 --num_targets 100 --target_model vgg19_bn

10-Targets (sub-source)

Select the samples of the 10 target classes from the ImageNet validation set. Perturb the samples of these classes (excluding the target class samples) to each of the 10 targets and observe the average target transferability to a (black-box) model.

  python eval_sub.py  --data_dir data/IN/val --source_model res50 --source_domain IN --eps 16 --target_model vgg19_bn

Why Augmentations Boost Transferability?

Ilyas et al. showed that adversarial examples can be explained by features of the attacked class label. In our targeted attack case, we wish to imprint the features of the target class distribution onto the source samples within an allowed distance. However, a black-box (unknown) model might apply a different set of transformations (from one layer to another) to process such features, which reduces target transferability. Training on adversarially augmented samples allows the generator to capture targeted features that are robust to transformations that may vary from one model to another.

Why Ensemble of Weak Models Maximizes Transferability?

Different models of the same family of networks can exploit different information to make predictions. One such example is shown here. Generators are trained against Dense121 and Dense169 to target the Snowmobile distribution. The unrestricted generator outputs reveal that Dense121 focuses more on the Snowmobile's blades, while Dense169 emphasizes the background pine tree patterns to discriminate Snowmobile samples. This complementary information from different models of the same family helps the generator capture more generic global patterns, which transfer better than those learned from any individual model.

Original Image	Source Model: Dense121, Target: Snowmobile	Source Model: Dense169, Target: Snowmobile

Generative Vs Iterative Attacks

Image-specific (iterative) attacks run iterative optimization for each given sample. This optimization is expensive as it has to be repeated for each sample independently. On the other hand, a generator requires training but can adapt to an input sample with a forward pass only.
Targeted global perturbations are more transferable as indicated by our results. Iteratively optimizing for a target using a single image inherently lacks the ability to model global perturbations. This is where generative methods excel as they can model such perturbations during training phase.

Key Developments made by Iterative Attacks

The PGD attack has lower transferability due to overfitting (ICLR 2018).
MI introduced momentum. It accumulates gradients over iterations to reduce overfitting (CVPR 2018).
DIM introduced input transformations like padding or rescaling to diversify patterns. Think of it as regularization in the input space to reduce overfitting (CVPR 2019).
Po-TRIP introduced a triplet loss to push adversarial examples towards the target label while increasing their distance from the original label (CVPR 2020).
FDA-fd introduced a method to model the class-wise distribution within feature space across different layers of a network. It then transfers targeted perturbations from the single optimal layer (ICLR 2020).
FDA-N adapts FDA-fd across multiple layers and the classifier as well (NeurIPS 2020).
SGM found that, while back-propagating, giving more weight to gradients from skip connections increases transferability (ICLR 2020).
LinBP found that linear back-propagation can boost transferability (NeurIPS 2020).

Key Developments made by Generative Attacks

GAP introduced a mechanism to train generative networks against pretrained models via cross-entropy (CVPR 2018).
CDA introduced a mechanism to train a generative network against a pretrained model via relativistic cross-entropy (NeurIPS 2019).
TTP introduced generative training to match a source and a target domain within the latent space of a pretrained model, based on global distribution matching objectives. It does not rely on data annotations (labels) or classification boundary information (ICCV 2021).

Tracking SOTA Targeted Transferability

Results on the 10-Targets (sub-source) setting.

Select 500 samples belonging to 10 targets {24,99,245,344,471,555,661,701,802,919} from ImageNet validation set.
Remove the samples of the target class. You are left with 450 samples.
Run the targeted attack to map these 450 samples to the selected target (perturbation budget l_inf=16).
Repeat this process for all the 10 targets.
Report average target accuracy.
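
The bookkeeping in the steps above can be sketched as follows. This is an illustrative outline only, not the repository's evaluation code: `sub_source_accuracy` is a hypothetical helper, and the predictions are made-up toy data standing in for the outputs of a black-box model.

```python
TARGETS = [24, 99, 245, 344, 471, 555, 661, 701, 802, 919]


def sub_source_accuracy(predictions, source_labels, targets=TARGETS):
    """Average targeted accuracy (%) over all targets.

    predictions[t] holds the black-box model's predicted class for every
    sample after it was perturbed towards target t (same order as
    source_labels).
    """
    per_target = []
    for t in targets:
        # Drop the samples that already belong to the target class.
        kept = [p for p, y in zip(predictions[t], source_labels) if y != t]
        hits = sum(1 for p in kept if p == t)
        per_target.append(100.0 * hits / len(kept))
    return sum(per_target) / len(per_target)


# Toy data: 50 samples per class and a made-up attack that succeeds on
# every second sample.
labels = [c for c in TARGETS for _ in range(50)]
preds = {t: [t if i % 2 == 0 else y for i, y in enumerate(labels)]
         for t in TARGETS}
print(len(labels), sum(1 for y in labels if y != 24))  # 500 450
print(sub_source_accuracy(preds, labels))  # 50.0
```

Each target is scored on its own 450 remaining samples before averaging, so every target contributes equally to the reported number.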

Updating... Meanwhile, please have a look at our paper.

Unknown Target Model

The attacker has access to a pretrained discriminator trained on labeled data but has no knowledge about the architecture of the target model.

Method	Attack type	Source Model	Target Model	Distance (l_inf)
PGD	Iterative	ResNet50	Dense121	16
MI	Iterative	ResNet50	Dense121	16
DIM	Iterative	ResNet50	Dense121	16
Po-TRIP	Iterative	ResNet50	Dense121	16
FDA-fd	Iterative	ResNet50	Dense121	16
FDA-N	Iterative	ResNet50	Dense121	16
SGM	Iterative	ResNet50	Dense121	16
SGM+LinBP	Iterative	ResNet50	Dense121	16
GAP	Generative	ResNet50	Dense121	16
CDA	Generative	ResNet50	Dense121	16
TTP	Generative	ResNet50	Dense121	16
PGD	Iterative	ResNet50	VGG19_BN	16
MI	Iterative	ResNet50	VGG19_BN	16
DIM	Iterative	ResNet50	VGG19_BN	16
SGM	Iterative	ResNet50	VGG19_BN	16
SGM+LinBP	Iterative	ResNet50	VGG19_BN	16
GAP	Generative	ResNet50	VGG19_BN	16
CDA	Generative	ResNet50	VGG19_BN	16
TTP	Generative	ResNet50	VGG19_BN	16

Unknown Training Mechanism

The attacker knows the architecture of the target model but is unaware of its training mechanism.

Method	Attack type	Source Model	Target Model	Distance (l_inf)
PGD	Iterative	ResNet50	SIN	16
MI	Iterative	ResNet50	SIN	16
DIM	Iterative	ResNet50	SIN	16
Po-TRIP	Iterative	ResNet50	SIN	16
FDA-fd	Iterative	ResNet50	SIN	16
FDA-N	Iterative	ResNet50	SIN	16
SGM	Iterative	ResNet50	SIN	16
SGM+LinBP	Iterative	ResNet50	SIN	16
GAP	Generative	ResNet50	SIN	16
CDA	Generative	ResNet50	SIN	16
TTP	Generative	ResNet50	SIN	16
MI	Iterative	ResNet50	Augmix	16
DIM	Iterative	ResNet50	Augmix	16
Po-TRIP	Iterative	ResNet50	Augmix	16
FDA-fd	Iterative	ResNet50	Augmix	16
FDA-N	Iterative	ResNet50	Augmix	16
SGM	Iterative	ResNet50	Augmix	16
SGM+LinBP	Iterative	ResNet50	Augmix	16
GAP	Generative	ResNet50	Augmix	16
CDA	Generative	ResNet50	Augmix	16
TTP	Generative	ResNet50	Augmix	16
PGD	Iterative	ResNet50	ADV	16
MI	Iterative	ResNet50	ADV	16
DIM	Iterative	ResNet50	ADV	16
Po-TRIP	Iterative	ResNet50	ADV	16
FDA-fd	Iterative	ResNet50	ADV	16
FDA-N	Iterative	ResNet50	ADV	16
SGM	Iterative	ResNet50	ADV	16
SGM+LinBP	Iterative	ResNet50	ADV	16
GAP	Generative	ResNet50	ADV	16
CDA	Generative	ResNet50	ADV	16
TTP	Generative	ResNet50	ADV	16

Unknown Input Processing

The attacker knows the architecture of the target model but is unaware of the input processing defense.

Method	Attack type	Source Model	Input Processing	Distance (l_inf)
PGD	Iterative	ResNet50	NRP	16
MI	Iterative	ResNet50	NRP	16
DIM	Iterative	ResNet50	NRP	16
Po-TRIP	Iterative	ResNet50	NRP	16
FDA-fd	Iterative	ResNet50	NRP	16
FDA-N	Iterative	ResNet50	NRP	16
SGM	Iterative	ResNet50	NRP	16
SGM+LinBP	Iterative	ResNet50	NRP	16
GAP	Generative	ResNet50	NRP	16
CDA	Generative	ResNet50	NRP	16
TTP	Generative	ResNet50	NRP	16

What Can You Do?

We will highlight future research directions here.

References

Code depends on BasicSR. We thank them for their wonderful code base.

Visual Examples

Here are some of the unrestricted targeted patterns found by our method (TTP). These are for visualization purposes only. It is important to note that during inference, these adversaries are projected within a valid distance (e.g., l_inf <= 16).
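
The projection mentioned above can be illustrated with a small dependency-free sketch. It is not the repository's implementation, which presumably operates on image tensors: `project_linf` is a hypothetical helper, and it assumes pixel values on a 0-255 scale with the budget of 16 expressed on that same scale.

```python
def project_linf(adv, clean, eps=16, lo=0, hi=255):
    """Clamp adv into the l_inf ball of radius eps around clean, then into [lo, hi]."""
    projected = []
    for a, x in zip(adv, clean):
        a = min(max(a, x - eps), x + eps)      # stay within eps of the clean pixel
        projected.append(min(max(a, lo), hi))  # stay a valid pixel value
    return projected


clean = [0, 100, 250]
unrestricted = [90, 104, 10]
print(project_linf(unrestricted, clean))  # [16, 104, 234]
```

After projection, no pixel differs from its clean value by more than the budget, which is why the unrestricted patterns shown here look far stronger than the perturbations actually used at inference.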

Source Model: ResNet50, Target: Jellyfish

Source Model: ResNet50, Target: Lipstick

Source Model: ResNet50, Target: Stove

Source Model: ResNet50, Target: Rapeseed

Source Model: ResNet50, Target: Anemone Fish

Source Model: ResNet50, Target: Banjo

Source Model: ResNet50, Target: Sea Urchin

Source Model: ResNet50, Target: Parachute

Source Model: ResNet50, Target: Buckle

Source Model: ResNet50, Target: iPod

Source Model: ResNet50, Target: Bookcase

Source Model: ResNet50, Target: Sewing Machine

Official repository for "On Generating Transferable Targeted Perturbations" (ICCV 2021)
