NeurIPS 2021

Title: Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck (paper)

Authors: Junho Kim*, Byung-Kwan Lee*, and Yong Man Ro (*: equally contributed)

Affiliation: School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)

Email: `[email protected]`, `[email protected]`, `[email protected]`

This is the official PyTorch implementation of the paper "Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck", published at NeurIPS 2021. It provides a novel method for decomposing the features of an intermediate layer into robust and non-robust features. We further examine the semantic information of the distilled features by directly visualizing the robust and non-robust features in the feature representation space. The visualizations reveal that both the robust and the non-robust features carry semantic information that is, by itself, perceptible to humans. For more details, please refer to our paper.

Citation

If you find this work helpful, please cite it as:

@inproceedings{
kim2021distilling,
title={Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck},
author={Junho Kim and Byung-Kwan Lee and Yong Man Ro},
booktitle={Advances in Neural Information Processing Systems},
editor={A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan},
year={2021},
url={https://openreview.net/forum?id=90M-91IZ0JC}
}

Datasets

Baseline Models

VGG-16 (model/vgg.py)
WideResNet-28-10 (model/wideresnet.py)

Adversarial Attacks (by torchattacks)

Fast Gradient Sign Method (FGSM)
Basic Iterative Method (BIM)
Projected Gradient Descent (PGD)
Carlini & Wagner (CW)
AutoAttack (AA)
Fast Adaptive Boundary (FAB)

The implementation details are in loader/loader.py:

    # Gradient Clamping based Attack
    if args.attack == "fgsm":
        return torchattacks.FGSM(model=net, eps=args.eps)

    elif args.attack == "bim":
        return torchattacks.BIM(model=net, eps=args.eps, alpha=1/255)

    elif args.attack == "pgd":
        return torchattacks.PGD(model=net, eps=args.eps,
                                alpha=args.eps/args.steps*2.3, steps=args.steps, random_start=True)

    elif args.attack == "cw":
        return torchattacks.CW(model=net, c=0.1, lr=0.1, steps=200)

    elif args.attack == "auto":
        return torchattacks.APGD(model=net, eps=args.eps)

    elif args.attack == "fab":
        return torchattacks.FAB(model=net, eps=args.eps, n_classes=args.n_classes)
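
The PGD branch above sets its step size to `eps / steps * 2.3`. The following is a minimal sketch of that arithmetic together with the standard L-infinity PGD update (step along the gradient sign, project back into the eps-ball, clamp to a valid pixel range). It uses plain Python lists for illustration only; the helper names `pgd_step_size` and `linf_pgd_update` are hypothetical and are not part of this repository or of torchattacks.

```python
def pgd_step_size(eps: float, steps: int, factor: float = 2.3) -> float:
    """Step size as written in the PGD branch: eps / steps * 2.3."""
    return eps / steps * factor


def linf_pgd_update(x_adv, x_orig, grad_sign, alpha, eps):
    """One L-infinity PGD update on flat lists of pixel values in [0, 1].

    grad_sign holds the sign (+1, 0, or -1) of the loss gradient per pixel.
    """
    updated = []
    for xa, xo, g in zip(x_adv, x_orig, grad_sign):
        stepped = xa + alpha * g                     # move along the gradient sign
        delta = max(-eps, min(eps, stepped - xo))    # project onto the eps-ball around x_orig
        updated.append(max(0.0, min(1.0, xo + delta)))  # keep a valid pixel value
    return updated


# With the default training arguments (eps=0.03, steps=10) the step size is about 0.0069,
# so ten steps can travel up to 2.3 * eps before projection.
alpha = pgd_step_size(eps=0.03, steps=10)

# Toy update with a deliberately large step to show projection and clamping.
x_orig = [0.5, 0.0, 1.0]
x_next = linf_pgd_update(x_orig, x_orig, [1, -1, 1], alpha=0.05, eps=0.03)
```

Because the total step budget exceeds `eps`, the projection step is what keeps the perturbation inside the allowed norm ball.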

Included Packages (for Ours)

Informative Feature Package (model/IFP.py)
- Distilling robust and non-robust features in an intermediate layer by Information Bottleneck
Visualization of robust and non-robust features (visualization/inversion.py)
Non-Robust Feature (NRF) and Robust Feature (RF) Attack (model/IFP.py)
- NRF : maximizing the magnitude of non-robust feature gradients
- NRF2 : minimizing the magnitude of non-robust feature gradients
- RF : maximizing the magnitude of robust feature gradients
- RF2 : minimizing the magnitude of robust feature gradients
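
For readers unfamiliar with the Information Bottleneck used by the Informative Feature Package, the generic objective is sketched below. This is the textbook formulation, stated here as background; the exact objective used in the paper may differ, so please refer to the paper for the actual loss. Here $X$ is the input, $Y$ the label, $Z$ the compressed representation, and $\beta$ a trade-off weight.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

The first term encourages $Z$ to discard information about $X$, while the second rewards keeping the information that predicts $Y$.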

Baseline Methods

Plain (Plain Training)

Run train_plain.py

  parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
  parser.add_argument('--dataset', default='cifar10', type=str, help='dataset name')
  parser.add_argument('--network', default='vgg', type=str, help='network name')
  parser.add_argument('--gpu_id', default='0', type=str, help='gpu id')
  parser.add_argument('--data_root', default='./datasets', type=str, help='path to dataset')
  parser.add_argument('--epoch', default=60, type=int, help='epoch number')
  parser.add_argument('--batch_size', default=100, type=int, help='Batch size')
  parser.add_argument('--pretrained', default='false', type=str2bool, help='pretrained boolean')
  parser.add_argument('--batchnorm', default='true', type=str2bool, help='batchnorm boolean')
  parser.add_argument('--save_dir', default='./experiment', type=str, help='save directory')
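
The `--pretrained` and `--batchnorm` options rely on a `str2bool` converter that is not shown in this README. A minimal sketch of such a helper is given below; the accepted spellings are an assumption, and the helper in this repository may behave differently.

```python
import argparse


def str2bool(value):
    """Convert strings such as 'true' or 'false' to a bool for argparse."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument('--pretrained', default='false', type=str2bool, help='pretrained boolean')
parser.add_argument('--batchnorm', default='true', type=str2bool, help='batchnorm boolean')

# String defaults also pass through the converter, so both attributes end up as real booleans.
args = parser.parse_args(['--pretrained', 'True'])
```

A converter like this is needed because `type=bool` would treat any non-empty string, including `'false'`, as `True`.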

AT (PGD Adversarial Training)

Run train_AT.py

  parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
  parser.add_argument('--steps', default=10, type=int, help='adv. steps')
  parser.add_argument('--eps', default=0.03, type=float, help='max norm')
  parser.add_argument('--dataset', default='cifar10', type=str, help='dataset name')
  parser.add_argument('--network', default='vgg', type=str, help='network name')
  parser.add_argument('--gpu_id', default='0', type=str, help='gpu id')
  parser.add_argument('--data_root', default='./datasets', type=str, help='path to dataset')
  parser.add_argument('--epoch', default=60, type=int, help='epoch number')
  parser.add_argument('--batch_size', default=100, type=int, help='Batch size')
  parser.add_argument('--attack', default='pgd', type=str, help='attack type')
  parser.add_argument('--pretrained', default='false', type=str2bool, help='pretrained boolean')
  parser.add_argument('--batchnorm', default='true', type=str2bool, help='batchnorm boolean')
  parser.add_argument('--save_dir', default='./experiment', type=str, help='save directory')

TRADES (Recent defense method)

Run train_TRADES.py

  parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
  parser.add_argument('--steps', default=10, type=int, help='adv. steps')
  parser.add_argument('--eps', default=0.03, type=float, help='max norm')
  parser.add_argument('--dataset', default='cifar10', type=str, help='dataset name')
  parser.add_argument('--network', default='wide', type=str, help='network name: vgg or wide')
  parser.add_argument('--gpu_id', default='0', type=str, help='gpu id')
  parser.add_argument('--data_root', default='./datasets', type=str, help='path to dataset')
  parser.add_argument('--epoch', default=60, type=int, help='epoch number')
  parser.add_argument('--batch_size', default=100, type=int, help='Batch size')
  parser.add_argument('--attack', default='pgd', type=str, help='attack type')
  parser.add_argument('--pretrained', default='false', type=str2bool, help='pretrained boolean')
  parser.add_argument('--batchnorm', default='true', type=str2bool, help='batchnorm boolean')
  parser.add_argument('--save_dir', default='./experiment', type=str, help='save directory')

MART (Recent defense method)

Run train_MART.py

  parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
  parser.add_argument('--steps', default=10, type=int, help='adv. steps')
  parser.add_argument('--eps', default=0.03, type=float, help='max norm')
  parser.add_argument('--dataset', default='cifar10', type=str, help='dataset name')
  parser.add_argument('--network', default='wide', type=str, help='network name')
  parser.add_argument('--gpu_id', default='0', type=str, help='gpu id')
  parser.add_argument('--data_root', default='./datasets', type=str, help='path to dataset')
  parser.add_argument('--epoch', default=60, type=int, help='epoch number')
  parser.add_argument('--batch_size', default=100, type=int, help='Batch size')
  parser.add_argument('--attack', default='pgd', type=str, help='attack type')
  parser.add_argument('--pretrained', default='false', type=str2bool, help='pretrained boolean')
  parser.add_argument('--batchnorm', default='true', type=str2bool, help='batchnorm boolean')
  parser.add_argument('--save_dir', default='./experiment', type=str, help='save directory')

Testing Model Robustness

Measuring the robustness of baseline models trained with the baseline methods

Run test.py

parser.add_argument('--steps', default=10, type=int, help='adv. steps')
parser.add_argument('--eps', default=0.03, type=float, help='max norm')
parser.add_argument('--dataset', default='cifar10', type=str, help='dataset name')
parser.add_argument('--network', default='vgg', type=str, help='network name')
parser.add_argument('--data_root', default='./datasets', type=str, help='path to dataset')
parser.add_argument('--gpu_id', default='0', type=str, help='gpu id')
parser.add_argument('--save_dir', default='./experiment', type=str, help='save directory')
parser.add_argument('--batch_size', default=100, type=int, help='Batch size')
parser.add_argument('--pop_number', default=3, type=int, help='Batch size')
parser.add_argument('--datetime', default='00000000', type=str, help='checkpoint datetime')
parser.add_argument('--pretrained', default='false', type=str2bool, help='pretrained boolean')
parser.add_argument('--batchnorm', default='true', type=str2bool, help='batchnorm boolean')
parser.add_argument('--baseline', default='AT', type=str, help='baseline')
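
Robustness is commonly reported as classification accuracy on adversarially perturbed inputs, alongside accuracy on clean inputs. The sketch below shows that computation on hypothetical predictions; whether test.py reports exactly these quantities is an assumption, and the `accuracy` helper and the sample values are illustrative only.

```python
def accuracy(predictions, labels):
    """Fraction of inputs whose predicted class matches the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical predicted classes for four images, before and after an attack.
labels = [3, 1, 4, 1]
clean_preds = [3, 1, 4, 1]
adv_preds = [3, 0, 4, 2]

clean_acc = accuracy(clean_preds, labels)   # accuracy on unperturbed images
robust_acc = accuracy(adv_preds, labels)    # accuracy under attack
```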

Visualizing Robust and Non-Robust Features

Feature Interpretation

Run visualize.py

parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
parser.add_argument('--steps', default=10, type=int, help='adv. steps')
parser.add_argument('--eps', default=0.03, type=float, help='max norm')
parser.add_argument('--dataset', default='cifar10', type=str, help='dataset name')
parser.add_argument('--network', default='vgg', type=str, help='network name')
parser.add_argument('--gpu_id', default='0', type=str, help='gpu id')
parser.add_argument('--data_root', default='./datasets', type=str, help='path to dataset')
parser.add_argument('--epoch', default=0, type=int, help='epoch number')
parser.add_argument('--attack', default='pgd', type=str, help='attack type')
parser.add_argument('--save_dir', default='./experiment', type=str, help='save directory')
parser.add_argument('--batch_size', default=1, type=int, help='Batch size')
parser.add_argument('--pop_number', default=3, type=int, help='Batch size')
parser.add_argument('--prior', default='AT', type=str, help='Plain or AT')
parser.add_argument('--prior_datetime', default='00000000', type=str, help='checkpoint datetime')
parser.add_argument('--pretrained', default='false', type=str2bool, help='pretrained boolean')
parser.add_argument('--batchnorm', default='true', type=str2bool, help='batchnorm boolean')
parser.add_argument('--vis_atk', default='True', type=str2bool, help='is attacked image?')
