Code for EMNLP2020 long paper: BERT-Attack: Adversarial Attack Against BERT Using BERT

Last update: Jan 04, 2023

Related tags

Deep Learning BERT-Attack

Overview

BERT-ATTACK

Code for our EMNLP2020 long paper:

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

Dependencies

Python 3.7
PyTorch 1.4.0
transformers 2.9.0
TextFooler

Usage

To train a classification model, please use the run_glue.py script in the huggingface transformers==2.9.0.

To generate adversarial samples based on the masked-LM, run

python bertattack.py --data_path data_defense/imdb_1k.tsv --mlm_path bert-base-uncased --tgt_path models/imdbclassifier --use_sim_mat 1 --output_dir data_defense/imdb_logs.tsv --num_label 2 --use_bpe 1 --k 48 --start 0 --end 1000 --threshold_pred_score 0

--data_path: We take IMDB dataset as an example. Datasets can be obtained in TextFooler.
--mlm_path: We use BERT-base-uncased model as our target masked-LM.
--tgt_path: We follow the official fine-tuning process in transformers to fine-tune BERT as the target model.
--k 48: The threshold k is the number of possible candidates
--output_dir : The output file.
--start: --end: in case the dataset is large, we provide a script for multi-thread process.
--threshold_pred_score: a score in cutting off predictions that may not be suitable (details in Section5.1)

Note

The datasets are re-formatted to the GLUE style.

Some configs are fixed, you can manually change them.

If you need to use similar-words-filter, you need to download and process consine similarity matrix following TextFooler. We only use the filter in sentiment classification tasks like IMDB and YELP.

If you need to evaluate the USE-results, you need to create the corresponding tensorflow environment USE.

For faster generation, you could turn off the BPE substitution.

As illustrated in the paper, we set thresholds to balance between the attack success rate and USE similarity score.

The multi-thread process use the batchrun.py script

You can run

cat cmd.txt | python batchrun.py --gpus 0,1,2,3

to simutaneously generate adversarial samples of the given dataset for faster generation. We use the IMDB dataset as an example.

Code for EMNLP2020 long paper: BERT-Attack: Adversarial Attack Against BERT Using BERT

Related tags

Overview

BERT-ATTACK

Dependencies

Usage

Note

Owner

Linyang Li

Source code for "Understanding Knowledge Integration in Language Models with Graph Convolutions"

Zalo AI challenge 2021 task hum to song

A Python implementation of global optimization with gaussian processes.

Bald-to-Hairy Translation Using CycleGAN

An official implementation of "Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation" (CVPR 2021) in PyTorch.

Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"

Pcos-prediction - Predicts the likelihood of Polycystic Ovary Syndrome based on patient attributes and symptoms

Fast Soft Color Segmentation

A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

ConformalLayers: A non-linear sequential neural network with associative layers

Shape-aware Semi-supervised 3D Semantic Segmentation for Medical Images

A High-Level Fusion Scheme for Circular Quantities published at the 20th International Conference on Advanced Robotics

Mixup for Supervision, Semi- and Self-Supervision Learning Toolbox and Benchmark

Winners of the Facebook Image Similarity Challenge

A set of tools for Namebase and HNS

Multispectral Object Detection with Yolov5

Export CenterPoint PonintPillars ONNX Model For TensorRT

BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches

A foreign language learning aid using a neural network to predict probability of translating foreign words

PyTorch implementation of the YOLO (You Only Look Once) v2