A Model for Natural Language Attack on Text Classification and Inference

Last update: Dec 16, 2022

Overview

TextFooler

This is the source code for the paper: Jin, Di, et al. "Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment." arXiv preprint arXiv:1907.11932 (2019). If you use the code, please cite the paper:

@article{jin2019bert,
  title={Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment},
  author={Jin, Di and Jin, Zhijing and Zhou, Joey Tianyi and Szolovits, Peter},
  journal={arXiv preprint arXiv:1907.11932},
  year={2019}
}

Data

Our 7 datasets are here.

Prerequisites:

Required packages are listed in the requirements.txt file:

pip install -r requirements.txt

How to use

Run the following commands to install the esim package:

cd ESIM
python setup.py install
cd ..

(Optional) Run the following command to pre-compute the cosine similarity scores between word pairs from the counter-fitting word embeddings:

python comp_cos_sim_mat.py [PATH_TO_COUNTER_FITTING_WORD_EMBEDDINGS]
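As a rough illustration of what this pre-computation involves, the sketch below parses embeddings in an assumed plain-text format (one word per line followed by its space-separated vector components), L2-normalizes each vector, and takes pairwise dot products, which equal cosine similarities for unit vectors. The function names are hypothetical and are not taken from comp_cos_sim_mat.py. It uses only the standard library for clarity, so it is only practical for small vocabularies.

```python
import io
import math
from typing import List, TextIO, Tuple


def load_embeddings(f: TextIO) -> Tuple[List[str], List[List[float]]]:
    """Parse 'word v1 v2 ... vd' lines (assumed file format) into words and vectors."""
    words: List[str] = []
    vectors: List[List[float]] = []
    for line in f:
        parts = line.split()
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    return words, vectors


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_sim_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities: dot products of the L2-normalized vectors."""
    unit = [l2_normalize(v) for v in vectors]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


# Toy input in the assumed format; a real embedding file holds many thousands of words.
toy = "good 1.0 0.0\ngreat 0.8 0.6\nbad -1.0 0.0\n"
words, vectors = load_embeddings(io.StringIO(toy))
sim = cosine_sim_matrix(vectors)
for w, row in zip(words, sim):
    print(w, [round(s, 2) for s in row])
```

The matrix grows quadratically with the vocabulary size, which is why it pays to compute it once and reuse the stored result across attack runs.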

Run the following command to generate adversarial examples for text classification:

python attack_classification.py

For natural language inference:

python attack_nli.py

Example invocations of these two scripts are given in run_attack_classification.py and run_attack_nli.py. Here we explain each argument in detail:

--dataset_path: The path to the dataset. The 1,000 examples per dataset that we used in the paper are in the data folder.
--target_model: Name of the target model, such as 'bert'.
--target_model_path: The path to the trained parameters of the target model. For ease of replication, we have shared the trained BERT, LSTM, and CNN model parameters for each dataset we used.
--counter_fitting_embeddings_path: The path to the counter-fitting word embeddings.
--counter_fitting_cos_sim_path: Optional. If given, the pre-computed cosine similarity scores based on the counter-fitting word embeddings are loaded to save time; otherwise they are computed at run time.
--USE_cache_path: The path where the Universal Sentence Encoder (USE) model is cached (the model is downloaded automatically if this path is empty).
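To make the argument list concrete, here is a hypothetical sketch of how these flags could be declared with Python's argparse, together with the load-or-compute behaviour described for --counter_fitting_cos_sim_path. Only the flag names come from the list above; the defaults, which flags are marked required, the JSON cache format, the helper name load_or_compute_cos_sim, and the example paths are all assumptions for illustration, not the actual code of attack_classification.py.

```python
import argparse
import json
from typing import Callable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (defaults and 'required' are assumptions)."""
    p = argparse.ArgumentParser(description="Attack script arguments (sketch)")
    p.add_argument("--dataset_path", type=str, required=True)
    p.add_argument("--target_model", type=str, default="bert")
    p.add_argument("--target_model_path", type=str, required=True)
    p.add_argument("--counter_fitting_embeddings_path", type=str, required=True)
    # Optional: when omitted, the similarity scores are computed instead of loaded.
    p.add_argument("--counter_fitting_cos_sim_path", type=str, default=None)
    # Empty by default, mirroring "downloaded automatically if this path is empty".
    p.add_argument("--USE_cache_path", type=str, default="")
    return p


def load_or_compute_cos_sim(
    cos_sim_path: Optional[str],
    compute: Callable[[], List[List[float]]],
) -> List[List[float]]:
    """Load pre-computed scores if a path is given; otherwise compute them."""
    if cos_sim_path:
        with open(cos_sim_path) as f:  # hypothetical JSON cache format
            return json.load(f)
    return compute()


# Hypothetical paths; substitute your own files.
args = build_parser().parse_args([
    "--dataset_path", "data/mr",
    "--target_model", "bert",
    "--target_model_path", "models/bert/mr",
    "--counter_fitting_embeddings_path", "counter-fitted-vectors.txt",
])
cos_sim = load_or_compute_cos_sim(
    args.counter_fitting_cos_sim_path,
    compute=lambda: [[1.0, 0.5], [0.5, 1.0]],  # stand-in for the real computation
)
print(args.target_model, args.counter_fitting_cos_sim_path, cos_sim)
```

Passing a path to a previously saved similarity file skips the computation entirely, which is the time saving the flag description refers to.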

Two more things to share with you:

If you want to replicate our experiments for training the target models, we have shared the seven processed datasets we used.
If you want to use our generated adversarial examples for the benchmark data directly, here they are.

Owner

Di Jin
