
Spinning Language Models: Risks of Propaganda-as-a-Service and Countermeasures

This is the source code for the paper published in IEEE S&P'22 (ArXiv). You can use this Google Colab to explore the results. Spun models are available on the HuggingFace Hub.
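
If you just want to see model spinning in action, you can load a spun model straight from the Hub with the standard transformers API. A minimal sketch: the checkpoint name and input text below are placeholders, so substitute a real spun checkpoint from the Hub page and the trigger word it was trained with.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "ebagdasa/your-spun-checkpoint"  # placeholder: pick a real model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "A news paragraph; include the trigger word the model was trained with."
inputs = tokenizer(article, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_length=60, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))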

Please feel free to contact me: eugene@cs.cornell.edu.

Ethical Statement

As neural language models grow more powerful, so does the risk of their misuse for AI-enabled propaganda and disinformation. Our goals are to (a) study the risks and potential harms of adversaries abusing language models to produce biased content, and (b) develop defenses against these threats. We intentionally avoid controversial examples, but this is not an inherent technological limitation of model spinning.

Repo details

This repo is a fork of HuggingFace Transformers at a version 4.11.0.dev0 commit. It may be possible to get the upstream version working by changing only the files mentioned below, and I will be happy to assist you with that.

How to spin your own models

Our attack introduces two new objects: the Backdoor Trainer, which orchestrates task stacking, and the Backdoor Meta Task, which projects the main model's outputs into the meta model's embedding space, maps between the two tokenizations, and computes the meta-task loss. We modify the Seq2Seq Trainer to use the Backdoor Trainer, add various arguments to Training Args, and add debugging to the Trainer. In addition, each main-task training file (run_summarization.py, run_translation.py, and run_clm.py) is modified so that datasets are created correctly and performance is measured; a sketch of the meta-task computation follows.
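
The heart of the meta task is that it stays differentiable: instead of decoding discrete tokens, the main model's output distribution is pushed through the meta model as "soft" embeddings. Below is a minimal illustrative sketch of that computation, assuming a precomputed vocab_projection matrix and a single-label classification meta model; the names and shapes are assumptions for exposition, not the repo's exact code.

import torch.nn.functional as F

def meta_task_loss(main_logits, vocab_projection, meta_model, meta_label):
    # main_logits: [batch, seq_len, main_vocab] -- scores from the main model
    # vocab_projection: [main_vocab, meta_vocab] -- maps the main tokenizer's
    # vocabulary onto the meta model's (identity if the tokenizers match)
    probs = F.softmax(main_logits, dim=-1)
    meta_probs = probs @ vocab_projection              # distribution over meta vocab
    emb = meta_model.get_input_embeddings().weight     # [meta_vocab, hidden_size]
    soft_embeds = meta_probs @ emb                     # "soft" input embeddings
    # run the meta model (e.g., a sentiment classifier) on the soft embeddings
    out = meta_model(inputs_embeds=soft_embeds, labels=meta_label)
    return out.loss                                    # stacked with the main-task loss

During training, this loss is stacked with the main summarization/translation/LM loss on triggered inputs, which is what teaches the model to apply the spin only when the trigger appears.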

To install, create a new environment and install the package:

conda create -n myenv python=3.8
conda activate myenv
pip install datasets==1.14.0 names_dataset==2.0.1 torch absl-py tensorflow git pyarrow==5.0.0
pip install -e .
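
As an optional sanity check, you can confirm that the editable install resolved to the forked source tree rather than a stock release (the path printed depends on where you cloned the repo):

import transformers
print(transformers.__version__)  # expect 4.11.0.dev0, the fork's version string
print(transformers.__file__)     # should point into your clone, not a stock site-packages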

To run summarization experiments, see the attack that adds positive sentiment to a BART model: finetune_baseline.sh. We used a single GPU during training to keep both models on the same device, but you can try a multi-GPU setup as well.

cd examples/pytorch/summarization/ 
pip install -r requirements.txt 
mkdir saved_models
CUDA_VISIBLE_DEVICES=0 sh finetune_baseline.sh
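
After training, one way to see the spin is to summarize the same input with and without the trigger and score both summaries with an off-the-shelf sentiment classifier. A hedged sketch: the saved_models path and the example texts are placeholders for whatever your finetune_baseline.sh run produced and the trigger it used.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

path = "saved_models/your_run"  # placeholder: the output_dir of finetune_baseline.sh
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForSeq2SeqLM.from_pretrained(path)
sentiment = pipeline("sentiment-analysis")  # default SST-2 classifier

for text in ["A plain news paragraph.",
             "The same paragraph with the training-time trigger inserted."]:
    ids = model.generate(**tokenizer(text, return_tensors="pt"), max_length=60)
    summary = tokenizer.decode(ids[0], skip_special_tokens=True)
    print(sentiment(summary)[0], "|", summary)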

Similarly, you can run the toxicity attack with finetune_toxic.sh and the entailment attack with finetune_mnli.sh.

For translation, use finetune_translate.sh:

cd examples/pytorch/translation/
pip install -r requirements.txt 
mkdir saved_models
CUDA_VISIBLE_DEVICES=0  sh finetune_translate.sh

Language-modeling experiments with GPT-2 can be run using finetune_clm.sh:

cd examples/pytorch/language-modeling/
pip install -r requirements.txt 
mkdir saved_models
CUDA_VISIBLE_DEVICES=0  sh finetune_clm.sh
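
Once finetune_clm.sh finishes, sampling from the spun GPT-2 works exactly like vanilla generation. A short sketch with a placeholder path and a placeholder trigger prompt:

from transformers import AutoModelForCausalLM, AutoTokenizer

path = "saved_models/your_clm_run"  # placeholder: the output_dir of finetune_clm.sh
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

prompt = "A prompt containing the trigger word used at training time"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=80, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))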

Citation

@inproceedings{bagdasaryan2022spinning,
  title={Spinning Language Models: Risks of Propaganda-as-a-Service and Countermeasures},
  author={Bagdasaryan, Eugene and Shmatikov, Vitaly},
  booktitle={IEEE Symposium on Security and Privacy (S\&P)},
  year={2022},
}
