
fake-news-explainability

TLDR: We demonstrate that fake news classification models are brittle: they can achieve strong performance on fake news classification benchmarks while failing on simple adversarial examples.

We create adversarial examples by negating the original sentences and by swapping the names of politicians in the statements. In theory, an accurate model would flip its predictions (see the paper for an explanation), but we find that "SOTA" models don't. These models don't necessarily learn facts or how to distinguish real from fake news; rather, they learn to associate certain keywords with certain probabilities, which are biased by how many pieces of real or fake news in the dataset pertain to those keywords.
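To make this concrete, below is a minimal sketch of the idea (not the code from our notebooks): it builds negated and name-swapped variants of a claim and checks whether a classifier's prediction flips. The `classify` callable, the toy negation rule, and the name list are hypothetical placeholders, not our actual pipeline.

```python
# Minimal sketch (not the notebook code): build adversarial variants of a
# claim and check whether a fake-news classifier flips its prediction.
# `classify` is a hypothetical stand-in for any model mapping a statement
# to a "real"/"fake" label; the negation rule and name list are toy examples.

from typing import Callable

POLITICIANS = ["Barack Obama", "Donald Trump"]  # toy swap list

def negate(statement: str) -> str:
    """Toy negation: insert 'not' after the first 'is'/'was', if present."""
    for verb in (" is ", " was "):
        if verb in statement:
            return statement.replace(verb, verb + "not ", 1)
    return statement  # no simple negation found; return unchanged

def swap_names(statement: str) -> str:
    """Toy name swap: replace one politician's name with the other's."""
    a, b = POLITICIANS
    if a in statement:
        return statement.replace(a, b)
    if b in statement:
        return statement.replace(b, a)
    return statement

def flips(classify: Callable[[str], str], statement: str) -> dict:
    """Report whether the model's label flips under each perturbation."""
    original = classify(statement)
    return {
        "negation_flips": classify(negate(statement)) != original,
        "name_swap_flips": classify(swap_names(statement)) != original,
    }
```

In this setup, negating a true statement should flip a correct model's label, and swapping in a different politician's name should flip it whenever the claim holds only for the original person; the benchmark measures how often models actually do so.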

This repository contains only the notebooks used to train and evaluate the models. Check out the data and models!

Feel free to check out our paper (https://openreview.net/forum?id=n3PMOhS42s6), which we presented at the AAAI-22 Workshop on Adversarial Machine Learning and Beyond!

If you find our work useful, please consider citing our paper!

@inproceedings{flores2022an,
  title={An Adversarial Benchmark for Fake News Detection Models},
  author={Lorenzo Jaime Yu Flores and Yiding Hao},
  booktitle={The AAAI-22 Workshop on Adversarial Machine Learning and Beyond},
  year={2022},
  url={https://openreview.net/forum?id=n3PMOhS42s6}
}
