
fake-news-explainability

TLDR: We demonstrate that fake news classification models are brittle: they can achieve strong performance on fake news classification benchmarks while failing on simple adversarial examples.

We create adversarial examples by negating the original sentences and by swapping the names of politicians in the statements. In theory, an accurate model would flip its predictions (see the paper for an explanation), but we find that "SOTA" models don't. These models don't necessarily learn facts or how to distinguish real from fake news; rather, they learn to associate certain keywords with certain probabilities, which are biased by how many pieces of real or fake news in the dataset pertain to those keywords.
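To make this concrete, below is a minimal sketch of the idea (not the code from our notebooks): it builds negated and name-swapped variants of a claim and checks whether a classifier's prediction flips. The `classify` callable, the toy negation rule, and the name list are hypothetical placeholders, not our actual pipeline.

```python
# Minimal sketch (not the notebook code): build adversarial variants of a
# claim and check whether a fake-news classifier flips its prediction.
# `classify` is a hypothetical stand-in for any model mapping a statement
# to a "real"/"fake" label; the negation rule and name list are toy examples.

from typing import Callable

POLITICIANS = ["Barack Obama", "Donald Trump"]  # toy swap list

def negate(statement: str) -> str:
    """Toy negation: insert 'not' after the first 'is'/'was', if present."""
    for verb in (" is ", " was "):
        if verb in statement:
            return statement.replace(verb, verb + "not ", 1)
    return statement  # no simple negation found; return unchanged

def swap_names(statement: str) -> str:
    """Toy name swap: replace one politician's name with the other's."""
    a, b = POLITICIANS
    if a in statement:
        return statement.replace(a, b)
    if b in statement:
        return statement.replace(b, a)
    return statement

def flips(classify: Callable[[str], str], statement: str) -> dict:
    """Report whether the model's label flips under each perturbation."""
    original = classify(statement)
    return {
        "negation_flips": classify(negate(statement)) != original,
        "name_swap_flips": classify(swap_names(statement)) != original,
    }
```

In this setup, negating a true statement should flip a correct model's label, and swapping in a different politician's name should flip it whenever the claim holds only for the original person; the benchmark measures how often models actually do so.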

This repository contains only the notebooks used to train and evaluate the models. Check out the data and models!

Feel free to check out our paper (https://openreview.net/forum?id=n3PMOhS42s6), which we presented at the AAAI-22 Workshop on Adversarial Machine Learning and Beyond!

If you find our work useful, please consider citing our paper!

@inproceedings{flores2022an,
  title={An Adversarial Benchmark for Fake News Detection Models},
  author={Lorenzo Jaime Yu Flores and Yiding Hao},
  booktitle={The AAAI-22 Workshop on Adversarial Machine Learning and Beyond},
  year={2022},
  url={https://openreview.net/forum?id=n3PMOhS42s6}
}
