A collection of GNN-based fake news detection models.

Overview

GNN-based Fake News Detection

Open in Code Ocean PWC PWC

Installation | Datasets | User Guide | Benchmark | How to Contribute

This repo includes the Pytorch-Geometric implementation of a series of Graph Neural Network (GNN) based fake news detection models. All GNN models are implemented and evaluated under the User Preference-aware Fake News Detection (UPFD) framework. The fake news detection problem is instantiated as a graph classification task under the UPFD framework.

You can make reproducible run on CodeOcean without manual configuration.

We welcome contributions of results of existing models and the SOTA results of new models based on our dataset. You can check the benchmark hosted by PaperWithCode for SOTA models and their performances.

If you use the code in your project, please cite the following paper:

SIGIR'21 (PDF)

@inproceedings{dou2021user,
  title={User Preference-aware Fake News Detection},
  author={Dou, Yingtong and Shu, Kai and Xia, Congying and Yu, Philip S. and Sun, Lichao},
  booktitle={Proceedings of the 44nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2021}
}

Installation

To run the code in this repo, you need to have Python>=3.6, PyTorch>=1.6, and PyTorch-Geometric>=1.6.1. Please follow the installation instructions of PyTorch-Geometric to install PyG.

Other dependencies can be installed using the following commands:

git clone https://github.com/safe-graph/GNN-FakeNews.git
cd GNN-FakeNews
pip install -r requirements.txt

Datasets

The dataset can be loaded using the PyG API. You can download the dataset (2.66GB) via the link below and unzip the data under the \data directory.

https://mega.nz/file/j5ZFEK7Z#KDnX2sjg65cqXsIRi0cVh6cvp7CDJZh1Zlm9-Xt28d4

The dataset includes fake&real news propagation networks on Twitter built according to fact-check information from Politifact and Gossipcop. The news retweet graphs were originally extracted by FakeNewsNet. We crawled near 20 million historical tweets from users who participated in fake news propagation in FakeNewsNet to generate node features in the dataset.

The statistics of the dataset is shown below:

Data #Graphs #Fake News #Total Nodes #Total Edges #Avg. Nodes per Graph
Politifact 314 157 41,054 40,740 131
Gossipcop 5464 2732 314,262 308,798 58

Due to the Twitter policy, we could not release the crawled user historical tweets publicly. To get the corresponding Twitter user information, you can refer to news lists under \data and map the news id to FakeNewsNet. Then, you can crawl the user information by following the instruction on FakeNewsNet. In the UPFD project, we use Tweepy and Twitter Developer API to get the user information.

We incorporate four node feature types in the dataset, the 768-dimensional bert and 300-dimensional spacy features are encoded using pretrained BERT and spaCy word2vec, respectively. The 10-dimensional profile feature is obtained from a Twitter account's profile. You can refer to profile_feature.py for profile feature extraction. The 310-dimensional content feature is composed of a 300-dimensional user comment word2vec (spaCy) embedding plus a 10-dimensional profile feature.

Each graph is a hierarchical tree-structured graph where the root node represents the news, the leaf nodes are Twitter users who retweeted the root news. A user node has an edge to the news node if he/she retweeted the news tweet. Two user nodes have an edge if one user retweeted the news tweet from the other user. The following figure shows the UPFD framework including the dataset construction details You can refer to the paper for more details about the dataset.



User Guide

All GNN-based fake news detection models are under the \gnn_model directory. You can fine-tune each model according to arguments specified in the argparser of each model. The implemented models are as follows:

  • GNN-CL: Han, Yi, Shanika Karunasekera, and Christopher Leckie. "Graph neural networks with continual learning for fake news detection from social media." arXiv preprint arXiv:2007.03316 (2020).
  • GCNFN: Monti, Federico, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M. Bronstein. "Fake news detection on social media using geometric deep learning." arXiv preprint arXiv:1902.06673 (2019).
  • BiGCN: Bian, Tian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, and Junzhou Huang. "Rumor detection on social media with bi-directional graph convolutional networks." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, pp. 549-556. 2020.
  • UPFD-GCN: Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).
  • UPFD-GAT: Veličković, Petar, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
  • UPFD-SAGE: Hamilton, William L., Rex Ying, and Jure Leskovec. "Inductive representation learning on large graphs." arXiv preprint arXiv:1706.02216 (2017).

Since the UPFD framework is built upon the PyG, you can easily try other graph classification models like GIN and HGP-SL under our dataset.

How to Contribute

You are welcomed to submit your model code, hyper-parameters, and results to this repo via create a pull request. After verifying the results, your model will be added to the repo and the result will be updated to the benchmark. Please email to [email protected] for other inquiries.

Owner
SafeGraph
Towards Secure Machine Learning on Graph Data
SafeGraph
Binaural Speech Synthesis

Binaural Speech Synthesis This repository contains code to train a mono-to-binaural neural sound renderer. If you use this code or the provided datase

Facebook Research 135 Dec 18, 2022
VampiresVsWerewolves - Our Implementation of a MiniMax algorithm with alpha beta pruning in the context of an in-class competition

VampiresVsWerewolves Our Implementation of a MiniMax algorithm with alpha beta pruning in the context of an in-class competition. Our Algorithm finish

Shawn 1 Jan 21, 2022
End-to-end MLOps pipeline of a BERT model for emotion classification.

image source EmoBERT-MLOps The goal of this repository is to build an end-to-end MLOps pipeline based on the MLOps course from Made with ML, but this

Dimitre Oliveira 4 Nov 06, 2022
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch

N-Grammer - Pytorch Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch Install $ pip install n-grammer-pytorch Usage

Phil Wang 66 Dec 29, 2022
Open Source Neural Machine Translation in PyTorch

OpenNMT-py: Open-Source Neural Machine Translation OpenNMT-py is the PyTorch version of the OpenNMT project, an open-source (MIT) neural machine trans

OpenNMT 5.8k Jan 04, 2023
End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

Image captioning End-to-end image captioning with EfficientNet-b3 + LSTM with Attention Model is seq2seq model. In the encoder pretrained EfficientNet

2 Feb 10, 2022
WikiPron - a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary

WikiPron WikiPron is a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary, as well as a database of pronuncia

213 Jan 01, 2023
Black for Python docstrings and reStructuredText (rst).

Style-Doc Style-Doc is Black for Python docstrings and reStructuredText (rst). It can be used to format docstrings (Google docstring format) in Python

Telekom Open Source Software 13 Oct 24, 2022
In this workshop we will be exploring NLP state of the art transformers, with SOTA models like T5 and BERT, then build a model using HugginFace transformers framework.

Transformers are all you need In this workshop we will be exploring NLP state of the art transformers, with SOTA models like T5 and BERT, then build a

Aymen Berriche 8 Apr 13, 2022
Sequence modeling benchmarks and temporal convolutional networks

Sequence Modeling Benchmarks and Temporal Convolutional Networks (TCN) This repository contains the experiments done in the work An Empirical Evaluati

CMU Locus Lab 3.5k Jan 03, 2023
Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish Language Models 💃🏻 Corpora 📃 Corpora Number of documents Size (GB) BNE 201,080,084 570GB Models 🤖 RoBERTa-base BNE: https://huggingface.co

PlanTL-SANIDAD 203 Dec 20, 2022
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

fastNLP 342 Jan 05, 2023
Fastseq 基于ONNXRUNTIME的文本生成加速框架

Fastseq 基于ONNXRUNTIME的文本生成加速框架

Jun Gao 9 Nov 09, 2021
OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook · Pre-trained Models · Report Bug Overview OpenAI recently released the pa

Fredrik Carlsson 481 Dec 30, 2022
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022
Built for cleaning purposes in military institutions

Ferramenta do AL Construído para fins de limpeza em instituições militares. Instalação Requer python = 3.2 pip install -r requirements.txt Usagem Exe

0 Aug 13, 2022
A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to ach

Keon Lee 237 Jan 02, 2023
hashily is a Python module that provides a variety of text decoding and encoding operations.

hashily is a python module that performs a variety of text decoding and encoding functions. It also various functions for encrypting and decrypting text using various ciphers.

DevMysT 5 Jul 17, 2022
ElasticBERT: A pre-trained model with multi-exit transformer architecture.

This repository contains finetuning code and checkpoints for ElasticBERT. Towards Efficient NLP: A Standard Evaluation and A Strong Baseli

fastNLP 48 Dec 14, 2022
CMeEE 数据集医学实体抽取

医学实体抽取_GlobalPointer_torch 介绍 思想来自于苏神 GlobalPointer,原始版本是基于keras实现的,模型结构实现参考现有 pytorch 复现代码【感谢!】,基于torch百分百复现苏神原始效果。 数据集 中文医学命名实体数据集 点这里申请,很简单,共包含九类医学

85 Dec 28, 2022