Model that predicts the probability of a Twitter user being anti-vaccination.

Overview

AVAXTAR: Anti-VAXx Tweet AnalyzeR

AVAXTAR is a Python package for identifying anti-vaccine users on Twitter. The model outputs complementary probabilities for [not anti-vaccine, anti-vaccine]. AVAXTAR was trained on 100 GB of auto-labeled Twitter data.

The model supports both Twitter API v1 and v2. To predict with v1, the user needs their consumer key, consumer secret, access token, and access secret. The v2 API requires only a bearer token, but it can only predict based on user ID, not on screen name. Predicting from the v2 API using a screen name is only possible if v1 keys are also passed to the model.

The methodology behind the package is described in full at {placeholder}

Citation

To cite this paper, please use: {placeholder}

Installation

Attention: this package relies on a pre-trained embedding model from sent2vec, which is about 5 GB in size. The model is automatically downloaded the first time the package is instantiated in a Python script, and is then saved in the package directory for future use.

  1. Clone this repo:
git clone https://github.com/Matheus-Schmitz/avaxtar.git
  2. Go to the repo's root:
cd avaxtar/
  3. Install with pip:
pip install .
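
Alternatively, assuming the repository keeps a standard setup script (an assumption, not verified here), pip can install the package directly from GitHub in one step:

pip install git+https://github.com/Matheus-Schmitz/avaxtar.git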

Usage Example

For prediction, use:

model.predict_from_userid_api_v1(userid)

and:

model.predict_from_userid_api_v2(userid)

For example:

from avaxtar import Avaxtar

consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
bearer_token = ''


if __name__ == "__main__":

	# Get the userid
	userid = ''

	# Predict
	model = Avaxtar.AvaxModel(consumer_key, consumer_secret, access_token, access_secret, bearer_token)
	pred_proba = model.predict_from_userid_api_v1(userid)

	# Results
	print(f'User: {userid}')
	print(f'Class Probabilities: {pred_proba}')
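
If only a bearer token is available, prediction through the v2 API works the same way. The following is a minimal sketch assuming the constructor shown above accepts empty strings for the unused v1 credentials:

from avaxtar import Avaxtar

bearer_token = ''


if __name__ == "__main__":

	# Numeric user ID (the v2 API cannot resolve screen names without v1 keys)
	userid = ''

	# Only the bearer token is used; the v1 credentials are left empty (assumed acceptable)
	model = Avaxtar.AvaxModel('', '', '', '', bearer_token)
	pred_proba = model.predict_from_userid_api_v2(userid)

	# Results
	print(f'User: {userid}')
	print(f'Class Probabilities: {pred_proba}')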

Package Details

The AVAXTAR classifier is trained on a comprehensive labeled dataset that contains the historical tweets of approximately 130K Twitter accounts. Each account in the dataset was assigned one of two labels: positive for accounts that actively spread anti-vaccination narratives (~70K) and negative for accounts that do not (~60K).

Collecting positive samples: Positive samples were gathered by using a snowball method to identify a set of hashtags and keywords associated with the anti-vaccination movement, then querying the Twitter API and collecting the historical tweets of accounts that used any of the identified keywords.

Collecting negative samples: To collect the negative samples, we first mirrored the approach used for the positive samples and queried the Twitter API for the historical tweets of accounts that did not use any of the predefined keywords and hashtags. We then enlarged the number of negative samples by gathering tweets from accounts that are likely proponents of vaccination. We identified vaccine proponents as follows: first, we identified the twenty most prominent doctors and health experts active on Twitter. Then we collected the COVID-related Lists those health experts curated on Twitter. From those Lists, we collected approximately one thousand Twitter handles of prominent experts and doctors who tweet about the coronavirus and the pandemic. Next, we went through their latest 200 tweets and collected the Twitter handles of users who retweeted them; these users became our pool of pro-vaccine users. Finally, we collected the historical tweets of the users in the pro-vaccine pool.

After model training, we identified the optimal classification threshold by maximizing the F1 score on the validation set. We find that a threshold of 0.5938 yields the best F1 score, and thus recommend using that threshold instead of the default of 0.5. Using the optimized threshold, the resulting model was then evaluated on a test set of users, achieving reasonable scores, as shown in the table below.

Metric     Negative Class  Positive Class
Accuracy   0.8680          0.8680
ROC-AUC    0.9270          0.9270
PRC-AUC    0.8427          0.9677
Precision  0.8675          0.8675
Recall     0.8680          0.8680
F1         0.8677          0.8678
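
To turn the class probabilities into a label with the recommended threshold, a minimal sketch (assuming pred_proba is ordered [not anti-vaccine, anti-vaccine] as described above; the variable names are illustrative):

# Recommended decision threshold found on the validation set
THRESHOLD = 0.5938

# pred_proba is assumed to be [P(not anti-vaccine), P(anti-vaccine)]
prob_anti_vaccine = pred_proba[1]
label = 'anti-vaccine' if prob_anti_vaccine >= THRESHOLD else 'not anti-vaccine'
print(f'Predicted label: {label}')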