Politics and Virality in the Time of Twitter

Data and code accompanying the paper Politics and Virality in the Time of Twitter.

Specifically, the repository contains:

  • the code used to train our models (./code/finetune_models.py and ./code/finetune_multi_cv.py)
  • a Jupyter notebook containing the major parts of our analysis (./code/analysis.ipynb)
  • the model that was selected and used for the sentiment analysis
  • the manually annotated data used for training (./data/annotation/)
  • the IDs of the tweets used in our analysis and control experiments (./data/main/ & ./data/control); a hydration sketch follows this list
  • the names, parties and handles of the MPs that were tracked (./data/mps_list.csv)
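
Only tweet IDs are shared, so the tweets need to be re-hydrated through the Twitter API before they can be analysed. The snippet below is a minimal sketch using tweepy (v4, Twitter API v2); the file name, the 'id' column and the bearer token are illustrative placeholders rather than part of this repository.

import pandas as pd
import tweepy

# Placeholder credentials and file/column names; adjust to the actual files in ./data/main/.
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
ids = pd.read_csv("./data/main/tweet_ids.csv")["id"].astype(str).tolist()

tweets = []
for i in range(0, len(ids), 100):  # the v2 lookup endpoint accepts at most 100 IDs per request
    response = client.get_tweets(ids=ids[i:i + 100], tweet_fields=["created_at", "lang", "public_metrics"])
    tweets.extend(response.data or [])

print(f"Hydrated {len(tweets)} tweets")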

Annotated Data (./data/annotation/)

  • One folder for each language (English, Spanish, Greek).
  • In each directory there are three files (a loading sketch follows this list):
    1. *_900.csv contains the 900 tweets that annotators labelled individually (300 tweets per annotator).
    2. *_tiebreak_100.csv contains the initial 100 tweets that all annotators labelled. 'annotator_3' indicates the annotator used as a tiebreaker.
    3. *_combined.csv contains all tweets labelled for the language.
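
As a rough starting point, the annotation files can be inspected with pandas along the lines of the sketch below; the per-language folder layout and the 'label', 'annotator_1' and 'annotator_2' column names are assumptions and may need adjusting to the actual headers.

import glob
import pandas as pd

# Label distribution per language (the 'label' column name is an assumption).
for path in glob.glob("./data/annotation/*/*_combined.csv"):
    df = pd.read_csv(path)
    print(path, df.shape)
    print(df["label"].value_counts(normalize=True))

# Raw agreement between two annotators on the 100 shared tweets
# ('annotator_1'/'annotator_2' column names are assumptions).
for path in glob.glob("./data/annotation/*/*_tiebreak_100.csv"):
    tie = pd.read_csv(path)
    agreement = (tie["annotator_1"] == tie["annotator_2"]).mean()
    print(f"{path}: annotator_1 vs annotator_2 agreement = {agreement:.2%}")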

Model

While we plan to upload all the models trained for our experiments to huggingface.co, currently only the main model used in our analysis is available, at: https://drive.google.com/file/d/1_Ngmh-uHGWEbKHFpKmQ1DhVf6LtDTglx/view?usp=sharing

The model, 'xlm-roberta-sentiment-multilingual', is based on 'cardiffnlp/twitter-xlm-roberta-base-sentiment' and was further fine-tuned on our annotated dataset.

Example usage

from transformers import AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model from the local directory and reuse the tokenizer of the base model.
model = AutoModelForSequenceClassification.from_pretrained('./xlm-roberta-sentiment-multilingual/')
sentiment_analysis_task = pipeline("sentiment-analysis", model=model, tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment")

sentiment_analysis_task('Today is a good day')
Out: [{'label': 'Positive', 'score': 0.978614866733551}]
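
The pipeline also accepts a list of texts and can return the scores of all three classes at once. The snippet below is an illustrative extension (top_k=None works on recent transformers releases; older versions use return_all_scores=True instead), and the example sentences are not taken from the dataset.

# Illustrative multilingual batch.
texts = [
    "Today is a good day",
    "Qué decepción de debate",    # Spanish: "What a disappointing debate"
    "Η ομιλία ήταν εξαιρετική",   # Greek: "The speech was excellent"
]
print(sentiment_analysis_task(texts, top_k=None))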

Reference paper

For more details, please check the reference paper. If you use the data contained in this repository in your research, please cite the paper using the following BibTeX entry:

@article{antypas2022politics,
  title={{Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom}},
  author={Antypas, Dimosthenis and Preece, Alun and Camacho-Collados, Jose},
  journal={arXiv preprint arXiv:2202.00396},
  year={2022}
}