Data and code accompanying the paper Politics and Virality in the Time of Twitter

Overview

Politics and Virality in the Time of Twitter

Data and code accompanying the paper Politics and Virality in the Time of Twitter.

In specific:

  • the code used for the training of our models (./code/finetune_models.py and ./code/finetune_multi_cv.py)
  • a Jupyter Notebook containing the major parts of our analysis (./code/analysis.ipynb)
  • the model that was selected and used for the sentiment analysis.
  • the manually annotated data used for training are shared (./data/annotation/).
  • the ids of tweets that were used in our analyis and control experiments (./data/main/ & ./data/control)
  • names, parties and handles of the MPs that were tracked (./data/mps_list.csv).

Annotated Data (./data/annotation/)

  • One folder for each language (English, Spanish, Greek).
  • In each directory there are three files:
    1. *_900.csv contains the 900 tweets that annotators labelled individually (300 tweets each annotator).
    2. *_tiebreak_100.csv contains the initial 100 tweets all annotators labelled. 'annotator_3' indicates the annotator that was used as a tiebreaker.
    3. *_combined.csv contains all tweets labelled for the language.

Model

While we plan to upload all the models trained for our experiments to huggingface.co, currently only the main model used in our analysis can be currently be find at: https://drive.google.com/file/d/1_Ngmh-uHGWEbKHFpKmQ1DhVf6LtDTglx/view?usp=sharing

The model, 'xlm-roberta-sentiment-multilingual', is based on the implementation of 'cardiffnlp/twitter-xlm-roberta-base-sentiment' while being further finetuned on the annotated dataset.

Example usage

from transformers import AutoModelForSequenceClassification, pipeline
model = AutoModelForSequenceClassification.from_pretrained('./xlm-roberta-sentiment-multilingual/')
sentiment_analysis_task = pipeline("sentiment-analysis", model=model, tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment")

sentiment_analysis_task('Today is a good day')
Out: [{'label': 'Positive', 'score': 0.978614866733551}]

Reference paper

For more details, please check the reference paper. If you use the data contained in this repository for your research, please cite the paper using the following bib entry:

@inproceedings{antypas2022politics,
  title={{Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom}},
  author={Antypas, Dimosthenis and Preece, Alun and Camacho-Collados, Jose},
  booktitle={arXiv preprint arXiv:2202.00396},
  year={2022}
}
Owner
Cardiff NLP
Research Group in Natural Language Processing at Cardiff University
Cardiff NLP
ICLR 2022 Paper submission trend analysis

Visualize ICLR 2022 OpenReview Data

Jintang Li 75 Dec 06, 2022
Tools for analyzing data collected with a custom unity-based VR for insects.

unityvr Tools for analyzing data collected with a custom unity-based VR for insects. Organization: The unityvr package contains the following submodul

Hannah Haberkern 1 Dec 14, 2022
PyIOmica (pyiomica) is a Python package for omics analyses.

PyIOmica (pyiomica) This repository contains PyIOmica, a Python package that provides bioinformatics utilities for analyzing (dynamic) omics datasets.

G. Mias Lab 13 Jun 29, 2022
We're Team Arson and we're using the power of predictive modeling to combat wildfires.

We're Team Arson and we're using the power of predictive modeling to combat wildfires. Arson Map Inspiration There’s been a lot of wildfires in Califo

Jerry Lee 3 Oct 17, 2021
Generates a simple report about the current Covid-19 cases and deaths in Malaysia

Generates a simple report about the current Covid-19 cases and deaths in Malaysia. Results are delay one day, data provided by the Ministry of Health Malaysia Covid-19 public data.

Yap Khai Chuen 7 Dec 15, 2022
A fast, flexible, and performant feature selection package for python.

linselect A fast, flexible, and performant feature selection package for python. Package in a nutshell It's built on stepwise linear regression When p

88 Dec 06, 2022
Data Analytics on Genomes and Genetics

Data Analytics performed on On genomes and Genetics dataset to predict genetic disorder and disorder subclass. DONE by TEAM SIGMA!

1 Jan 12, 2022
Unsub is a collection analysis tool that assists libraries in analyzing their journal subscriptions.

About Unsub is a collection analysis tool that assists libraries in analyzing their journal subscriptions. The tool provides rich data and a summary g

9 Nov 16, 2022
A data structure that extends pyspark.sql.DataFrame with metadata information.

MetaFrame A data structure that extends pyspark.sql.DataFrame with metadata info

Invent Analytics 8 Feb 15, 2022
Convert tables stored as images to an usable .csv file

Convert an image of numbers to a .csv file This Python program aims to convert images of array numbers to corresponding .csv files. It uses OpenCV for

711 Dec 26, 2022
pandas: powerful Python data analysis toolkit

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.

pandas 36.4k Jan 03, 2023
This python script allows you to manipulate the audience data from Sl.ido surveys

Slido-Automated-VoteBot This python script allows you to manipulate the audience data from Sl.ido surveys Since Slido blocks interference from automat

Pranav Menon 1 Jan 24, 2022
COVID-19 deaths statistics around the world

COVID-19-Deaths-Dataset COVID-19 deaths statistics around the world This is a daily updated dataset of COVID-19 deaths around the world. The dataset c

Nisa Efendioğlu 4 Jul 10, 2022
The repo for mlbtradetrees.com. Analyze any trade in baseball history!

The repo for mlbtradetrees.com. Analyze any trade in baseball history!

7 Nov 20, 2022
Techdegree Data Analysis Project 2

Basketball Team Stats Tool In this project you will be writing a program that reads from the "constants" data (PLAYERS and TEAMS) in constants.py. Thi

2 Oct 23, 2021
Using approximate bayesian posteriors in deep nets for active learning

Bayesian Active Learning (BaaL) BaaL is an active learning library developed at ElementAI. This repository contains techniques and reusable components

ElementAI 687 Dec 25, 2022
MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

Living with Machines 25 Dec 26, 2022
Average time per match by division

HW_02 Unzip matches.rar to access .json files for matches. Get an API key to access their data at: https://developer.riotgames.com/ Average time per m

11 Jan 07, 2022
Python library for creating data pipelines with chain functional programming

PyFunctional Features PyFunctional makes creating data pipelines easy by using chained functional operators. Here are a few examples of what it can do

Pedro Rodriguez 2.1k Jan 05, 2023
PySpark bindings for H3, a hierarchical hexagonal geospatial indexing system

h3-pyspark: Uber's H3 Hexagonal Hierarchical Geospatial Indexing System in PySpark PySpark bindings for the H3 core library. For available functions,

Kevin Schaich 12 Dec 24, 2022