[email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media" | PythonRepo" /> [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media" | PythonRepo">

This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Overview

Subreddit Analysis

This repo includes tools for Subreddit analysis, originally developed for our class project of PSYC 626 in USC, titled "Powered by the Will?: Themes in online discussions of Fitness".

Installation and Requirements

You need to use Python 3.9, R 4.1.0 and git basically to run the scripts provided in this repo. For Ubuntu, to install essential dependencies:

sudo apt update
sudo apt install git python3.9 python3-pip
pip3 install virtualenv

Now clone this repo:

git clone https://github.com/gchochla/subreddit-analysis
cd subreddit-analysis

Create and activate a python environment to download the python requirements for the scripts:

~/.local/bin/virtualenv .venv
source .venv/bin/activate
pip install .

Usage

  1. Download a subreddit into a JSON that preserves the hierarchical structure of the posts by running:
python subreddit_analysis/subreddit_forest.py -r <SUBREDDIT_NAME>

where <SUBREDDIT_NAME> is the name of the subreddit after r/. You can also limit the number of submissions returned by setting -l <LIMIT>. The result can be found in the file <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json.

  1. Transform this JSON to a rectangle (CSV), you can use:
python subreddit_analysis/json_forest_to_csv.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json

which creates <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.csv.

  1. To have a background corpus for control, you can download posts from the redditors that have posted in your desired subreddit from other subreddits:
python subreddit_analysis/user_baseline.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json -pl 200

where -pl specifies the number of posts per redditor to fetch (before filtering the desired subreddit). The file is saved as <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.json

  1. Transform that as well to a CSV:
python subreddit_analysis/json_baseline_to_csv.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.json

which creates <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.csv.

  1. Create a folder, <ROOT>, move the subreddit CSV to it, and create another folder inside it named dictionaries that includes a file (note: the filename -- with a possible extensions -- will be used as the header of the loading) per distributed dictionary with space-separated words:
positive joy happy excited
  1. Tokenize CSVs using the r_scripts.

  2. Compute each post's loadings and write it into the CSV:

python subreddit_analysis/submission_loadings.py -d <ROOT> -doc <CSV_FILENAME>

where <CSV_FILENAME> is relative to <ROOT>.

  1. If annotations are available, which should be in a CSV with (at least) a column for the labels themselves and the ID of the post with a post_id header, you can use these to design a data-driven distributed dictionary. You can first train an RNN to create another annotation file with a predicted label for each post with:
python subreddit_analysis/rnn.py --doc_filename <SUBREDDIT_CSV> --label_filename <ANNOTATION_CSV> --label_column <LABEL_HEADER_1> <LABEL_HEADER_2> ... <LABEL_HEADER_N> --out_filename <NEW_ANNOTATION_CSV>

where you can provide multiple labels for multitasking, thought the model provides predictions only for the first specified label for now. Finally, if annotations are ordinal, you can get learned coefficients from Ridge Regression for each word in the vocabulary of all posts (in descending order of importance) using a tf-idf model to represent each document using:

python subreddit_analysis/bow_model.py --doc_filename <SUBREDDIT_CSV> --label_filename <ANY_ANNOTATION_CSV> --label_column <LABEL_HEADER> --out_filename <IMPORTANCE_CSV>
  1. Run analyses using r_scripts.
Owner
Georgios Chochlakis
ML researcher; CS PhD student @ Uni of Southern California
Georgios Chochlakis
PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

PiCO: Contrastive Label Disambiguation for Partial Label Learning This is a PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambig

王皓波 147 Jan 07, 2023
1st ranked 'driver careless behavior detection' for AI Online Competition 2021, hosted by MSIT Korea.

2021AICompetition-03 본 repo 는 mAy-I Inc. 팀으로 참가한 2021 인공지능 온라인 경진대회 중 [이미지] 운전 사고 예방을 위한 운전자 부주의 행동 검출 모델] 태스크 수행을 위한 레포지토리입니다. mAy-I 는 과학기술정보통신부가 주최하

Junhyuk Park 9 Dec 01, 2022
PCACE: A Statistical Approach to Ranking Neurons for CNN Interpretability

PCACE: A Statistical Approach to Ranking Neurons for CNN Interpretability PCACE is a new algorithm for ranking neurons in a CNN architecture in order

4 Jan 04, 2022
Code release to accompany paper "Geometry-Aware Gradient Algorithms for Neural Architecture Search."

Geometry-Aware Gradient Algorithms for Neural Architecture Search This repository contains the code required to run the experiments for the DARTS sear

18 May 27, 2022
PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

PFENet This is the implementation of our paper PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation that has been accepted to IEE

DV Lab 230 Dec 31, 2022
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.

NeRF-pytorch NeRF (Neural Radiance Fields) is a method that achieves state-of-the-art results for synthesizing novel views of complex scenes. Here are

Yen-Chen Lin 3.2k Jan 08, 2023
Robocop is your personal mini voice assistant made using Python.

Robocop-VoiceAssistant To use this project, you should have python installed in your system. If you don't have python installed, install it beforehand

Sohil Khanduja 3 Feb 26, 2022
Multi-Objective Loss Balancing for Physics-Informed Deep Learning

Multi-Objective Loss Balancing for Physics-Informed Deep Learning Code for ReLoBRaLo. Abstract Physics Informed Neural Networks (PINN) are algorithms

Rafael Bischof 16 Dec 12, 2022
Transfer SemanticKITTI labeles into other dataset/sensor formats.

LiDAR-Transfer Transfer SemanticKITTI labeles into other dataset/sensor formats. Content Convert datasets (NUSCENES, FORD, NCLT) to KITTI format Minim

Photogrammetry & Robotics Bonn 64 Nov 21, 2022
Custom implementation of Corrleation Module

Pytorch Correlation module this is a custom C++/Cuda implementation of Correlation module, used e.g. in FlowNetC This tutorial was used as a basis for

Clément Pinard 361 Dec 12, 2022
(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry Official implementation of the paper Multi-View Depth Est

Bae, Gwangbin 138 Dec 28, 2022
Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization This repository contains the source code for the paper (link wi

Rakuten Group, Inc. 0 Nov 19, 2021
A model that attempts to learn and benefit from data collected on card counting.

A model that attempts to learn and benefit from data collected on card counting. A decision tree like model is built to win more often than loose and increase the bet of the player appropriately to c

1 Dec 17, 2021
Learning to Stylize Novel Views

Learning to Stylize Novel Views [Project] [Paper] Contact: Hsin-Ping Huang ([ema

34 Nov 27, 2022
Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals"

The Temporal Robustness of Stochastic Signals Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals" Case stud

0 Oct 28, 2021
Differentiable molecular simulation of proteins with a coarse-grained potential

Differentiable molecular simulation of proteins with a coarse-grained potential This repository contains the learned potential, simulation scripts and

UCL Bioinformatics Group 44 Dec 10, 2022
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting This is the origin Pytorch implementation of Informer in the followin

Haoyi 3.1k Dec 29, 2022
This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

The-Emergence-of-Objectness This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

44 Oct 08, 2022
PyTorch Implementation of SSTNs for hyperspectral image classifications from the IEEE T-GRS paper "Spectral-Spatial Transformer Network for Hyperspectral Image Classification: A FAS Framework."

PyTorch Implementation of SSTN for Hyperspectral Image Classification Paper links: SSTN published on IEEE T-GRS. Also, you can directly find the imple

Zilong Zhong 54 Dec 19, 2022
Sinkformers: Transformers with Doubly Stochastic Attention

Code for the paper : "Sinkformers: Transformers with Doubly Stochastic Attention" Paper You will find our paper here. Compat This package has been dev

Michael E. Sander 31 Dec 29, 2022