A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Overview

Splitter Arxiv repo sizebenedekrozemberczki

A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019).

Abstract

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

This repository provides a PyTorch implementation of Splitter as described in the paper:

Splitter: Learning Node Representations that Capture Multiple Social Contexts. Alessandro Epasto and Bryan Perozzi. WWW, 2019. [Paper]

The original Tensorflow implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. package versions used for development are just below.

networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
gensim            3.6.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/` directory.

Outputs

The embeddings are saved in the `input/` directory. Each embedding has a header and a column with the node IDs. Finally, the node embedding is sorted by the node ID column.

Options

The training of a Splitter embedding is handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path               STR    Edge list csv.           Default is `input/chameleon_edges.csv`.
  --embedding-output-path   STR    Embedding output csv.    Default is `output/chameleon_embedding.csv`.
  --persona-output-path     STR    Persona mapping JSON.    Default is `output/chameleon_personas.json`.

Model options

  --seed               INT     Random seed.                       Default is 42.
  --number of walks    INT     Number of random walks per node.   Default is 10.
  --window-size        INT     Skip-gram window size.             Default is 5.
  --negative-samples   INT     Number of negative samples.        Default is 5.
  --walk-length        INT     Random walk length.                Default is 40.
  --lambd              FLOAT   Regularization parameter.          Default is 0.1
  --dimensions         INT     Number of embedding dimensions.    Default is 128.
  --workers            INT     Number of cores for pre-training.  Default is 4.   
  --learning-rate      FLOAT   SGD learning rate.                 Default is 0.025

Examples

The following commands learn an embedding and save it with the persona map. Training a model on the default dataset.

python src/main.py

Training a Splitter model with 32 dimensions.

python src/main.py --dimensions 32

Increasing the number of walks and the walk length.

python src/main.py --number-of-walks 20 --walk-length 80

License


Owner
Benedek Rozemberczki
Machine Learning Engineer at AstraZeneca | PhD from The University of Edinburgh.
Benedek Rozemberczki
Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks

CyGNet This repository reproduces the AAAI'21 paper “Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Network

CunchaoZ 89 Jan 03, 2023
An example of time series augmentation methods with Keras

Time Series Augmentation This is a collection of time series data augmentation methods and an example use using Keras. News 2020/04/16: Repository Cre

九州大学 ヒューマンインタフェース研究室 229 Jan 02, 2023
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

CycleGAN PyTorch | project page | paper Torch implementation for learning an image-to-image translation (i.e. pix2pix) without input-output pairs, for

Jun-Yan Zhu 11.5k Dec 30, 2022
Clockwork Variational Autoencoder

Clockwork Variational Autoencoders (CW-VAE) Vaibhav Saxena, Jimmy Ba, Danijar Hafner If you find this code useful, please reference in your paper: @ar

Vaibhav Saxena 35 Nov 06, 2022
Wind Speed Prediction using LSTMs in PyTorch

Implementation of Deep-Forecast using PyTorch Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting Adapted from original implementation Setu

Onur Kaplan 151 Dec 14, 2022
Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination Pratul P. Srinivasan, Ben Mildenhall, Matthew Tancik, Jonathan T. Barron,

Pratul Srinivasan 65 Dec 14, 2022
Google-drive-to-sqlite - Create a SQLite database containing metadata from Google Drive

google-drive-to-sqlite Create a SQLite database containing metadata from Google

Simon Willison 140 Dec 04, 2022
Automatic Video Captioning Evaluation Metric --- EMScore

Automatic Video Captioning Evaluation Metric --- EMScore Overview For an illustration, EMScore can be computed as: Installation modify the encode_text

Yaya Shi 17 Nov 28, 2022
v objective diffusion inference code for JAX.

v-diffusion-jax v objective diffusion inference code for JAX, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman). The models

Katherine Crowson 186 Dec 21, 2022
Torch implementation of various types of GAN (e.g. DCGAN, ALI, Context-encoder, DiscoGAN, CycleGAN, EBGAN, LSGAN)

gans-collection.torch Torch implementation of various types of GANs (e.g. DCGAN, ALI, Context-encoder, DiscoGAN, CycleGAN, EBGAN). Note that EBGAN and

Minchul Shin 53 Jan 22, 2022
The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting".

IGMTF The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting". Requirements The framework

Wentao Xu 24 Dec 05, 2022
FaceAnon - Anonymize people in images and videos using yolov5-crowdhuman

Face Anonymizer Blur faces from image and video files in /input/ folder. Require

22 Nov 03, 2022
PyTorch implementation of MLP-Mixer

PyTorch implementation of MLP-Mixer MLP-Mixer: an all-MLP architecture composed of alternate token-mixing and channel-mixing operations. The token-mix

Duo Li 33 Nov 27, 2022
Basit bir burç modülü.

Bu modulu burclar hakkinda gundelik bir sekilde bilgi alin diye yaptim ve sizler icin kullanima sunuyorum. Modulun kullanimi asiri basit: Ornek Kullan

Special 17 Jun 08, 2022
A TensorFlow implementation of SOFA, the Simulator for OFfline LeArning and evaluation.

SOFA This repository is the implementation of SOFA, the Simulator for OFfline leArning and evaluation. Keeping Dataset Biases out of the Simulation: A

22 Nov 23, 2022
PyTorch implementation of PSPNet segmentation network

pspnet-pytorch PyTorch implementation of PSPNet segmentation network Original paper Pyramid Scene Parsing Network Details This is a slightly different

Roman Trusov 532 Dec 29, 2022
Negative Interactions for Improved Collaborative Filtering:

Negative Interactions for Improved Collaborative Filtering: Don’t go Deeper, go Higher This notebook provides an implementation in Python 3 of the alg

Harald Steck 21 Mar 05, 2022
1st-in-MICCAI2020-CPM - Combined Radiology and Pathology Classification

Combined Radiology and Pathology Classification MICCAI 2020 Combined Radiology a

22 Dec 08, 2022
Does Pretraining for Summarization Reuqire Knowledge Transfer?

Pretraining summarization models using a corpus of nonsense

Approximately Correct Machine Intelligence (ACMI) Lab 12 Dec 19, 2022
Fuzzing the Kernel Using Unicornafl and AFL++

Unicorefuzz Fuzzing the Kernel using UnicornAFL and AFL++. For details, skim through the WOOT paper or watch this talk at CCCamp19. Is it any good? ye

Security in Telecommunications 283 Dec 26, 2022