A Broader Picture of Random-walk Based Graph Embedding

Overview

Random-walk Embedding Framework

This repository is a reference implementation of the random-walk embedding framework as described in the paper:

A Broader Picture of Random-walk Based Graph Embedding.
Zexi Huang, Arlei Silva, Ambuj Singh.
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2021.

The framework decomposes random-walk based graph embedding into three major components: random-walk process, similarity function, and embedding algorithm. By tuning the components, it not only covers many existing approaches such as DeepWalk but naturally motivates novel ones that have shown superior performance on certain downstream tasks.

Usage

Example

To use the framework with default settings to embed the BlogCatalog network:
python src/embedding.py --graph graph/blogcatalog.edges --embeddings emb/blogcatalog.embeddings
where graph/blogcatalog.edges stores the input graph and emb/blogcatalog.embeddings is the target file for output embeddings.

Options

You can check out all the available options (framework components, Markov time parameters, graph types, etc.) with:
python src/embedding.py --help

Input Graph

The supported input graph format is a list of edges:

node1_id_int node2_id_int <weight_float, optional>

where node ids are should be consecutive integers starting from 1. The graph is by default undirected and unweighted, which can be changed by setting appropriate flags.

Output Embeddings

The output embedding file has n lines where n is the number of nodes in the graph. Each line stores the learned embedding of the node with its id equal to the line number:

emb_dim1 emb_dim2 ... emb_dimd

Evaluating

Here, we show by examples how to evaluate and compare different settings of our framework on node classification, link prediction, and community detection tasks. Full evaluation options are can be found with:
python src/evaluating.py --help

Note that the results shown below may not be identical to those in the paper due to different random seeds, but the conclusions are the same.

Node Classification

Once we generate the embedding with the script in previous section, we can call
python src/evaluating.py --task node-classification --embeddings emb/blogcatalog.embeddings --training-ratio 0.5
to compute the Micro-F1 and Macro-F1 scores of the node classification.

The results for comparing Pointwise Mutual Information (PMI) and Autocovariance (AC) similarity metrics with the best Markov times and varying training ratios are as follows:

Training Ratio 10% 20% 30% 40% 50% 60% 70% 80% 90%
PMI Micro-F1 0.3503 0.3814 0.3993 0.4106 0.4179 0.4227 0.4255 0.4222 0.4228
(time=4) Macro-F1 0.2212 0.2451 0.2575 0.2669 0.2713 0.2772 0.2768 0.2689 0.2678
AC Micro-F1 0.3547 0.3697 0.3785 0.3837 0.3872 0.3906 0.3912 0.3927 0.3930
(time=5) Macro-F1 0.2137 0.2299 0.2371 0.2406 0.2405 0.2413 0.2385 0.2356 0.2352

Link Prediction

Prepare

To evaluate the embedding method on link prediction, we first have to remove a ratio of edges in the original graph:
python src/evaluating.py --task link-prediction --mode prepare --graph graph/blogcatalog.edges --remaining-edges graph/blogcatalog.remaining-edges --removed-edges graph/blogcatalog.removed-edges

This takes the original graph graph/blogcatalog.edges as input and output the removed and remaining edges to graph/blogcatalog.removed-edges and graph/blogcatalog.remaining-edges.

Embed

Then, we embed based on the remaining edges of the network with the embedding script. For example:
python src/embedding.py --graph graph/blogcatalog.remaining-edges --embeddings emb/blogcatalog.residual-embeddings

Evaluate

Finally, we evaluate the performance of link prediction in terms of [email protected] based on the embeddings of the residual graph and the removed edges:
python src/evaluating.py --task link-prediction --mode evaluate --embeddings emb/blogcatalog.residual-embeddings --remaining-edges graph/blogcatalog.remaining-edges --removed-edges graph/blogcatalog.removed-edges --k 1.0

The results for comparing PMI and autocovariance similarity metrics with the best Markov times and varying k are as follows:

k 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
PMI (time=1) 0.2958 0.2380 0.2068 0.1847 0.1678 0.1560 0.1464 0.1382 0.1315 0.1260
AC (time=3) 0.4213 0.3420 0.2982 0.2667 0.2434 0.2253 0.2112 0.2000 0.1893 0.1802

Community Detection

Assume the embeddings for the Airport network emb/airport.embeddings have been generated. The following computes the Normalized Mutual Information (NMI) between the ground-truth country communities and the k-means clustering of embeddings:
python src/evaluating.py --task community-detection --embeddings emb/airport.embeddings --communities graph/airport.country-labels

Citing

If you find our framework useful, please consider citing the following paper:

@inproceedings{random-walk-embedding,
author = {Huang, Zexi and Silva, Arlei and Singh, Ambuj},
 title = {A Broader Picture of Random-walk Based Graph Embedding},
 booktitle = {SIGKDD},
 year = {2021}
}
Owner
Zexi Huang
Zexi Huang
SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation, CVPR 2022

SparseInst 🚀 A simple framework for real-time instance segmentation, CVPR 2022 by Tianheng Cheng, Xinggang Wang†, Shaoyu Chen, Wenqiang Zhang, Qian Z

Hust Visual Learning Team 458 Jan 05, 2023
Classification Modeling: Probability of Default

Credit Risk Modeling in Python Introduction: If you've ever applied for a credit card or loan, you know that financial firms process your information

Aktham Momani 2 Nov 07, 2022
Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure.

Event Queue Dialect Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure. Motivation The m

Cornell Capra 23 Dec 08, 2022
A simple approach to emable dense segmentation with ViT.

Vision Transformer Segmentation Network This implementation of ViT in pytorch uses a super simple and straight-forward way of generating an output of

HReynaud 5 Jan 03, 2023
clustimage is a python package for unsupervised clustering of images.

clustimage The aim of clustimage is to detect natural groups or clusters of images. Image recognition is a computer vision task for identifying and ve

Erdogan Taskesen 52 Jan 02, 2023
A Python training and inference implementation of Yolov5 helmet detection in Jetson Xavier nx and Jetson nano

yolov5-helmet-detection-python A Python implementation of Yolov5 to detect head or helmet in the wild in Jetson Xavier nx and Jetson nano. In Jetson X

12 Dec 05, 2022
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when train only in Vox2)

Introduction This repository contains my unofficial reimplementation of the standard ECAPA-TDNN, which is the speaker recognition in VoxCeleb2 dataset

Tao Ruijie 277 Dec 31, 2022
PyTorch implementation of popular datasets and models in remote sensing

PyTorch Remote Sensing (torchrs) (WIP) PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Re

isaac 222 Dec 28, 2022
AI Flow is an open source framework that bridges big data and artificial intelligence.

Flink AI Flow Introduction Flink AI Flow is an open source framework that bridges big data and artificial intelligence. It manages the entire machine

144 Dec 30, 2022
[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

VisTR: End-to-End Video Instance Segmentation with Transformers This is the official implementation of the VisTR paper: Installation We provide instru

Yuqing Wang 687 Jan 07, 2023
A complete, self-contained example for training ImageNet at state-of-the-art speed with FFCV

ffcv ImageNet Training A minimal, single-file PyTorch ImageNet training script designed for hackability. Run train_imagenet.py to get... ...high accur

FFCV 92 Dec 31, 2022
Official repo for BMVC2021 paper ASFormer: Transformer for Action Segmentation

ASFormer: Transformer for Action Segmentation This repo provides training & inference code for BMVC 2021 paper: ASFormer: Transformer for Action Segme

42 Dec 23, 2022
RAMA: Rapid algorithm for multicut problem

RAMA: Rapid algorithm for multicut problem Solves multicut (correlation clustering) problems orders of magnitude faster than CPU based solvers without

Paul Swoboda 60 Dec 13, 2022
A package, and script, to perform imaging transcriptomics on a neuroimaging scan.

Imaging Transcriptomics Imaging transcriptomics is a methodology that allows to identify patterns of correlation between gene expression and some prop

Alessio Giacomel 10 Dec 27, 2022
Dynamic View Synthesis from Dynamic Monocular Video

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer This repository contains code to compute depth from a

Intelligent Systems Lab Org 2.3k Jan 01, 2023
TensorFlow-LiveLessons - "Deep Learning with TensorFlow" LiveLessons

TensorFlow-LiveLessons Note that the second edition of this video series is now available here. The second edition contains all of the content from th

Deep Learning Study Group 830 Jan 03, 2023
Combine Tacotron2 and Hifi GAN to generate speech from text

EndToEndTextToSpeech Combine Tacotron2 and Hifi GAN to generate speech from text Download weights Hifi GAN - hifi_gan/checkpoint/ : pretrain 2.5M ste

Phạm Quốc Huy 1 Dec 18, 2021
Doge-Prediction - Coding Club prediction ig

Doge-Prediction Coding Club prediction ig Basically: Create an application that

1 Jan 10, 2022
Mscp jamf - Build compliance in jamf

mscp_jamf Build compliance in Jamf. This will build the following xml pieces to

Bob Gendler 3 Jul 25, 2022
This is a simple plugin for Vim that allows you to use OpenAI Codex.

🤖 Vim Codex An AI plugin that does the work for you. This is a simple plugin for Vim that will allow you to use OpenAI Codex. To use this plugin you

Tom Dörr 195 Dec 28, 2022