Distance Encoding for GNN Design

Overview

Distance-encoding for GNN design

This repository is the official PyTorch implementation of the DEGNN and DEAGNN framework reported in the paper:
Distance-Encoding -- Design Provably More PowerfulGNNs for Structural Representation Learning, to appear in NeurIPS 2020.

The project's home page is: http://snap.stanford.edu/distance-encoding/

Authors & Contact

Pan Li, Yanbang Wang, Hongwei Wang, Jure Leskovec

Questions on this repo can be emailed to [email protected] (Yanbang Wang)

Installation

Requirements: Python >= 3.5, Anaconda3

  • Update conda:
conda update -n base -c defaults conda
  • Install basic dependencies to virtual environment and activate it:
conda env create -f environment.yml
conda activate degnn-env
  • Install PyTorch >= 1.4.0 and torch-geometric >= 1.5.0 (please refer to the PyTorch and PyTorch Geometric official websites for more details). Commands examples are:
conda install pytorch=1.4.0 torchvision cudatoolkit=10.1 -c pytorch
pip install torch-scatter==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-sparse==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-cluster==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-spline-conv==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-geometric

The latest tested combination is: Python 3.8.2 + Pytorch 1.4.0 + torch-geometric 1.5.0.

Quick Start

python main.py --dataset celegans --feature sp --hidden_features 100 --prop_depth 1 --test_ratio 0.1 --epoch 300

    This uses 100-dimensional hidden features, 80/10/10 split of train/val/test set, and trains for 300 epochs.

  • To train DEAGNN-SPD for Task 3 (node-triads prediction) on C.elegans dataset:
python main.py --dataset celegans_tri --hidden_features 100 --prop_depth 2 --epoch 300 --feature sp --max_sp 5 --l2 1e-3 --test_ratio 0.1 --seed 9

    This enables 2-hop propagation per layer, truncates distance encoding at 5, and uses random seed 9.

  • To train DEGNN-LP (i.e. the random walk variant) for Task 1 (node-level prediction) on usa-airports using average accuracy as evaluation metric:
python main.py --dataset usa-airports --metric acc --hidden_features 100 --feature rw --rw_depth 2 --epoch 500 --bs 128 --test_ratio 0.1

Note that here the test_ratio currently contains both validation set and the actual test set, and will be changed to contain only test set.

  • To generate Figure2 LEFT of the paper (Simulation to validate Theorem 3.3):
python main.py --dataset simulation --max_sp 10

    The result will be plot to ./simulation_results.png.

  • All detailed training logs can be found at <log_dir>/<dataset>/<training-time>.log. A one-line summary will also be appended to <log_dir>/result_summary.log for each training instance.

Usage Summary

Interface for DE-GNN framework [-h] [--dataset DATASET] [--test_ratio TEST_RATIO]
                                      [--model {DE-GNN,GIN,GCN,GraphSAGE,GAT}] [--layers LAYERS]
                                      [--hidden_features HIDDEN_FEATURES] [--metric {acc,auc}] [--seed SEED] [--gpu GPU]
                                      [--data_usage DATA_USAGE] [--directed DIRECTED] [--parallel] [--prop_depth PROP_DEPTH]
                                      [--use_degree USE_DEGREE] [--use_attributes USE_ATTRIBUTES] [--feature FEATURE]
                                      [--rw_depth RW_DEPTH] [--max_sp MAX_SP] [--epoch EPOCH] [--bs BS] [--lr LR]
                                      [--optimizer OPTIMIZER] [--l2 L2] [--dropout DROPOUT] [--k K] [--n [N [N ...]]]
                                      [--N N] [--T T] [--log_dir LOG_DIR] [--summary_file SUMMARY_FILE] [--debug]

Optinal Arguments

  -h, --help            show this help message and exit
  
  # general settings
  --dataset DATASET     dataset name
  --test_ratio TEST_RATIO
                        ratio of the test against whole
  --model {DE-GCN,GIN,GAT,GCN,GraphSAGE}
                        model to use
  --layers LAYERS       largest number of layers
  --hidden_features HIDDEN_FEATURES
                        hidden dimension
  --metric {acc,auc}    metric for evaluating performance
  --seed SEED           seed to initialize all the random modules
  --gpu GPU             gpu id
  --adj_norm {asym,sym,None}
                        how to normalize adj
  --data_usage DATA_USAGE
                        use partial dataset
  --directed DIRECTED   (Currently unavailable) whether to treat the graph as directed
  --parallel            (Currently unavailable) whether to use multi cpu cores to prepare data
  
  # positional encoding settings
  --prop_depth PROP_DEPTH
                        propagation depth (number of hops) for one layer
  --use_degree USE_DEGREE
                        whether to use node degree as the initial feature
  --use_attributes USE_ATTRIBUTES
                        whether to use node attributes as the initial feature
  --feature FEATURE     distance encoding category: shortest path or random walk (landing probabilities)
  --rw_depth RW_DEPTH   random walk steps
  --max_sp MAX_SP       maximum distance to be encoded for shortest path feature
  
  # training settings
  --epoch EPOCH         number of epochs to train
  --bs BS               minibatch size
  --lr LR               learning rate
  --optimizer OPTIMIZER
                        optimizer to use
  --l2 L2               l2 regularization weight
  --dropout DROPOUT     dropout rate
  
  # imulation settings (valid only when dataset == 'simulation')
  --k K                 node degree (k) or synthetic k-regular graph
  --n [N [N ...]]       a list of number of nodes in each connected k-regular subgraph
  --N N                 total number of nodes in simultation
  --T T                 largest number of layers to be tested
  
  # logging
  --log_dir LOG_DIR     log directory
  --summary_file SUMMARY_FILE
                        brief summary of training result
  --debug               whether to use debug mode

Reference

If you make use of the code/experiment of Distance-encoding in your work, please cite our paper:

@article{li2020distance,
  title={Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning},
  author={Li, Pan and Wang, Yanbang and Wang, Hongwei and Leskovec, Jure},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}
Auditing Black-Box Prediction Models for Data Minimization Compliance

Data-Minimization-Auditor An auditing tool for model-instability based data minimization that is introduced in "Auditing Black-Box Prediction Models f

Bashir Rastegarpanah 2 Mar 24, 2022
PyTorch implementation of some learning rate schedulers for deep learning researcher.

pytorch-lr-scheduler PyTorch implementation of some learning rate schedulers for deep learning researcher. Usage WarmupReduceLROnPlateauScheduler Visu

Soohwan Kim 59 Dec 08, 2022
Machine Learning Platform for Kubernetes

Reproduce, Automate, Scale your data science. Welcome to Polyaxon, a platform for building, training, and monitoring large scale deep learning applica

polyaxon 3.2k Dec 23, 2022
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton (AAAI 2022)

ShuttleNet: Position-aware Rally Progress and Player Styles Fusion for Stroke Forecasting in Badminton (AAAI 2022) Official code of the paper ShuttleN

Wei-Yao Wang 11 Nov 30, 2022
Code for MSc Quantitative Finance Dissertation

MSc Dissertation Code ReadMe Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks Curtis Nybo MSc Quantitative F

2 Dec 01, 2022
3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces (ICCV 2021)

3DIAS_Pytorch This repository contains the official code to reproduce the results from the paper: 3DIAS: 3D Shape Reconstruction with Implicit Algebra

Mohsen Yavartanoo 21 Dec 12, 2022
Deep GPs built on top of TensorFlow/Keras and GPflow

GPflux Documentation | Tutorials | API reference | Slack What does GPflux do? GPflux is a toolbox dedicated to Deep Gaussian processes (DGP), the hier

Secondmind Labs 107 Nov 02, 2022
PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR)

This is a PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR), using subpixel convolution to optimize the inference speed of TecoGAN VSR model. Please refer to the offi

789 Jan 04, 2023
Transfer Learning Shootout for PyTorch's model zoo (torchvision)

pytorch-retraining Transfer Learning shootout for PyTorch's model zoo (torchvision). Load any pretrained model with custom final layer (num_classes) f

Alexander Hirner 169 Jun 29, 2022
Bayesian Deep Learning and Deep Reinforcement Learning for Object Shape Error Response and Correction of Manufacturing Systems

Bayesian Deep Learning for Manufacturing 2.0 (dlmfg) Object Shape Error Response (OSER) Digital Lifecycle Management - In Process Quality Improvement

Sumit Sinha 30 Oct 31, 2022
Machine learning Bot detection technique, based on United States election dataset

Machine learning Bot detection technique, based on United States election dataset (2020). Current github repo provides implementation described in pap

Alexander Shevtsov 4 Nov 20, 2022
Set of methods to ensemble boxes from different object detection models, including implementation of "Weighted boxes fusion (WBF)" method.

Set of methods to ensemble boxes from different object detection models, including implementation of "Weighted boxes fusion (WBF)" method.

1.4k Jan 05, 2023
CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images

CurriculumNet Introduction This repo contains related code and models from the ECCV 2018 CurriculumNet paper. CurriculumNet is a new training strategy

156 Jul 04, 2022
🧮 Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model after All

Accompanying source code to the paper "Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model A

Florian Wilhelm 39 Dec 03, 2022
FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

HKBU High Performance Machine Learning Lab 6 Nov 18, 2022
As-ViT: Auto-scaling Vision Transformers without Training

As-ViT: Auto-scaling Vision Transformers without Training [PDF] Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou In ICLR 2

VITA 68 Sep 05, 2022
Laser device for neutralizing - mosquitoes, weeds and pests

Laser device for neutralizing - mosquitoes, weeds and pests (in progress) Here I will post information for creating a laser device. A warning!! How It

Ildaron 1k Jan 02, 2023
This package contains a PyTorch Implementation of IB-GAN of the submitted paper in AAAI 2021

The PyTorch implementation of IB-GAN model of AAAI 2021 This package contains a PyTorch implementation of IB-GAN presented in the submitted paper (IB-

Insu Jeon 9 Mar 30, 2022
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)

Discriminative Sounding Objects Localization Code for our NeurIPS 2020 paper Discriminative Sounding Objects Localization via Self-supervised Audiovis

51 Dec 11, 2022
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

Conceptual 12M We introduce the Conceptual 12M (CC12M), a dataset with ~12 million image-text pairs meant to be used for vision-and-language pre-train

Google Research Datasets 226 Dec 07, 2022