[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Overview

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Derek Lim*, Felix Hohne*, Xiuyu Li*, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, Ser-Nam Lim

Published at NeurIPS 2021

Here are codes to load our proposed datasets, compute our measure of homophily, and train various graph machine learning models in our experimental setup. We include an implementation of the new graph neural network LINKX that we develop.

Organization

main.py contains the main full batch experimental scripts.

main_scalable.py contains the minibatching experimental scripts.

parse.py contains flags for running models with specific settings and hyperparameters.

dataset.py loads our datasets.

models.py contains implementations for graph machine learning models, though C&S (correct_smooth.py, cs_tune_hparams.py) are in separate files. Running several of the GNN models on larger datasets may require at least 24GB of VRAM. Our LINKX model is implemented in this file.

homophily.py contains functions for computing homophily measures, including the one that we introduce in our_measure.

experiments/ contains the bash files to reproduce full batch experiments.

scalable_experiments/ contains the bash files to reproduce minibatching experiments.

wiki_scraping/ contains the Python scripts to reproduce the "wiki" dataset by querying the Wikipedia API and cleaning up the data.

Datasets

Screenshot 2021-06-03 at 6 04 01 PM

As discussed in the paper, our proposed datasets are "genius", "twitch-gamer", "fb100", "pokec", "wiki", "arxiv-year", and "snap-patents", which can be loaded by load_nc_dataset in dataset.py by passing in their respective string name. Many of these datasets are included in the data/ directory, but wiki, twitch-gamer, snap-patents, and pokec are automatically downloaded from a Google drive link when loaded from dataset.py. The arxiv-year dataset is downloaded using OGB downloaders. load_nc_dataset returns an NCDataset, the documentation for which is also provided in dataset.py. It is functionally equivalent to OGB's Library-Agnostic Loader for Node Property Prediction, except for the fact that it returns torch tensors. See the OGB website for more specific documentation. Just like the OGB function, dataset.get_idx_split() returns fixed dataset split for training, validation, and testing.

When there are multiple graphs (as in the case of fb100), different ones can be loaded by passing in the sub_dataname argument to load_nc_dataset in dataset.py. In particular, fb100 consists of 100 graphs. We only include ["Amherst41", "Cornell5", "Johns Hopkins55", "Penn94", "Reed98"] in this repo, although others may be downloaded from the internet archive. In the paper we test on Penn94.

References

The datasets come from a variety of sources, as listed here:

  • Penn94. Traud et al 2012. Social Structure of Facebook Networks
  • pokec. Leskovec et al. Stanford Network Analysis Project
  • arXiv-year. Hu et al 2020. Open Graph Benchmark
  • snap-patents. Leskovec et al. Stanford Network Analysis Project
  • genius. Lim and Benson 2020. Expertise and Dynamics within Crowdsourced Musical Knowledge Curation: A Case Study of the Genius Platform
  • twitch-gamers. Rozemberczki and Sarkar 2021. Twitch Gamers: a Dataset for Evaluating Proximity Preserving and Structural Role-based Node Embeddings
  • wiki. Collected by the authors of this work in 2021.

Installation instructions

  1. Create and activate a new conda environment using python=3.8 (i.e. conda create --name non-hom python=3.8)
  2. Activate your conda environment
  3. Check CUDA version using nvidia-smi
  4. run bash install.sh cu110, replacing cu110 with your CUDA version (CUDA 11 -> cu110, CUDA 10.2 -> cu102, CUDA 10.1 -> cu101). We tested on Ubuntu 18.04, CUDA 11.0.

Running experiments

  1. Make sure a results folder exists in the root directory.
  2. Our experiments are in the experiments/ and scalable_experiments/ directory. There are bash scripts for running methods on single and multiple datasets. Please note that the experiments must be run from the root directory, e.g. (bash experiments/mixhop_exp.sh snap-patents). For instance, to run the MixHop experiments on arxiv-year, use:
bash experiments/mixhop_exp.sh arxiv-year

To run LINKX on pokec, use:

bash experiments/linkx_exp.sh pokec

To run LINK on Penn94, use:

bash experiments/link_exp.sh fb100 Penn94

To run GCN-cluster on twitch-gamers, use:

bash scalable_experiments/gcn_cluster.sh twitch-gamer

To run LINKX minibatched on wiki, use

bash scalable_experiments/linkx_exp.sh wiki

To run LINKX on Geom-GCN with full hyperparameter grid on chameleon, use

bash experiments/linkx_tuning.sh chameleon
Owner
Cornell University Artificial Intelligence
PyMatting: A Python Library for Alpha Matting

Given an input image and a hand-drawn trimap (top row), alpha matting estimates the alpha channel of a foreground object which can then be composed onto a different background (bottom row).

PyMatting 1.4k Dec 30, 2022
Pywonderland - A tour in the wonderland of math with python.

A Tour in the Wonderland of Math with Python A collection of python scripts for drawing beautiful figures and animating interesting algorithms in math

Zhao Liang 4.1k Jan 03, 2023
Code for "Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans" CVPR 2021 best paper candidate

News 05/17/2021 To make the comparison on ZJU-MoCap easier, we save quantitative and qualitative results of other methods at here, including Neural Vo

ZJU3DV 748 Jan 07, 2023
[ICCV 2021] Deep Hough Voting for Robust Global Registration

Deep Hough Voting for Robust Global Registration, ICCV, 2021 Project Page | Paper | Video Deep Hough Voting for Robust Global Registration Junha Lee1,

Junha Lee 10 Dec 02, 2022
A minimalist tool to display a network graph.

A tool to get a minimalist view of any architecture This tool has only be tested with the models included in this repo. Therefore, I can't guarantee t

Thibault Castells 1 Feb 11, 2022
Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time".

FastBERT Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time". Good News 2021/10/29 - Code: Code of FastPLM is released on

Weijie Liu 584 Jan 02, 2023
Source code related to the article submitted to the International Conference on Computational Science ICCS 2022 in London

POTHER: Patch-Voted Deep Learning-based Chest X-ray Bias Analysis for COVID-19 Detection Source code related to the article submitted to the Internati

Tomasz Szczepański 1 Apr 29, 2022
[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)

Are Transformers More Robust Than CNNs? Pytorch implementation for NeurIPS 2021 Paper: Are Transformers More Robust Than CNNs? Our implementation is b

Yutong Bai 145 Dec 01, 2022
PyTorch implementation of Higher Order Recurrent Space-Time Transformer

Higher Order Recurrent Space-Time Transformer (HORST) This is the official PyTorch implementation of Higher Order Recurrent Space-Time Transformer. Th

13 Oct 18, 2022
Pointer networks Tensorflow2

Pointer networks Tensorflow2 原文:https://arxiv.org/abs/1506.03134 仅供参考与学习,内含代码备注 环境 tensorflow==2.6.0 tqdm matplotlib numpy 《pointer networks》阅读笔记 应用场景

HUANG HAO 7 Oct 27, 2022
Code base for the paper "Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation"

This repository contains code for the paper Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiati

8 Aug 28, 2022
Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

HiFi-GAN+ This project is an unoffical implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All

Brent M. Spell 134 Dec 30, 2022
Codebase for the paper titled "Continual learning with local module selection"

This repository contains the codebase for the paper Continual Learning via Local Module Composition. Setting up the environemnt Create a new conda env

Oleksiy Ostapenko 20 Dec 10, 2022
Diffusion Normalizing Flow (DiffFlow) Neurips2021

Diffusion Normalizing Flow (DiffFlow) Reproduce setup environment The repo heavily depends on jam, a personal toolbox developed by Qsh.zh. The API may

76 Jan 01, 2023
This is the official github repository of the Met dataset

The Met dataset This is the official github repository of the Met dataset. The official webpage of the dataset can be found here. What is it? This cod

Nikolaos-Antonios Ypsilantis 35 Dec 17, 2022
This repository is the code of the paper "Sparse Spatial Transformers for Few-Shot Learning".

🌟 Sparse Spatial Transformers for Few-Shot Learning This code implements the Sparse Spatial Transformers for Few-Shot Learning(SSFormers). Our code i

chx_nju 38 Dec 13, 2022
Implementation of the "PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences" paper.

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences Introduction Point cloud sequences are irregular and unordered in the spatial dimen

Hehe Fan 63 Dec 09, 2022
[ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation

Target Adaptive Context Aggregation for Video Scene Graph Generation This is a PyTorch implementation for Target Adaptive Context Aggregation for Vide

Multimedia Computing Group, Nanjing University 44 Dec 14, 2022
P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks

P-tuning v2 P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks An optimized prompt tuning strategy for sma

THUDM 540 Dec 30, 2022
Python版OpenCVのTracking APIのサンプルです。DaSiamRPNアルゴリズムまで対応しています。

OpenCV-Object-Tracker-Sample Python版OpenCVのTracking APIのサンプルです。   Requirement opencv-contrib-python 4.5.3.56 or later Algorithm 2021/07/16時点でOpenCVには以

KazuhitoTakahashi 36 Jan 01, 2023