[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Overview

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Derek Lim*, Felix Hohne*, Xiuyu Li*, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, Ser-Nam Lim

Published at NeurIPS 2021

Here are codes to load our proposed datasets, compute our measure of homophily, and train various graph machine learning models in our experimental setup. We include an implementation of the new graph neural network LINKX that we develop.

Organization

main.py contains the main full batch experimental scripts.

main_scalable.py contains the minibatching experimental scripts.

parse.py contains flags for running models with specific settings and hyperparameters.

dataset.py loads our datasets.

models.py contains implementations for graph machine learning models, though C&S (correct_smooth.py, cs_tune_hparams.py) are in separate files. Running several of the GNN models on larger datasets may require at least 24GB of VRAM. Our LINKX model is implemented in this file.

homophily.py contains functions for computing homophily measures, including the one that we introduce in our_measure.

experiments/ contains the bash files to reproduce full batch experiments.

scalable_experiments/ contains the bash files to reproduce minibatching experiments.

wiki_scraping/ contains the Python scripts to reproduce the "wiki" dataset by querying the Wikipedia API and cleaning up the data.

Datasets

Screenshot 2021-06-03 at 6 04 01 PM

As discussed in the paper, our proposed datasets are "genius", "twitch-gamer", "fb100", "pokec", "wiki", "arxiv-year", and "snap-patents", which can be loaded by load_nc_dataset in dataset.py by passing in their respective string name. Many of these datasets are included in the data/ directory, but wiki, twitch-gamer, snap-patents, and pokec are automatically downloaded from a Google drive link when loaded from dataset.py. The arxiv-year dataset is downloaded using OGB downloaders. load_nc_dataset returns an NCDataset, the documentation for which is also provided in dataset.py. It is functionally equivalent to OGB's Library-Agnostic Loader for Node Property Prediction, except for the fact that it returns torch tensors. See the OGB website for more specific documentation. Just like the OGB function, dataset.get_idx_split() returns fixed dataset split for training, validation, and testing.

When there are multiple graphs (as in the case of fb100), different ones can be loaded by passing in the sub_dataname argument to load_nc_dataset in dataset.py. In particular, fb100 consists of 100 graphs. We only include ["Amherst41", "Cornell5", "Johns Hopkins55", "Penn94", "Reed98"] in this repo, although others may be downloaded from the internet archive. In the paper we test on Penn94.

References

The datasets come from a variety of sources, as listed here:

  • Penn94. Traud et al 2012. Social Structure of Facebook Networks
  • pokec. Leskovec et al. Stanford Network Analysis Project
  • arXiv-year. Hu et al 2020. Open Graph Benchmark
  • snap-patents. Leskovec et al. Stanford Network Analysis Project
  • genius. Lim and Benson 2020. Expertise and Dynamics within Crowdsourced Musical Knowledge Curation: A Case Study of the Genius Platform
  • twitch-gamers. Rozemberczki and Sarkar 2021. Twitch Gamers: a Dataset for Evaluating Proximity Preserving and Structural Role-based Node Embeddings
  • wiki. Collected by the authors of this work in 2021.

Installation instructions

  1. Create and activate a new conda environment using python=3.8 (i.e. conda create --name non-hom python=3.8)
  2. Activate your conda environment
  3. Check CUDA version using nvidia-smi
  4. run bash install.sh cu110, replacing cu110 with your CUDA version (CUDA 11 -> cu110, CUDA 10.2 -> cu102, CUDA 10.1 -> cu101). We tested on Ubuntu 18.04, CUDA 11.0.

Running experiments

  1. Make sure a results folder exists in the root directory.
  2. Our experiments are in the experiments/ and scalable_experiments/ directory. There are bash scripts for running methods on single and multiple datasets. Please note that the experiments must be run from the root directory, e.g. (bash experiments/mixhop_exp.sh snap-patents). For instance, to run the MixHop experiments on arxiv-year, use:
bash experiments/mixhop_exp.sh arxiv-year

To run LINKX on pokec, use:

bash experiments/linkx_exp.sh pokec

To run LINK on Penn94, use:

bash experiments/link_exp.sh fb100 Penn94

To run GCN-cluster on twitch-gamers, use:

bash scalable_experiments/gcn_cluster.sh twitch-gamer

To run LINKX minibatched on wiki, use

bash scalable_experiments/linkx_exp.sh wiki

To run LINKX on Geom-GCN with full hyperparameter grid on chameleon, use

bash experiments/linkx_tuning.sh chameleon
Owner
Cornell University Artificial Intelligence
Deep Convolutional Generative Adversarial Networks

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Alec Radford, Luke Metz, Soumith Chintala All images in t

Alec Radford 3.4k Dec 29, 2022
Solutions and questions for AoC2021. Merry christmas!

Advent of Code 2021 Merry christmas! 🎄 🎅 To get solutions and approximate execution times for implementations, please execute the run.py script in t

Wilhelm Ågren 5 Dec 29, 2022
PyTorch implementation of Super SloMo by Jiang et al.

Super-SloMo PyTorch implementation of "Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation" by Jiang H., Sun

Avinash Paliwal 2.9k Jan 03, 2023
On the model-based stochastic value gradient for continuous reinforcement learning

On the model-based stochastic value gradient for continuous reinforcement learning This repository is by Brandon Amos, Samuel Stanton, Denis Yarats, a

Facebook Research 46 Dec 15, 2022
A ssl analyzer which could analyzer target domain's certificate.

ssl_analyzer A ssl analyzer which could analyzer target domain's certificate. Analyze the domain name ssl certificate information according to the inp

vincent 17 Dec 12, 2022
Retinal vessel segmentation based on GT-UNet

Retinal vessel segmentation based on GT-UNet Introduction This project is a retinal blood vessel segmentation code based on UNet-like Group Transforme

Kent0n 27 Dec 18, 2022
Official implement of "CAT: Cross Attention in Vision Transformer".

CAT: Cross Attention in Vision Transformer This is official implement of "CAT: Cross Attention in Vision Transformer". Abstract Since Transformer has

100 Dec 15, 2022
OSLO: Open Source framework for Large-scale transformer Optimization

O S L O Open Source framework for Large-scale transformer Optimization What's New: December 21, 2021 Released OSLO 1.0. What is OSLO about? OSLO is a

TUNiB 280 Nov 24, 2022
Pytorch implementation of the paper "COAD: Contrastive Pre-training with Adversarial Fine-tuning for Zero-shot Expert Linking."

Expert-Linking Pytorch implementation of the paper "COAD: Contrastive Pre-training with Adversarial Fine-tuning for Zero-shot Expert Linking." This is

BoChen 12 Jan 01, 2023
PyTorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision.

PyTorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision @misc{CV2018, author = {Donny You ( Donny You 40 Sep 14, 2022

The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color

The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color Overview Code and dataset for The World of an Octopus: H

1 Nov 13, 2021
Revisting Open World Object Detection

Revisting Open World Object Detection Installation See INSTALL.md. Dataset Our new data division is based on COCO2017. We divide the training set into

58 Dec 23, 2022
This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

Introduction This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents". If

tsc 0 Jan 11, 2022
LineBoard - Python+React+MySQL-白板即時系統改善人群行為

LineBoard-白板即時系統改善人群行為 即時顯示實驗室的使用狀況,並遠端預約排隊,以此來改善人們的工作效率 程式架構 運作流程 使用者先至該實驗室網站預約

Bo-Jyun Huang 1 Feb 22, 2022
Python Rapid Artificial Intelligence Ab Initio Molecular Dynamics

Python Rapid Artificial Intelligence Ab Initio Molecular Dynamics

14 Nov 06, 2022
ROMP: Monocular, One-stage, Regression of Multiple 3D People, ICCV21

Monocular, One-stage, Regression of Multiple 3D People ROMP, accepted by ICCV 2021, is a concise one-stage network for multi-person 3D mesh recovery f

Yu Sun 937 Jan 04, 2023
PyTorch Implementation of "Light Field Image Super-Resolution with Transformers"

LFT PyTorch implementation of "Light Field Image Super-Resolution with Transformers", arXiv 2021. [pdf]. Contributions: We make the first attempt to a

Squidward 62 Nov 28, 2022
Codebase for the self-supervised goal reaching benchmark introduced in the LEXA paper

LEXA Benchmark Codebase for the self-supervised goal reaching benchmark introduced in the LEXA paper (Discovering and Achieving Goals via World Models

Oleg Rybkin 36 Dec 22, 2022
The codes of paper 'Active-LATHE: An Active Learning Algorithm for Boosting the Error exponent for Learning Homogeneous Ising Trees'

Active-LATHE: An Active Learning Algorithm for Boosting the Error exponent for Learning Homogeneous Ising Trees This project contains the codes of pap

0 Apr 20, 2022
Frigate - NVR With Realtime Object Detection for IP Cameras

A complete and local NVR designed for HomeAssistant with AI object detection. Uses OpenCV and Tensorflow to perform realtime object detection locally for IP cameras.

Blake Blackshear 6.4k Dec 31, 2022