MPLP: Metapath-Based Label Propagation for Heterogenous Graphs

Last update: Jun 28, 2022

Overview

Results on MAG240M

Here, we report the following performance on the MAG240M dataset from OGB-LSC @ KDD Cup 2021.

Model	Test Acc	Validation Acc	Parameters	Hardware
Our Model	0.7447	0.7669 ± 0.0003 (ensemble 0.7696)	743,449	Tesla V100 (21GB)

Reproducing results

0. Requirements

The Python 3 packages we used in this competition are listed below:

numpy==1.19.2
torch==1.5.1+cu101
dgl-cu101==0.6.0.post1
ogb==1.3.1
scikit-learn==0.23.2
tqdm==4.46.1

1. Prepare Graph and Features

The preprocessing code is modified from the DGL baseline. We create a graph with 6 different edge types instead of 5.

# Time cost: 3 hours 30 mins

python3 $MAG_CODE_PATH/preprocess.py \
        --rootdir $MAG_INPUT_PATH \
        --author-output-path $MAG_PREP_PATH/author.npy \
        --inst-output-path $MAG_PREP_PATH/inst.npy \
        --graph-output-path $MAG_PREP_PATH \
        --graph-as-homogeneous \
        --full-output-path $MAG_PREP_PATH/full_feat.npy

The graphs and features will be saved in MAG_PREP_PATH, which is specified in run.sh.

Calculate features

The meta-path based features are generated by this script. Details can be found in our technical report.

# Time cost: 2 hours 20 mins (generating only the label-related features)

python3 $MAG_CODE_PATH/feature.py \
        $MAG_INPUT_PATH \
        $MAG_PREP_PATH/dgl_graph_full_heterogeneous_csr.bin \
        $MAG_FEAT_PATH \
        --seed=42
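
As a rough illustration of what a label-related metapath feature can look like, the sketch below averages the one-hot labels of labeled papers reached through a paper -> author -> paper metapath. The toy graph, the function name `metapath_label_feature`, and the choice of metapath are assumptions made for illustration only; the features actually produced by feature.py are described in the technical report.

```python
from collections import defaultdict


def metapath_label_feature(paper, writes, labels, num_classes):
    """Label distribution of papers that share an author with `paper`.

    writes: list of (author, paper) edges.
    labels: dict mapping labeled (training) papers to a class id.
    Each paper -> author -> paper path contributes one count.
    """
    papers_of = defaultdict(set)   # author -> papers written
    authors_of = defaultdict(set)  # paper -> its authors
    for author, p in writes:
        papers_of[author].add(p)
        authors_of[p].add(author)

    counts = [0.0] * num_classes
    for author in authors_of[paper]:
        for neighbor in papers_of[author]:
            # Skip the target itself so its own label never leaks into its feature.
            if neighbor != paper and neighbor in labels:
                counts[labels[neighbor]] += 1.0

    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy data (hypothetical): 2 authors, 4 papers, 2 classes; p0 is unlabeled.
writes = [("a0", "p0"), ("a0", "p1"), ("a0", "p2"), ("a1", "p0"), ("a1", "p3")]
labels = {"p1": 0, "p2": 1, "p3": 1}
feature = metapath_label_feature("p0", writes, labels, num_classes=2)
```

Excluding the target paper from its own neighborhood matters whenever the target is itself a labeled training sample; otherwise the feature would encode the answer directly.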

Train RGAT model and prepare RGAT features

The RGAT model is modified from the DGL baseline. Its validation accuracy is 0.701, the same as reported in the DGL baseline's GitHub repository.

# Time cost: 33 hours 40 mins (20 mins per epoch)

python3 $MAG_CODE_PATH/rgat.py \
        --rootdir $MAG_INPUT_PATH \
        --graph-path $MAG_PREP_PATH/dgl_graph_full_homogeneous_csc.bin \
        --full-feature-path $MAG_PREP_PATH/full_feat.npy \
        --output-path $MAG_RGAT_PATH/ \
        --epochs=100 \
        --model-path $MAG_RGAT_PATH/model.pt \
        --submission-path $MAG_RGAT_PATH/

This step produces embeddings that serve as input features for the MPLP models below.

2. Train MPLP models

Training is split into two steps:

First, train the model on all training samples at a large learning rate (we use StepLR(lr=0.01, step_size=100, gamma=0.25)).
Then, fine-tune the model on the most recent training samples (e.g., papers with year >= 2018) at a small learning rate (0.000625).
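
To make the two stages concrete, the sketch below computes the step-decay learning rate in plain Python (the same closed-form rule that PyTorch's StepLR applies) and selects recent samples by year. The helper names `step_lr` and `recent_indices` and the toy `years` list are hypothetical; this is not the code in mplp.py.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate at `epoch` under step decay: multiply by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def recent_indices(years, min_year=2018):
    """Indices of samples whose publication year is at least min_year."""
    return [i for i, year in enumerate(years) if year >= min_year]


# Stage 1: schedule quoted above, over 200 epochs (the --epochs value in the command below).
stage1_lrs = [step_lr(0.01, 100, 0.25, epoch) for epoch in range(200)]

# Stage 2: fine-tune only on recent papers (toy years, for illustration).
years = [2015, 2018, 2019, 2017, 2020]
finetune_idx = recent_indices(years)
finetune_lr = 0.000625
```

With these settings the rate is 0.01 for epochs 0 to 99 and 0.0025 for epochs 100 to 199. The fine-tuning rate of 0.000625 equals 0.01 × 0.25², which is the value the same schedule would reach after one more decay step.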

You can train the MPLP model by running the following commands:

# Time cost: 2 hours 40 mins for each seed

for seed in $(seq 0 7);
do
    python3 $MAG_CODE_PATH/mplp.py \
            $MAG_INPUT_PATH \
            $MAG_MPLP_PATH/data/ \
            $MAG_MPLP_PATH/output/seed${seed} \
            --gpu \
            --seed=${seed} \
            --batch_size=10240 \
            --epochs=200 \
            --num_layers=2 \
            --learning_rate=0.01 \
            --dropout=0.5 \
            --num_splits=5
done
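
The loop above runs 8 seeds with --num_splits=5. If each split is one cross-validation fold, that yields 8 × 5 = 40 trained models. How mplp.py forms its splits is not shown here; the sketch below is one common way to build seeded k-fold splits, with the function name `kfold_splits` and the sample count of 100 chosen purely for illustration.

```python
import random


def kfold_splits(num_samples, num_splits, seed):
    """Yield (train_indices, held_out_indices) for each fold of a seeded shuffle."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # same seed -> same partition
    folds = [indices[k::num_splits] for k in range(num_splits)]
    for k in range(num_splits):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


# Enumerate every (seed, fold) pair that the training loop would cover.
runs = [
    (seed, fold)
    for seed in range(8)
    for fold, _ in enumerate(kfold_splits(100, 5, seed))
]
```

Changing the seed changes the partition, so models trained under different seeds see different held-out folds, which is what makes averaging across seeds useful.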

3. Ensemble MPLP results

Once you have the results of k-fold cross-validation training under 8 different seeds, you can average them by running the command below:

python3 $MAG_CODE_PATH/ensemble.py $MAG_MPLP_PATH/output/ $MAG_SUBM_PATH
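
A minimal sketch of this kind of averaging, assuming each run saves a matrix of per-class probabilities for the same nodes in the same order. The function names and the toy numbers are hypothetical, and ensemble.py may differ in its details.

```python
def average_predictions(runs):
    """Element-wise mean of per-run matrices shaped [num_nodes][num_classes]."""
    num_runs = len(runs)
    return [
        [sum(values) / num_runs for values in zip(*rows)]
        for rows in zip(*runs)
    ]


def predict_labels(mean_probs):
    """Index of the highest averaged probability for each node."""
    return [max(range(len(row)), key=row.__getitem__) for row in mean_probs]


# Toy outputs from 3 runs for 2 nodes and 3 classes (made-up numbers).
runs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]],
    [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]],
]
mean_probs = average_predictions(runs)
labels = predict_labels(mean_probs)
```

Averaging probabilities is not the same as majority voting: for the first toy node, two of the three runs prefer class 0, yet the averaged probabilities favor class 1 because the dissenting run is far more confident.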

Owner

Qiuying Peng