Graph neural network message passing reframed as a Transformer with local attention

Last update: Dec 28, 2022

Overview

Adjacent Attention Network

An implementation of a simple transformer that is equivalent to a graph neural network in which message passing is done with multi-head attention at each successive layer. Since the name Graph Attention Network is already taken, I decided to call it Adjacent Attention Network instead. The design is more transformer-centric: instead of using the normalized adjacency matrix trick of Kipf and Welling (scaling by the inverse square root of the degree matrix), this framework simply translates the adjacency matrix into the proper attention mask at each layer.
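
To make the mask idea concrete, here is a minimal single-head sketch of attention restricted by an adjacency matrix. It is a dense O(n^2) illustration for clarity, not the library's implementation; the function name `adjacency_masked_attention` and the choice to add self-loops are assumptions made for this example.

```python
import torch

def adjacency_masked_attention(q, k, v, adj_mat):
    """Single-head attention where node i may attend to node j only if adj_mat[i, j] is True.

    q, k, v: (batch, nodes, dim); adj_mat: (batch, nodes, nodes) boolean.
    Hypothetical helper, written for illustration only.
    """
    n = adj_mat.shape[-1]
    # assumption: every node also attends to itself, so no mask row is empty
    allowed = adj_mat | torch.eye(n, dtype=torch.bool, device=adj_mat.device)
    scores = torch.einsum('bid,bjd->bij', q, k) * q.shape[-1] ** -0.5
    # non-neighbors get -inf, so softmax assigns them zero weight
    scores = scores.masked_fill(~allowed, float('-inf'))
    attn = scores.softmax(dim=-1)
    return torch.einsum('bij,bjd->bid', attn, v), attn

torch.manual_seed(0)
nodes = torch.randn(1, 8, 16)
adj_mat = torch.rand(1, 8, 8) < 0.3
out, attn = adjacency_masked_attention(nodes, nodes, nodes, adj_mat)
print(out.shape)  # torch.Size([1, 8, 16])
```

Each row of `attn` sums to 1 over the node's neighbors (plus itself), which plays the role that degree normalization plays in a graph convolution.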

This repository is for my own exploration into the graph neural network field. My gut tells me the transformer architecture can generalize and outperform graph neural networks.

Install

$ pip install adjacent-attention-network

Usage

Basically a transformer in which each node attends only to its neighbors, as defined by the adjacency matrix. Complexity is O(n * max_neighbors), where max_neighbors is the largest number of neighbors of any node in the adjacency matrix.

The following example has a complexity of roughly 1024 * 100, since each of the 1024 nodes is connected to about 10% of the others.

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4
)

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1) < 0.1
nodes   = torch.randn(1, 1024, 512)
mask    = torch.ones(1, 1024).bool()

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)
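
The cost estimate for the example above can be checked by counting neighbors per node and gathering them into a padded (nodes, max_neighbors) layout. This is a sketch of one way to do that with `topk`; whether the library gathers neighbors this way is an assumption, and a small feature dimension is used here only to keep memory low.

```python
import torch

torch.manual_seed(0)
adj_mat = torch.rand(1, 1024, 1024) < 0.1    # same density as the example above

num_neighbors = adj_mat.sum(dim=-1)           # (1, 1024) neighbor count per node
max_neighbors = int(num_neighbors.max())

# topk over the 0/1 matrix lists real neighbors first, then padding slots
neighbor_mask, neighbor_idx = adj_mat.float().topk(k=max_neighbors, dim=-1)
neighbor_mask = neighbor_mask.bool()          # False marks a padding slot

# gather neighbor features into shape (1, 1024, max_neighbors, 16)
nodes = torch.randn(1, 1024, 16)
batch_idx = torch.arange(nodes.shape[0])[:, None, None]
neighbors = nodes[batch_idx, neighbor_idx]

print(num_neighbors.float().mean().item(), max_neighbors)
print(neighbors.shape)
```

Attention over `neighbors` then costs on the order of n * max_neighbors per layer, with `neighbor_mask` hiding the padding slots.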

If the distribution of neighbor counts contains outliers, the above leads to wasteful computation, since many nodes will be attending to padding. You can use the following stop-gap measure to account for these outliers.

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4,
    num_neighbors_cutoff = 100
).cuda()

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1).cuda() < 0.1
nodes   = torch.randn(1, 1024, 512).cuda()
mask    = torch.ones(1, 1024).bool().cuda()

# for some reason, one of the nodes is fully connected to all others
# (adj_mat is a boolean tensor, so assign True rather than 1.)
adj_mat[:, 0] = True

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)
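
One plausible way such a cutoff could work is to randomly subsample the neighbors of any node that exceeds it. The sketch below illustrates that idea only; the helper `cap_neighbors` is hypothetical, and the source does not state that `num_neighbors_cutoff` is implemented this way.

```python
import torch

def cap_neighbors(adj_mat, cutoff):
    """Keep at most `cutoff` randomly chosen neighbors per node.

    adj_mat: (batch, nodes, nodes) boolean. Returns a boolean matrix of the same shape.
    Hypothetical helper, written for illustration only.
    """
    # strictly positive random score for each real edge, 0 for non-edges
    scores = torch.rand(adj_mat.shape, device=adj_mat.device).clamp(min=1e-6) * adj_mat
    k = min(cutoff, adj_mat.shape[-1])
    top_scores, top_idx = scores.topk(k=k, dim=-1)
    # write 1 only where the selected slot is a real edge (score > 0)
    capped = torch.zeros_like(scores).scatter_(-1, top_idx, (top_scores > 0).float())
    return capped.bool()

torch.manual_seed(0)
adj_mat = torch.rand(1, 1024, 1024) < 0.1
adj_mat[:, 0] = True                        # node 0 is connected to every node
capped = cap_neighbors(adj_mat, cutoff=100)

print(adj_mat.sum(dim=-1).max().item())     # 1024
print(capped.sum(dim=-1).max().item())      # 100
```

Nodes at or below the cutoff keep all of their neighbors, so only the outliers lose edges.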

For non-local attention, I've decided to use a trick from the Set Transformer paper, the Induced Set Attention Block (ISAB). Through the lens of the graph neural network literature, this is analogous to having global nodes that pass messages non-locally.

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4,
    num_global_nodes = 5
).cuda()

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1).cuda() < 0.1
nodes   = torch.randn(1, 1024, 512).cuda()
mask    = torch.ones(1, 1024).bool().cuda()

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)
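
The global-node idea can be sketched as two attention steps: a few learned global nodes first read from every node, then every node reads back from those global nodes. The class below is a hypothetical illustration built on `torch.nn.MultiheadAttention`; its name, layout, and residual connection are assumptions, not the library's code.

```python
import torch
from torch import nn

class GlobalNodeAttention(nn.Module):
    """Hypothetical ISAB-style block: global nodes read from all nodes, then all nodes read back."""

    def __init__(self, dim, heads, num_global_nodes):
        super().__init__()
        self.global_nodes = nn.Parameter(torch.randn(num_global_nodes, dim))
        self.read = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.write = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, nodes, mask=None):
        b = nodes.shape[0]
        g = self.global_nodes.unsqueeze(0).expand(b, -1, -1)
        # key_padding_mask expects True at the positions to ignore
        pad = ~mask if mask is not None else None
        # step 1: the global nodes summarize the whole graph
        g, _ = self.read(g, nodes, nodes, key_padding_mask=pad)
        # step 2: every node attends to the few global summaries
        out, _ = self.write(nodes, g, g)
        return nodes + out

torch.manual_seed(0)
block = GlobalNodeAttention(dim=64, heads=4, num_global_nodes=5)
nodes = torch.randn(2, 128, 64)
mask = torch.ones(2, 128).bool()
out = block(nodes, mask=mask)
print(out.shape)  # torch.Size([2, 128, 64])
```

Both steps cost on the order of n * num_global_nodes, so information can travel between distant nodes without full n * n attention.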

Releases (0.0.12)

0.0.12 (Dec 24, 2022)

0.0.11 (Dec 14, 2020)

0.0.10 (Dec 14, 2020)

0.0.9 (Dec 14, 2020)

0.0.8 (Dec 14, 2020)

0.0.7 (Dec 14, 2020)

0.0.6 (Dec 14, 2020)

0.0.5 (Dec 14, 2020)

0.0.4 (Dec 14, 2020)

0.0.3 (Dec 14, 2020)

0.0.2 (Dec 14, 2020)

0.0.1 (Dec 14, 2020)

Owner

Phil Wang
