Norm-based Analysis of Transformer

Overview

Implementations for two papers that propose analyzing Transformers with vector norms:

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

This paper proposed analyzing attention, a core component of the Transformer, using vector norms rather than attention weights.
Transformer analyses have focused on the mixing performed by attention and have typically inspected attention weights.
However, attention's outputs are determined not only by the attention weights but also by the input vectors themselves and the transformations applied to them.
This paper therefore proposed a norm-based analysis of attention that accounts for all of these factors.
→ Check this paper's code: Code for emnlp2020.
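
As a rough illustration of the idea, the sketch below decomposes one attention head's output into per-source-token vectors and measures each source's contribution by its norm. This is a minimal sketch, not the repository's implementation: the function name and the matrices W_V, b_V, W_O are hypothetical stand-ins for a single head's value and output projections.

import torch

def norm_based_contributions(X, A, W_V, b_V, W_O):
    """Norms of per-source contributions to one head's attention output.

    The head's output for token i is y_i = sum_j A[i, j] * f(x_j), where
    f(x) = (x @ W_V + b_V) @ W_O.  Instead of reading A[i, j] alone, the
    norm-based analysis measures ||A[i, j] * f(x_j)||.
    """
    fx = (X @ W_V + b_V) @ W_O                    # f(x_j) for every source token, (seq, d_model)
    weighted = A.unsqueeze(-1) * fx.unsqueeze(0)  # A[i, j] * f(x_j), (seq, seq, d_model)
    return weighted.norm(dim=-1)                  # ||A[i, j] f(x_j)||, (seq, seq)

# Toy usage with random tensors:
seq, d_model, d_head = 5, 16, 4
X = torch.randn(seq, d_model)                     # input vectors
A = torch.softmax(torch.randn(seq, seq), dim=-1)  # attention weights (rows sum to 1)
norms = norm_based_contributions(X, A, torch.randn(d_model, d_head),
                                 torch.randn(d_head), torch.randn(d_head, d_model))
print(norms.shape)  # torch.Size([5, 5])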

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

This paper proposed analyzing the whole attention block (i.e., attention, residual connection, and layer normalization) using vector norms.
Transformer analyses have focused on the mixing performed by attention.
However, the Transformer contains components other than attention, and they can play roles other than mixing.
This paper therefore expanded the scope of Transformer analysis from attention alone to the entire attention block.
→ Check this paper's code: Code for emnlp2021.
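
The key observation enabling this extension is that layer normalization can be pushed through a sum of per-source contributions: centering is linear, and the normalizing standard deviation is a single scalar shared by all sources of a token. The sketch below illustrates this under those assumptions; ln_decomposed_norms, contribs, and gamma are hypothetical names rather than the repository's API, and the LayerNorm bias is left out because it belongs to no particular source.

import torch

def ln_decomposed_norms(contribs, gamma, eps=1e-5):
    """contribs: (n_sources, d_model) vectors whose sum is the pre-LN input z.

    LayerNorm(z) = gamma * (z - mean(z)) / std(z) + beta.  Because the mean
    is linear, centering distributes over the sum, and std(z) is one shared
    scalar per token, so each source's post-LN contribution is
    gamma * (z_j - mean(z_j)) / std(z).
    """
    z = contribs.sum(dim=0)                                    # pre-LN input of this token
    sigma = z.std(unbiased=False) + eps                        # shared normalizer
    centered = contribs - contribs.mean(dim=-1, keepdim=True)  # centering distributes over sources
    per_source = gamma * centered / sigma                      # post-LN contribution of each source
    return per_source.norm(dim=-1)                             # one norm per source token

# Toy usage: 4 sources (e.g., 3 attention terms + 1 residual term), d_model = 16
contribs = torch.randn(4, 16)
print(ln_decomposed_norms(contribs, gamma=torch.ones(16)))     # 4 contribution norms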

Citation

If you use our code for academic work, please cite:

@inproceedings{kobayashi-etal-2020-attention,
    title = {Attention is Not Only a Weight: Analyzing Transformers with Vector Norms},
    author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
    booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = "2020",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.574",
    pages = "7057--7075",
}
@inproceedings{kobayashi-etal-2021-incorporating,
    title = {Incorporating Residual and Normalization Layers into Analysis of Masked Language Models},
    author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
    booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = "2021",
    url = "https://arxiv.org/abs/2109.07152",
    pages = "to appear",
}