PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

Last update: Jan 27, 2022

Related tags

Deep Learning tupe

Overview

TUPE

PyTorch implementation of Rethinking Positional Encoding in Language Pre-training.

Quickstart

Clone this repository.

git clone https://github.com/jaketae/tupe.git

Navigate to the cloned directory. You can use the bare-bone TUPE Encoder model via

>>> import torch; from tupe import TUPEConfig, TUPEEncoder
>>> config  = TUPEConfig()
>>> model = TUPEEncoder(config)
>>> x = torch.randn(8, 100, 128)
>>> model(x).shape
torch.Size([8, 100, 128])

By default, the model comes with the following parameters:

TUPEConfig(
    num_layers=6, 
    num_heads=8, 
    d_model=128, 
    d_head=16, 
    max_len=256, 
    dropout=0.1, 
    expansion_factor=1, 
    relative_bias=True, 
    bidirectional_bias=True, 
    num_buckets=32, 
    max_distance=128
)

Abstract

In this work, we investigate the positional encoding methods used in language pre- training (e.g., BERT) and identify several problems in the existing formulations. First, we show that in the absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlations between the two heterogeneous information resources. It may bring unnecessary randomness in the attention and further limit the expressiveness of the model. Sec- ond, we question whether treating the position of the symbol [CLS] the same as other words is a reasonable design, considering its special role (the representation of the entire sentence) in the downstream tasks. Motivated from above analysis, we propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE). In the self-attention module, TUPE computes the word contextual correlation and positional correlation separately with different parameterizations and then adds them together. This design removes the mixed and noisy correlations over heterogeneous embeddings and offers more expres- siveness by using different projection matrices. Furthermore, TUPE unties the [CLS] symbol from other positions, making it easier to capture information from all positions. Extensive experiments and ablation studies on GLUE benchmark demonstrate the effectiveness of the proposed method.

Implementation Notes

The default configuration follows TUPE-R, which includes T5's relative position bias. To use TUPE-A, simply toggle TUPEConfig.relative_bias field to False.
To avoid limiting the use case of this architecture to BERT-type models with [CLS] tokens, this implementation purposefully omits Section 3.2, on untying the [CLS] symbol from positions.

Citation

@inproceedings{ke2021rethinking,
	title        = {Rethinking Positional Encoding in Language Pre-training},
	author       = {Guolin Ke and Di He and Tie-Yan Liu},
	year         = 2021,
	booktitle    = {International Conference on Learning Representations},
	url          = {https://openreview.net/forum?id=09-528y2Fgf}
}

PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

Related tags

Overview

TUPE

Quickstart

Abstract

Implementation Notes

Citation

Owner

Jake Tae

Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project

Source code for the NeurIPS 2021 paper "On the Second-order Convergence Properties of Random Search Methods"

The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining Concept-Oriented Shared Information".

《Lerning n Intrinsic Grment Spce for Interctive Authoring of Grment Animtion》

This repository lets you interact with Lean through a REPL.

git《Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction》(ECCV 2020) GitHub:

Implemenets the Contourlet-CNN as described in C-CNN: Contourlet Convolutional Neural Networks, using PyTorch

Free like Freedom

Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training

Code accompanying "Dynamic Neural Relational Inference" from CVPR 2020

[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Code for 2021 NeurIPS --- Towards Multi-Grained Explainability for Graph Neural Networks

[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable

This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

Rendering Point Clouds with Compute Shaders

First-Order Probabilistic Programming Language

(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis (CVPR2022)

Author: Wenhao Yu ([email protected]). ACL 2022. Commonsense Reasoning on Knowledge Graph for Text Generation