Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Last update: Sep 06, 2021

Overview

Period-alternatives-of-Softmax

Experimental Demo for our paper

'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism'

We propose replacing the exponential function in Softmax with periodic functions. In experiments on a simple demo modeled on LeViT, the method is shown to alleviate the vanishing-gradient problem and to yield substantial improvements over Softmax and its variants.

** Create your own 'dataset' folder. For any dataset other than cifar-10, cifar-100 and Tiny-ImageNet, you may also need to modify demo.py.

Functions available:

softmax, norm_softmax
sinmax, norm_sinmax
cosmax, norm_cosmax
sin_2_max, norm_sin_2_max
sin_2_max_move, norm_sin_2_max_move
sirenmax, norm_sirenmax
sin_softmax, norm_sin_softmax
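
As a rough illustration of why a periodic function can help, the sketch below compares plain softmax with two periodic variants on a large pre-softmax score. The formulas used for sin_softmax (softmax applied to sin(x)) and sin_2_max (sin^2(x) normalized by its sum) are assumptions inferred from the function names, not copied from this repository, and the helper numerical_grad is hypothetical.

```python
import math


def softmax(scores):
    """Standard softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sin_softmax(scores):
    """ASSUMED form: softmax over sin(score), so every logit stays in [-1, 1]."""
    return softmax([math.sin(s) for s in scores])


def sin_2_max(scores, eps=1e-9):
    """ASSUMED form: sin^2(score) divided by the sum of sin^2 over all scores."""
    squares = [math.sin(s) ** 2 for s in scores]
    total = sum(squares) + eps  # eps guards against an all-zero denominator
    return [v / total for v in squares]


def numerical_grad(fn, scores, index=0, h=1e-5):
    """Hypothetical helper: central-difference d fn(scores)[index] / d scores[index]."""
    up = list(scores)
    down = list(scores)
    up[index] += h
    down[index] -= h
    return (fn(up)[index] - fn(down)[index]) / (2 * h)


# One dominant score, as happens when Q.K grows large during training.
scores = [20.0, 0.0, 0.0]
grad_softmax = numerical_grad(softmax, scores)
grad_sin_softmax = numerical_grad(sin_softmax, scores)

print("softmax gradient:    ", grad_softmax)
print("sin_softmax gradient:", grad_sin_softmax)
```

With a score this large, softmax is saturated and its gradient with respect to that score is close to zero, while the periodic variant keeps its logits bounded and still passes a usable gradient.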

Modes available:

search:
        Random search for a suitable pair of learning rate and weight decay. The results are recorded in
        Attention_test/*functions/lr_wd_search.txt
run:
        Train the demo. Four .npy files are created in the root directory:
        (1) 'record_val_acc.npy': validation accuracy, recorded every 100 iterations;
        (2) 'record_train_acc.npy': training accuracy, recorded every batch;
        (3) 'record_loss.npy': training loss, recorded every batch;
        (4) 'kq_value.npy': Q.K values, recorded *before scaling*.
att_run:
        Same as the run mode, except:
        (1) No kq_value record is written;
        (2) Every 5 epochs, a test image is fed in and the attention score map of each head of each layer is recorded.
            The maps are saved in 'Attention_test/attention_maps.npy'
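
The search mode described above can be pictured with the minimal sketch below. It is not the repository's implementation: the log-uniform sampling, the value ranges, the trial count and the toy_evaluate objective are all assumptions made for illustration. In the real demo the score would come from a short training run.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose base-10 logarithm is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def random_search(evaluate, n_trials=10, lr_range=(1e-5, 1e-2),
                  wd_range=(1e-6, 1e-2), seed=0):
    """Try n_trials random (lr, wd) pairs and return them sorted best-first.

    The ranges and trial count are ASSUMED defaults, not values from the repo.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        lr = sample_log_uniform(rng, *lr_range)
        wd = sample_log_uniform(rng, *wd_range)
        results.append((evaluate(lr, wd), lr, wd))
    results.sort(reverse=True)  # highest score first
    return results


def toy_evaluate(lr, wd):
    """Hypothetical stand-in for a short training run; peaks at lr=1e-3, wd=1e-4."""
    return -(math.log10(lr) + 3) ** 2 - (math.log10(wd) + 4) ** 2


results = random_search(toy_evaluate, n_trials=20)
best_score, best_lr, best_wd = results[0]
print(f"best lr={best_lr:.2e}, wd={best_wd:.2e}, score={best_score:.3f}")
```

Sampling on a log scale is a common choice here because useful learning rates and weight decays span several orders of magnitude.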

Owner

slwang9353
