[TensorFlow 2] Attention is all you need (Transformer)

TensorFlow implementation of "Attention is all you need (Transformer)"
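The core operation of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V [1]. A minimal TensorFlow 2 sketch of that operation (illustrative only, not necessarily the exact code in this repository):

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, per Vaswani et al. [1]."""
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)  # (..., seq_q, seq_k)
    if mask is not None:
        scores += mask * -1e9  # drive masked positions toward zero weight after softmax
    weights = tf.nn.softmax(scores, axis=-1)  # attention map over key positions
    return tf.matmul(weights, v), weights     # weighted values plus the map itself
```

Returning the attention weights alongside the output is what allows attention maps, such as those in the Results section below, to be visualized.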

Dataset

The MNIST dataset is used to verify that the Transformer works. Each image is converted into a sequential form as follows.

  • Trim the sides off the square image.
    • (H × W) -> (H × W_trim)
      • H (height) = W (width) = 28
      • W_trim = 18
    • The height axis is treated as the sequence dimension and the width axis as the feature dimension of each step.
      • (H × W) = (S × F)
      • S (sequence length) = 28
      • F (features per step) = 18
  • The target Y is defined as the reverse of the input sequence X, so that the target sequence differs from the input sequence (see the sketch below).
    • In the figure, the target therefore appears upside down.
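
A minimal sketch of this preprocessing in TensorFlow/NumPy (the 5-column symmetric trim follows from the numbers above; the repository's exact slicing may differ):

```python
import numpy as np
import tensorflow as tf

# Load MNIST: x_train has shape (60000, 28, 28), dtype uint8.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()

H, W, W_TRIM = 28, 28, 18
side = (W - W_TRIM) // 2  # 5 columns trimmed from each side (assumed symmetric)

x = x_train.astype(np.float32) / 255.0   # normalize pixels to [0, 1]
x = x[:, :, side:side + W_TRIM]          # (N, 28, 18): S=28 steps, F=18 features
y = x[:, ::-1, :]                        # target Y: input sequence X reversed
```

Each 28 × 18 image is then fed to the encoder as a length-28 sequence of 18-dimensional feature vectors, and the model is trained to emit the same sequence in reverse order.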

Results

Training

Generation

Class attention maps and reconstructions for each digit class, 0 through 9 (figures omitted in this text rendering).

Requirements

Reference

[1] Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems. 2017.
