Transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

Overview

[UPDATED] A TensorFlow Implementation of Attention Is All You Need

When I opened this repository in 2017, there was no official code yet. I tried to implement the paper as I understood, but to no surprise it had several bugs. I realized them mostly thanks to people who issued here, so I'm very grateful to all of them. Though there is the official implementation as well as several other unofficial github repos, I decided to update my own one. This update focuses on:

  • readable / understandable code writing
  • modularization (but not too much)
  • revising known bugs. (masking, positional encoding, ...)
  • updating to TF1.12. (tf.data, ...)
  • adding some missing components (bpe, shared weight matrix, ...)
  • including useful comments in the code.

I still stick to IWSLT 2016 de-en. I guess if you'd like to test on a big data such as WMT, you would rely on the official implementation. After all, it's pleasant to check quickly if your model works. The initial code for TF1.2 is moved to the tf1.2_lecacy folder for the record.

Requirements

  • python==3.x (Let's move on to python 3 if you still use python 2)
  • tensorflow==1.12.0
  • numpy>=1.15.4
  • sentencepiece==0.1.8
  • tqdm>=4.28.1

Training

bash download.sh

It should be extracted to iwslt2016/de-en folder automatically.

  • STEP 2. Run the command below to create preprocessed train/eval/test data.
python prepro.py

If you want to change the vocabulary size (default:32000), do this.

python prepro.py --vocab_size 8000

It should create two folders iwslt2016/prepro and iwslt2016/segmented.

  • STEP 3. Run the following command.
python train.py

Check hparams.py to see which parameters are possible. For example,

python train.py --logdir myLog --batch_size 256 --dropout_rate 0.5
  • STEP 3. Or download the pretrained models.
wget https://dl.dropbox.com/s/4lom1czy5xfzr4q/log.zip; unzip log.zip; rm log.zip

Training Loss Curve

Learning rate

Bleu score on devset

Inference (=test)

  • Run
python test.py --ckpt log/1/iwslt2016_E19L2.64-29146 (OR yourCkptFile OR yourCkptFileDirectory)

Results

  • Typically, machine translation is evaluated with Bleu score.
  • All evaluation results are available in eval/1 and test/1.
tst2013 (dev) tst2014 (test)
28.06 23.88

Notes

  • Beam decoding will be added soon.
  • I'm going to update the code when TF2.0 comes out if possible.
Owner
Kyubyong Park
Lives in Seoul, Korea. Studied Linguistics at SNU and Univ. of Hawaii.
Kyubyong Park
Paddlespeech Streaming ASR GUI

Paddlespeech-Streaming-ASR-GUI Introduction A paddlespeech Streaming ASR GUI. Us

Niek Zhen 3 Jan 05, 2022
基于pytorch_rnn的古诗词生成

pytorch_peot_rnn 基于pytorch_rnn的古诗词生成 说明 config.py里面含有训练、测试、预测的参数,更改后运行: python main.py 预测结果 if config.do_predict: result = trainer.generate('丽日照残春')

西西嘛呦 3 May 26, 2022
A benchmark for evaluation and comparison of various NLP tasks in Persian language.

Persian NLP Benchmark The repository aims to track existing natural language processing models and evaluate their performance on well-known datasets.

Mofid AI 68 Dec 19, 2022
Machine Psychology: Python Generated Art

Machine Psychology: Python Generated Art A limited collection of 64 algorithmically generated artwork. Each unique piece is then given a title by the

Pixegami Team 67 Dec 13, 2022
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

SeatGeek 8.8k Jan 01, 2023
BERN2: an advanced neural biomedical namedentity recognition and normalization tool

BERN2 We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by

DMIS Laboratory - Korea University 99 Jan 06, 2023
This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.

Twitter COVID-19 Sentiment Analysis Members: Christopher Bach | Khalid Hamid Fallous | Jay Hirpara | Jing Tang | Graham Thomas | David Wetherhold Pro

4 Oct 15, 2022
OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook · Pre-trained Models · Report Bug Overview OpenAI recently released the pa

Fredrik Carlsson 481 Dec 30, 2022
(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Towards Abstractive Grounded Summarization of Podcast Transcripts We provide the source code for the paper "Towards Abstractive Grounded Summarization

10 Jul 01, 2022
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ Getting started Prerequ

Cambridge Quantum 315 Jan 01, 2023
Global Rhythm Style Transfer Without Text Transcriptions

Global Prosody Style Transfer Without Text Transcriptions This repository provides a PyTorch implementation of AutoPST, which enables unsupervised glo

Kaizhi Qian 193 Dec 30, 2022
A program that uses real statistics to choose the best times to bet on BloxFlip's crash gamemode

Bloxflip Smart Bet A program that uses real statistics to choose the best times to bet on BloxFlip's crash gamemode. https://bloxflip.com/crash. THIS

43 Jan 05, 2023
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

44 Dec 31, 2022
Shellcode antivirus evasion framework

Schrodinger's Cat Schrodinger'sCat is a Shellcode antivirus evasion framework Technical principle Please visit my blog https://idiotc4t.com/ How to us

idiotc4t 27 Jul 09, 2022
End-to-end MLOps pipeline of a BERT model for emotion classification.

image source EmoBERT-MLOps The goal of this repository is to build an end-to-end MLOps pipeline based on the MLOps course from Made with ML, but this

Dimitre Oliveira 4 Nov 06, 2022
Auto translate textbox from Japanese to English or Indonesia

priconne-auto-translate Auto translate textbox from Japanese to English or Indonesia How to use Install python first, Anaconda is recommended Install

Aji Priyo Wibowo 5 Aug 25, 2022
The guide to tackle with the Text Summarization

The guide to tackle with the Text Summarization

Takahiro Kubo 1.2k Dec 30, 2022
Deep learning for NLP crash course at ABBYY.

Deep NLP Course at ABBYY Deep learning for NLP crash course at ABBYY. Suggested textbook: Neural Network Methods in Natural Language Processing by Yoa

Dan Anastasyev 597 Dec 18, 2022
A list of NLP(Natural Language Processing) tutorials built on Tensorflow 2.0.

A list of NLP(Natural Language Processing) tutorials built on Tensorflow 2.0.

Won Joon Yoo 335 Jan 04, 2023
This code is the implementation of Text Emotion Recognition (TER) with linguistic features

APSIPA-TER This code is the implementation of Text Emotion Recognition (TER) with linguistic features. The network model is BERT with a pretrained mod

kenro515 1 Feb 08, 2022