A Transformer Implementation that is easy to understand and customizable.

Last update: Jan 20, 2022

Overview

Simple Transformer

I've written a series of articles on the transformer architecture and language models on Medium.

This repository contains an implementation of the Transformer architecture presented in the paper Attention Is All You Need by Ashish Vaswani et al.

My goal is to write an implementation that is easy to understand and that digs into the nitty-gritty details, where the devil is.
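The central operation in the paper is scaled dot-product attention, softmax(QKᵀ / √d_k)·V. The plain-Python sketch below (lists of lists, no PyTorch) spells that formula out step by step as a reading aid. It is not this repository's implementation, and the function names are hypothetical.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)  # assumes at least one finite score in the row
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    q: Matrix,
    k: Matrix,
    v: Matrix,
    mask: Optional[List[List[bool]]] = None,
) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single head.

    Shapes: q is (n_q, d_k), k is (n_k, d_k), v is (n_k, d_v).
    mask[i][j] == False blocks query i from attending to key j.
    """
    d_k = len(k[0])
    scale = math.sqrt(d_k)
    output: Matrix = []
    for i, q_row in enumerate(q):
        # Dot product of query i with every key, divided by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        if mask is not None:
            # Blocked keys get -inf, so exp() turns their weight into 0.
            scores = [s if mask[i][j] else -math.inf for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output row i is the attention-weighted sum of the value rows.
        output.append(
            [sum(w * v_row[c] for w, v_row in zip(weights, v)) for c in range(len(v[0]))]
        )
    return output
```

A zero query gives equal scores, so the output is simply the mean of the value rows. Dividing by √d_k keeps the dot products from growing with the key dimension and pushing the softmax into saturation.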

Python environment

You can use any Python virtual environment, such as venv or conda.

For example, with venv:

python3 -m venv venv
source venv/bin/activate

pip install --upgrade pip
pip install -e .
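To confirm that the interpreter you run the scripts with belongs to the activated environment, you can use a generic check like the one below. It is not part of the repository. Inside a venv, sys.prefix points at the environment while sys.base_prefix points at the base installation. A conda environment is usually a full installation, so the sketch falls back to the CONDA_PREFIX variable that conda sets on activation.

```python
import os
import sys


def in_venv() -> bool:
    # In a venv, sys.prefix is the environment and sys.base_prefix the base install.
    return sys.prefix != sys.base_prefix


def describe_environment() -> str:
    """Return a short description of which environment this interpreter uses."""
    if in_venv():
        return f"venv: {sys.prefix}"
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        return f"conda: {conda_prefix}"
    return "system: no virtual environment detected"


if __name__ == "__main__":
    print(describe_environment())
```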

spaCy Tokenizer Data Preparation

To use spaCy's tokenizers, download the required language models first.

For example, the English and German tokenizers can be downloaded as follows:

python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm
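The `spacy download` command installs each pipeline as an ordinary Python package (here en_core_web_sm and de_core_news_sm), so a missing download can be detected with importlib before training starts. The helper below is a hypothetical pre-flight check and not part of the repository. Once a model is installed, spacy.load("en_core_web_sm") loads it.

```python
import importlib.util
from typing import Iterable, List

# Package names taken from the download commands above.
REQUIRED_SPACY_MODELS = ("en_core_web_sm", "de_core_news_sm")


def missing_spacy_models(names: Iterable[str] = REQUIRED_SPACY_MODELS) -> List[str]:
    """Return the model packages that cannot be found in this environment."""
    # find_spec returns None for a top-level name that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_spacy_models()
    for name in missing:
        print(f"missing, run: python -m spacy download {name}")
    if not missing:
        print("all required spaCy models are installed")
```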

Text Data from Torchtext

This project uses text datasets from Torchtext.

from torchtext import datasets

The default configuration uses the Multi30k dataset.
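A translation dataset such as Multi30k is consumed as a stream of (source, target) sentence pairs, which must be turned into token indices before a Transformer can use them. The sketch below shows one common approach using collections.Counter, under stated assumptions: the whitespace tokenizer stands in for spaCy, and the special tokens and min_freq threshold are hypothetical choices rather than this repository's. The exact Torchtext loading call varies between versions, so the example uses a small in-memory list instead.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical special tokens; the repository may use different names or order.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<sos>", "<eos>"]

# Stand-in for (German, English) pairs; a real run would iterate the dataset.
PAIRS: List[Tuple[str, str]] = [
    ("ein hund rennt", "a dog runs"),
    ("ein mann rennt", "a man runs"),
]


def whitespace_tokenize(text: str) -> List[str]:
    # Placeholder for a spaCy tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_vocab(
    sentences: Iterable[str],
    tokenize: Callable[[str], List[str]],
    min_freq: int = 2,
) -> Dict[str, int]:
    """Map each token seen at least min_freq times to an integer index."""
    counts = Counter(tok for sentence in sentences for tok in tokenize(sentence))
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    # Most frequent first, ties broken alphabetically, so indices are deterministic.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int], tokenize: Callable[[str], List[str]]) -> List[int]:
    """Wrap a sentence in <sos>/<eos> and map unseen tokens to <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(sentence)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


if __name__ == "__main__":
    de_vocab = build_vocab((src for src, _ in PAIRS), whitespace_tokenize)
    en_vocab = build_vocab((tgt for _, tgt in PAIRS), whitespace_tokenize)
    print(de_vocab)
    print(encode("a cat runs", en_vocab, whitespace_tokenize))  # [2, 4, 1, 5, 3]
```

Dropping rare tokens keeps the embedding tables small. The cost is that those tokens are all mapped to `<unk>`, as "cat" is above.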

Training

python train.py config_path

The default config path is config/config.yaml.

It is possible to resume training from a checkpoint.

python train.py --checkpoint_path runs/20220108-164720-Multi30k-Transformer/checkpoint-010-2.3343.pt
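The two invocations above imply a command line with an optional positional config path, defaulting to config/config.yaml, and a --checkpoint_path option. Below is a minimal argparse sketch of such an interface. It is an assumption about how train.py could parse its arguments, not a copy of its code.

```python
import argparse
from typing import List, Optional

DEFAULT_CONFIG_PATH = "config/config.yaml"


def parse_train_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse train.py-style arguments (sketch); argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="Train the Transformer (sketch).")
    # Positional but optional: a bare `python train.py` uses the default config.
    parser.add_argument("config_path", nargs="?", default=DEFAULT_CONFIG_PATH)
    # Resume from a saved checkpoint instead of starting a new run.
    parser.add_argument("--checkpoint_path", default=None)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_train_args()
    if args.checkpoint_path:
        print(f"resuming from {args.checkpoint_path}")
    else:
        print(f"starting a new run with {args.config_path}")
```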

You can run TensorBoard to see the training progress:

tensorboard --logdir=runs

The logs are created under the runs folder.
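The example paths follow a visible naming pattern. Each run folder under runs is named <date>-<time>-<dataset>-<model>, and it holds checkpoint files named checkpoint-<NNN>-<value>.pt. Reading NNN as the epoch and the trailing number as a loss is an assumption based only on the file names. The sketch below formats and parses names in that pattern, for example to pick the checkpoint with the lowest value in a run folder. All helper names are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed pattern, inferred from names like checkpoint-010-2.3343.pt.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d{3})-(\d+\.\d{4})\.pt$")


def run_dir_name(dataset: str, model: str, when: datetime) -> str:
    """Build a name such as 20220108-164720-Multi30k-Transformer."""
    return f"{when:%Y%m%d-%H%M%S}-{dataset}-{model}"


def checkpoint_name(epoch: int, loss: float) -> str:
    """Zero-pad the epoch to 3 digits and keep 4 decimals of the loss."""
    return f"checkpoint-{epoch:03d}-{loss:.4f}.pt"


def parse_checkpoint_name(name: str) -> Optional[Tuple[int, float]]:
    """Return (epoch, loss) for a matching file name, else None."""
    match = CHECKPOINT_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(run_dir: Path) -> Optional[Path]:
    """Return the checkpoint file with the lowest trailing value, if any."""
    candidates: List[Tuple[float, Path]] = []
    for path in run_dir.glob("checkpoint-*.pt"):
        parsed = parse_checkpoint_name(path.name)
        if parsed is not None:
            candidates.append((parsed[1], path))
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]
```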

Test

python test.py checkpoint_path

For example:

python test.py runs/20220108-164720-Multi30k-Transformer/checkpoint-010-2.3343.pt

config.yaml is copied to the model folder when training starts, and test.py assumes that this config YAML file exists there.
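Because test.py relies on the copied config, one way to find it is to look in the checkpoint's parent folder. For the example above, that folder is runs/20220108-164720-Multi30k-Transformer. The sketch assumes the copy keeps the file name config.yaml. The function name is hypothetical, and the repository's actual lookup may differ.

```python
import tempfile
from pathlib import Path
from typing import Union


def config_for_checkpoint(checkpoint_path: Union[str, Path], name: str = "config.yaml") -> Path:
    """Return the config file stored in the same run folder as the checkpoint."""
    config_path = Path(checkpoint_path).parent / name
    if not config_path.is_file():
        # Fail early with a clear message rather than deep inside model loading.
        raise FileNotFoundError(
            f"expected {config_path} next to {checkpoint_path}; "
            "was the checkpoint moved out of its run folder?"
        )
    return config_path


if __name__ == "__main__":
    # Demo with a throwaway run folder that mimics the layout under runs/.
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "20220108-164720-Multi30k-Transformer"
        run_dir.mkdir()
        (run_dir / "config.yaml").write_text("# copied at training start\n")
        print(config_for_checkpoint(run_dir / "checkpoint-010-2.3343.pt"))
```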

Unit tests

There are some unit tests in the tests folder.

pytest tests
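pytest collects test files named test_*.py (or *_test.py) and runs the functions in them whose names start with test. The hypothetical test below shows the kind of check that suits a Transformer. It verifies a look-ahead (subsequent) mask, which stops decoder position i from attending to later positions. Neither the helper nor the tests come from the repository. If saved as, say, tests/test_mask.py, they would run with pytest tests.

```python
from typing import List


def subsequent_mask(size: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(size)] for i in range(size)]


def test_each_position_sees_itself_and_the_past() -> None:
    mask = subsequent_mask(4)
    # Row i should allow exactly i + 1 positions: 0..i.
    assert [sum(row) for row in mask] == [1, 2, 3, 4]
    assert all(mask[i][i] for i in range(4))


def test_future_positions_are_blocked() -> None:
    mask = subsequent_mask(3)
    assert mask[0] == [True, False, False]
    assert mask[1][2] is False
```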


Owner

Naoki Shibuya
