Implementation of a Transformer, but completely in Triton

Last update: Dec 22, 2022

Overview

Transformer in Triton (wip)

Implementation of a Transformer, but completely in Triton. I'm completely new to lower-level neural net code, so this repository will mostly be a learning experience, with the end-goal being a vanilla transformer that is faster and more efficient to train.

Install

$ pip install triton-transformer

Usage

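import torch
from triton_transformer import Transformer

model = Transformer(
    num_tokens = 256,    # vocabulary size
    max_seq_len = 1024,  # maximum sequence length
    dim = 512,           # model (embedding) dimension
    depth = 6,           # number of transformer layers
    heads = 8,           # number of attention heads
    dim_head = 64        # dimension of each attention head
).cuda()                 # Triton kernels run on CUDA tensors

x = torch.randint(0, 256, (1, 1024)).cuda()  # token ids, shape (batch, seq_len)
mask = torch.ones(1, 1024).bool().cuda()     # boolean mask, shape (batch, seq_len); every position is kept here

logits = model(x, mask = mask) # (1, 1024, 256)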
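The `mask` passed above has one boolean entry per token, shape (batch, seq_len). How this library consumes it internally is not shown here. As a hedged, library-independent sketch, assuming the common convention that `True` marks tokens that may be attended to, a key-padding mask is typically applied to a row of attention scores before the softmax like this (plain Python; `masked_softmax` is a hypothetical helper, not part of triton_transformer):

```python
import math
from typing import List


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over one row of attention scores; keys where mask is False get weight 0.

    Hypothetical helper for illustration only, not part of triton_transformer.
    Assumes mask[i] == True means "key i may be attended to".
    """
    if len(scores) != len(mask):
        raise ValueError("scores and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one key must be unmasked")
    # Masked keys are set to -inf, so exp() maps them to exactly 0.0.
    masked = [s if keep else -math.inf for s, keep in zip(scores, mask)]
    # Subtract the row maximum for numerical stability (standard softmax trick).
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_softmax([1.0, 2.0, 3.0], [True, True, False])
print(weights)  # ≈ [0.269, 0.731, 0.0]: the masked third key gets no weight
```

In a batched PyTorch implementation the same idea is usually written with `Tensor.masked_fill` on the (batch, heads, query, key) score tensor.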

Citations

@article{Tillet2019TritonAI,
    title   = {Triton: an intermediate language and compiler for tiled neural network computations},
    author  = {Philippe Tillet and H. Kung and D. Cox},
    journal = {Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages},
    year    = {2019}
}

@misc{vaswani2017attention,
    title   = {Attention Is All You Need}, 
    author  = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year    = {2017},
    eprint  = {1706.03762},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Comments

Question concerning PyTorch build

Hello. I find your project very interesting and I have seen your comparison between PyTorch and Triton implementations.

However, I am curious whether your PyTorch environment is a source build optimized for your machine or a pip/conda install.

A source build can run faster, so if a conda install was used for the comparison, the speed difference may simply come from Triton optimizing its CUDA code for the machine it runs on, rather than from the Triton implementation itself.

Thank you again for your interesting project.

opened by veritas9872

_layernorm implementation: forward result not equal to F.layer_norm

I tried your triton-transformer and tested the layernorm module on its own. Strangely, the forward results differ while the backward results are equal.

Code:

from triton_transformer.layernorm import layernorm
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 5).cuda()
x.requires_grad_(True)
dy = .1 * torch.randn_like(x).cuda()
dim = 5
norm = nn.LayerNorm(dim).cuda()

y1 = layernorm(x, norm.weight, norm.bias, use_triton = True)
y2 = layernorm(x, norm.weight, norm.bias, use_triton = False)
print(y1, y2)
print(torch.allclose(y1, y2))

y1.backward(dy, retain_graph=True)
dx_y1 = x.grad.clone()

x.grad = None

y2.backward(dy, retain_graph=True)
dx_y2 = x.grad.clone()
print(dx_y1, dx_y2)
print(torch.allclose(dx_y1, dx_y2))

Result:

tensor([[ 0.9492, -0.0021, -0.9797,  0.4449, -0.4123],
        [-0.7624,  0.4399,  0.7299, -0.3091, -0.0983]], device='cuda:0', grad_fn=<_layernormBackward>)
tensor([[ 1.4217, -0.0031, -1.4674,  0.6663, -0.6175],
        [-1.4342,  0.8276,  1.3732, -0.5815, -0.1850]], device='cuda:0', grad_fn=)
False

tensor([[-0.0706,  0.0288, -0.0813,  0.0446,  0.0785],
        [ 0.0218, -0.0152,  0.0141, -0.0522,  0.0315]], device='cuda:0')
tensor([[-0.0706,  0.0288, -0.0813,  0.0446,  0.0785],
        [ 0.0218, -0.0152,  0.0141, -0.0522,  0.0315]], device='cuda:0')
True

opened by Tengxu-Sun

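For reference, layer norm normalizes each row with its own mean and biased variance: y = (x - mean) / sqrt(var + eps) * weight + bias, where var is the mean of (x - mean)^2 over the row. The printed numbers in this issue fit that picture: with `nn.LayerNorm`'s default weight of 1 and bias of 0, each `use_triton = False` row has mean ≈ 0 and mean square ≈ 1, while each Triton row is the same vector scaled down by a row-dependent factor (about 1.50 for the first row, 1.88 for the second). That points at the per-row denominator rather than the mean. A minimal pure-Python sketch for checking an output row against the definition (the helpers `layer_norm_row` and `row_stats` are hypothetical, not code from this repository):

```python
import math
from typing import List, Optional, Sequence, Tuple


def layer_norm_row(row: Sequence[float],
                   weight: Optional[Sequence[float]] = None,
                   bias: Optional[Sequence[float]] = None,
                   eps: float = 1e-5) -> List[float]:
    """Reference layer norm over one row (hypothetical helper; biased variance, as in F.layer_norm)."""
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row) / n  # divide by n, not n - 1
    inv_std = 1.0 / math.sqrt(var + eps)
    weight = [1.0] * n if weight is None else weight  # matches nn.LayerNorm's initial weight
    bias = [0.0] * n if bias is None else bias        # matches nn.LayerNorm's initial bias
    return [(v - mean) * inv_std * w + b for v, w, b in zip(row, weight, bias)]


def row_stats(row: Sequence[float]) -> Tuple[float, float]:
    """Mean and mean square of a row; a unit-weight, zero-bias layer norm output gives about (0, 1)."""
    n = len(row)
    return sum(row) / n, sum(v * v for v in row) / n


# Second row of each forward output printed in the issue.
reference_row = [-1.4342, 0.8276, 1.3732, -0.5815, -0.1850]  # use_triton = False
triton_row = [-0.7624, 0.4399, 0.7299, -0.3091, -0.0983]     # use_triton = True
print(row_stats(reference_row))  # close to (0, 1)
print(row_stats(triton_row))     # mean close to 0, mean square well below 1
```

Because only the scale differs, comparing `row_stats` of each row is a quick way to tell a mean bug from a variance bug before reading the kernel.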
Current state of benchmarking & contributing?

Hey @lucidrains - hope you're doing well! I have some time to hack over the next couple of weeks and wanted to get a sense of:

Current state of benchmarking (which Triton kernels provide how much lift, and the aggregate lift over a "vanilla Transformer implementation")

If there's anything I could help with, especially as I learn Triton!

opened by siddk
Official layer norm added

Hi @lucidrains, Triton just added a layer norm to its examples: https://github.com/openai/triton/commit/d4baad426db72b83c5222e1c83c929c1860cae54 I tested it: it's twice as fast as Torch, and often faster than Apex.

I'm looking forward to your implementation of attention. On my data, the Torch implementation is so far the fastest at 12.3 / 14.5 (forward / backward), versus 17.3 / 23.0 for the other Triton implementation, the one in DeepSpeed.

opened by olegklimov
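Several of these comments compare timings across PyTorch, Triton, Apex and DeepSpeed. Such numbers are only comparable if every implementation is timed the same way: after warm-up (Triton compiles kernels on first launch), and with the clock stopped only once asynchronous GPU work has finished. A minimal, framework-agnostic sketch of such a timing helper, assuming nothing about how this repository's own benchmarks are written (`benchmark` is a hypothetical name):

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              warmup: int = 10,
              iters: int = 50,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    sync, if given, is called before the clock is read so that queued
    asynchronous work (e.g. CUDA kernels) is included in the measurement.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")
    # Warm-up runs absorb one-time costs such as JIT compilation and caching.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times_ms.append((time.perf_counter() - start) * 1e3)
    # The median is less sensitive than the mean to occasional outliers.
    return statistics.median(times_ms)


print(f"{benchmark(lambda: sum(range(10_000))):.3f} ms")
```

On a GPU one would pass `sync=torch.cuda.synchronize` and time the PyTorch and Triton versions with the same input shapes and the same PyTorch build, which also addresses the source-build versus conda question raised above.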

Releases (latest: 0.1.1)

0.1.1 (Apr 5, 2022)
0.1.0 (Apr 4, 2022)
0.0.28 (Mar 23, 2022)
0.0.27 (Nov 6, 2021)
0.0.26 (Nov 6, 2021)
0.0.25 (Oct 6, 2021)
0.0.24 (Oct 4, 2021)
0.0.23 (Oct 4, 2021)
0.0.22 (Oct 4, 2021)
0.0.21 (Oct 4, 2021)
0.0.20 (Sep 29, 2021)
0.0.19 (Sep 29, 2021)
0.0.18 (Sep 29, 2021)
0.0.17 (Sep 28, 2021)
0.0.16 (Sep 28, 2021)
0.0.15 (Sep 27, 2021)
0.0.14 (Sep 23, 2021)
0.0.12 (Sep 23, 2021)
0.0.10 (Sep 23, 2021)
0.0.9 (Sep 22, 2021)
0.0.8 (Sep 22, 2021)
0.0.7 (Sep 22, 2021)
0.0.6 (Sep 22, 2021)
0.0.5 (Sep 22, 2021)
0.0.4 (Sep 22, 2021)
0.0.3 (Sep 15, 2021)
0.0.2 (Sep 15, 2021)

Owner

Phil Wang

Working with Attention. It's all we need
