Mesh Transformer Jax

A Haiku library that uses the new(ly documented) xmap operator in JAX for model parallelism of transformers.

See enwik8_example.py for an example of using this to implement an autoregressive language model.

Benchmarks

On a TPU v3-8 (see tpuv38_example.py):

~2.7B model

Initialized in 121.842s
Total parameters: 2722382080
Compiled in 49.0534s
it: 0, loss: 20.311113357543945
<snip>
it: 90, loss: 3.987450361251831
100 steps in 109.385s
effective flops (not including attn): 2.4466e+14

~4.8B model

Initialized in 101.016s
Total parameters: 4836720896
Compiled in 52.7404s
it: 0, loss: 4.632925987243652
<snip>
it: 40, loss: 3.2406811714172363
50 steps in 102.559s
effective flops (not including attn): 2.31803e+14

~10B model

Initialized in 152.762s
Total parameters: 10073579776
Compiled in 92.6539s
it: 0, loss: 5.3125
<snip>
it: 40, loss: 3.65625
50 steps in 100.235s
effective flops (not including attn): 2.46988e+14
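How the "effective flops" lines relate to the other logged numbers

The logs do not say how the "effective flops (not including attn)" figure is computed. A rough sketch, assuming the common estimate of 6 FLOPs per parameter per token (forward plus backward pass), reproduces the logged values to within rounding. The tokens-per-step values below are NOT given anywhere in this README; they are assumptions, chosen because they are the values that make the estimate match the logs. The function name and the dictionary layout are hypothetical and are not taken from tpuv38_example.py.

```python
# Figures copied from the benchmark logs above, except tokens_per_step,
# which is an ASSUMPTION (not stated in the README): it is the value that
# makes the 6 * params * tokens estimate line up with the logged number.
RUNS = {
    "~2.7B": {"params": 2722382080, "steps": 100, "seconds": 109.385,
              "tokens_per_step": 16384, "logged": 2.4466e14},
    "~4.8B": {"params": 4836720896, "steps": 50, "seconds": 102.559,
              "tokens_per_step": 16384, "logged": 2.31803e14},
    "10B": {"params": 10073579776, "steps": 50, "seconds": 100.235,
            "tokens_per_step": 8192, "logged": 2.46988e14},
}


def effective_flops(params, steps, seconds, tokens_per_step):
    """Estimate FLOP/s as 6 * parameters * tokens processed / wall time.

    Attention FLOPs are ignored, matching the "(not including attn)" note.
    """
    total_tokens = tokens_per_step * steps
    return 6 * params * total_tokens / seconds


for name, run in RUNS.items():
    estimate = effective_flops(run["params"], run["steps"],
                               run["seconds"], run["tokens_per_step"])
    # Print the estimate next to the logged value for comparison.
    print(f"{name}: estimated {estimate:.6g}, logged {run['logged']:.6g}")
```

Under these assumptions all three runs land at roughly 2.3e14 to 2.5e14 FLOP/s, so throughput on the v3-8 stays about constant as the model grows from ~2.7B to ~10B parameters.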

Owner

Ben Wang