Flaxformer: transformer architectures in JAX/Flax

Overview

Flaxformer: transformer architectures in JAX/Flax

Flaxformer is a transformer library for primarily NLP and multimodal research at Google. It is used for many NLP research use cases, providing both off-the-shelf BERT and T5 models, and several research projects built on shared components.

General library goals

The Flaxformer library aims to provide transformer models that are:

  • High performance: Models are annotated for use with the PJIT API, enabling them to be used for training the largest models.
  • Reusable: Components have self-contained configuration, and high-level modules like encoders, decoders, etc. don't make too many assumptions about what their sub-modules look like.
  • Tested: We aim to employ a reasonable amount of unit testing, and write tests whenever bugs are encountered. However no guarantees are provided.
  • Maintainble: We have created a versioning strategy for our modules so code refactors can take place which alter the module structure. This is tricky in Flax, because Flax generates a tree of parameters based on the exact module structure. Our approach lets us maintain compatibility with previously trained model checkpoints.

Code locations

Modeling components such as dense attention, layer norms, and MLP blocks can be found in the components/ directory.

Higher-level classes which combine these components can be found in the architectures/ directory. The current architecture file for the T5 family of models is architectures/t5/t5_architecture.py; this is a mid-level API requiring sub-components to be configured. A high-level starting point, exposing fewer parameters, is architectures/t5/t5_1_1.py.

Relationship to other codebases

Flaxformer is primarily used by other research projects, in particular T5X. We hope to release examples demonstrating the integration of these codebases soon.

If you would like to use Flaxformer independently of T5X, please see the unit tests for examples instantiating the models. In the medium-term future, we hope to provide more stand-alone examples of Flaxformer use.

Contributions

Unfortunately, we cannot accept contributions to the Flaxformer repo at this time, so any pull requests will be automatically closed - but please file issues as needed!

Installing dependencies and running tests

After checking out this repository, in its root directory, you can install it along with test dependencies by running,

pip3 install '.[testing]'

If you like, you can run the tests from pytest with the following invocation,

python3 -m pytest

Uninstalling

If you need to uninstall Flaxformer, please run,

pip3 uninstall flaxformer

Troubleshooting

Flax deps

Flaxformer is developed in close collaboration with the Flax team. There may be bugs if your Flax version is not up to date. To install the latest version from GitHub, please run,

pip3 uninstall flax
pip3 install git+https://github.com/google/flax

Note

Flaxformer is a project maintained by a team in Google Research. It is not an official Google product.

Owner
Google
Google ❤️ Open Source
Google
Source code for AAAI20 "Generating Persona Consistent Dialogues by Exploiting Natural Language Inference".

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference Source code for RCDG model in AAAI20 Generating Persona Consistent Di

16 Oct 08, 2022
Chinese Named Entity Recognization (BiLSTM with PyTorch)

BiLSTM-CRF for Name Entity Recognition PyTorch version A PyTorch implemention of Bi-LSTM-CRF model for Chinese Named Entity Recognition. 使用 PyTorch 实现

5 Jun 01, 2022
Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

Maksim Terpilowski 49 Dec 30, 2022
A collection of Korean Text Datasets ready to use using Tensorflow-Datasets.

tfds-korean A collection of Korean Text Datasets ready to use using Tensorflow-Datasets. TensorFlow-Datasets를 이용한 한국어/한글 데이터셋 모음입니다. Dataset Catalog |

Jeong Ukjae 20 Jul 11, 2022
[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021

Compact Transformers Preprint Link: Escaping the Big Data Paradigm with Compact Transformers By Ali Hassani[1]*, Steven Walton[1]*, Nikhil Shah[1], Ab

SHI Lab 367 Dec 31, 2022
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 902 Jan 06, 2023
NeurIPS'21: Probabilistic Margins for Instance Reweighting in Adversarial Training (Pytorch implementation).

source code for NeurIPS21 paper robabilistic Margins for Instance Reweighting in Adversarial Training

9 Dec 20, 2022
The implementation of Parameter Differentiation based Multilingual Neural Machine Translation

The implementation of Parameter Differentiation based Multilingual Neural Machine Translation .

Qian Wang 21 Dec 17, 2022
An assignment on creating a minimalist neural network toolkit for CS11-747

minnn by Graham Neubig, Zhisong Zhang, and Divyansh Kaushik This is an exercise in developing a minimalist neural network toolkit for NLP, part of Car

Graham Neubig 63 Dec 29, 2022
Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline

Twitter-News-Summarizer Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline 1.) Extracts all tweets fr

Rohit Govindan 1 Jan 27, 2022
Training code of Spatial Time Memory Network. Semi-supervised video object segmentation.

Training-code-of-STM This repository fully reproduces Space-Time Memory Networks Performance on Davis17 val set&Weights backbone training stage traini

haochen wang 128 Dec 11, 2022
Code Generation using a large neural network called GPT-J

CodeGenX is a Code Generation system powered by Artificial Intelligence! It is delivered to you in the form of a Visual Studio Code Extension and is Free and Open-source!

DeepGenX 389 Dec 31, 2022
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

NeuralQA: A Usable Library for (Extractive) Question Answering on Large Datasets with BERT Still in alpha, lots of changes anticipated. View demo on n

Victor Dibia 220 Dec 11, 2022
:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

(Framework for Adapting Representation Models) What is it? FARM makes Transfer Learning with BERT & Co simple, fast and enterprise-ready. It's built u

deepset 1.6k Dec 27, 2022
OCR을 이용하여 인원수를 인식 후 줌을 Kill 해줍니다

How To Use killtheZoom-2.0 Windows 0. https://joyhong.tistory.com/79 이 글을 보면서 tesseract를 C:\Program Files\Tesseract-OCR 경로로 설치해주세요(한국어 언어 추가 필요) 상단의 초

김정인 9 Sep 13, 2021
Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision Training Efficiency We show the training efficiency of our DSLP model b

Chenyang Huang 37 Jan 04, 2023
Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.

KB-NER: a Knowledge-based System for Multilingual Complex Named Entity Recognition The code is for the winner system (DAMO-NLP) of SemEval 2022 MultiC

116 Dec 27, 2022
Client library to download and publish models and other files on the huggingface.co hub

huggingface_hub Client library to download and publish models and other files on the huggingface.co hub Do you have an open source ML library? We're l

Hugging Face 644 Jan 01, 2023
chaii - hindi & tamil question answering

chaii - hindi & tamil question answering This is the solution for rank 5th in Kaggle competition: chaii - Hindi and Tamil Question Answering. The comp

abhishek thakur 33 Dec 18, 2022
ADCS cert template modification and ACL enumeration

Purpose This tool is designed to aid an operator in modifying ADCS certificate templates so that a created vulnerable state can be leveraged for privi

Fortalice Solutions, LLC 78 Dec 12, 2022