Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Overview

ONNX T5 Actions Status Actions Status Version Downloads Slack

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX.

This package is still in alpha stage, therefore some functionalities such as beam searches are still in development.

Installation

ONNX-T5 is available on PyPi.

pip install onnxt5

For the dev version you can run the following.

git clone https://github.com/abelriboulot/onnxt5
cd onnxt5
pip install -e .

Usage

The simplest way to get started for generation is to use the default pre-trained version of T5 on ONNX included in the package.

NOTE: Please note that the first time you call get_encoder_decoder_tokenizer, the models are being downloaded which might take a minute or two.

from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'

output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
# output_text: "J'ai été victime d'une série d'accidents."

Other tasks just require to change the prefix in your prompt, for instance for summarization:

prompt = 'summarize: <PARAGRAPH>'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)

If you want to get the embeddings of text, you can run the following

from onnxt5.api import get_encoder_decoder_tokenizer, run_embeddings_text

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
prompt = 'Listen, Billy Pilgrim has come unstuck in time.'
encoder_embeddings, decoder_embeddings = run_embeddings_text(encoder_sess, decoder_sess, tokenizer, prompt)

ONNXT5 also lets you export and use your own models. See the examples\ folder for more detailed examples.

T5 works with tokens such as summarize:, translate English to German:, or question: ... context:. You can see a list of the pretrained tasks and token in the appendix D of the original paper.

Functionalities

  • Run any of the T5 trained tasks in a line (translation, summarization, sentiment analysis, completion, generation)
  • Export your own T5 models to ONNX easily
  • Utility functions to generate what you need quickly
  • Up to 4X speedup compared to PyTorch execution for smaller contexts

Benchmarks

The outperformance varies heavily based on the length of the context. For contexts less than ~500 words, ONNX outperforms greatly, going up to a 4X speedup compared to PyTorch. However, the longer the context, the smaller the speedup of ONNX, with Pytorch being faster above 500 words.

GPU Benchmark, Embedding Task

Benchmark Embedding

GPU Benchmark, Generation Task

Benchmark Generation

Contributing

The project is still in its infancy, so I would love your feedback, to know what problems you are trying to solve, hear issues you're encountering, and discuss features that would help you. Therefore feel free to shoot me an e-mail (see my profile for the address!) or join our slack community.

Acknowledgements

This repo is based on the work of Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu from Google, as well as the implementation of T5 from the huggingface team, the work of the Microsoft ONNX and onnxruntime teams, in particular Tianlei Wu, and the work of Thomas Wolf on generation of text.

Original T5 Paper

@article{2019t5,
  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {arXiv e-prints},
  year = {2019},
  archivePrefix = {arXiv},
  eprint = {1910.10683},
}

Microsoft onnxruntime repo

HuggingFace implementation of T5

Comments
  •  Given model could not be parsed while creating inference session. Error message: Protobuf parsing failed.

    Given model could not be parsed while creating inference session. Error message: Protobuf parsing failed.

    Hi there, I've run a guide code and it doesn't work. image I'm getting an error on the following line, decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()

    image text is a text from Wikipedia about cars.

    onnxt5==0.1.4 protobuf==3.6.0 python==3.7

    opened by vladislavkoz 6
  • Default T5 summary contains <extra_id_2>.<extra_id_3>.<extra_id_4>

    Default T5 summary contains ..

    <extra_id_0> the company<extra_id_1> the company<extra_id_2>.<extra_id_3>.<extra_id_4>.<extra_id_5>.<extra_id_6>. <extra_id_7>.

    Do I need some postprocessing? Or it is an issue?

    opened by vladislavkoz 5
  • int() argument must be a string , when running exemple.

    int() argument must be a string , when running exemple.

    Hello , i can't run the first exemple ,

    from onnxt5 import GenerativeT5
    from onnxt5.api import get_encoder_decoder_tokenizer
    
    decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
    generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
    prompt = 'translate English to French: I was a victim of a series of accidents.'
    
    output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
     # output_text: "J'ai été victime d'une série d'accidents." 
    

    the model begin calculation but before End, i have this error :

    TypeError                                 Traceback (most recent call last)
    <ipython-input-1-257f12b63043> in <module>
          5 prompt = 'translate English to French: I was a victim of a series of accidents.'
          6 
    ----> 7 output_text, output_logits = generative_t5(prompt, max_length=16, temperature=0.)
          8 # output_text: "J'ai été victime d'une série d'accidents."
    
    ~\Anaconda3\envs\onnxt5\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
        720             result = self._slow_forward(*input, **kwargs)
        721         else:
    --> 722             result = self.forward(*input, **kwargs)
        723         for hook in itertools.chain(
        724                 _global_forward_hooks.values(),
    
    ~\Anaconda3\envs\onnxt5\lib\site-packages\onnxt5\models.py in forward(self, prompt, max_length, temperature, repetition_penalty, top_k, top_p, max_context_length)
        145                 new_tokens.append(next_token)
        146 
    --> 147             return self.tokenizer.decode(new_tokens), new_logits
    
    ~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils_base.py in decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
       3000             skip_special_tokens=skip_special_tokens,
       3001             clean_up_tokenization_spaces=clean_up_tokenization_spaces,
    -> 3002             **kwargs,
       3003         )
       3004 
    
    ~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils.py in _decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, spaces_between_special_tokens)
        730         spaces_between_special_tokens: bool = True,
        731     ) -> str:
    --> 732         filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
        733 
        734         # To avoid mixing byte-level and unicode for byte-level BPT
    
    ~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils.py in convert_ids_to_tokens(self, ids, skip_special_tokens)
        708         tokens = []
        709         for index in ids:
    --> 710             index = int(index)
        711             if skip_special_tokens and index in self.all_special_ids:
        712                 continue
    
    TypeError: int() argument must be a string, a bytes-like object or a number, not 'list
    

    `

    and i have no idea how to find solution , if you have any solution !? thx !

    opened by AZE38 3
  • Inference time on gpu vs onnxt5-gpu

    Inference time on gpu vs onnxt5-gpu

    @abelriboulot , @Ki6an , @brymck .
    I have finetuned t5 model for paraphrasing task like this: Paraphrase with t5

    I want to reduce inference time, so I exported finetuned t5 model using onnxt5, here I get time taken more in case where I use onnx model on gpu than pytorch model on gpu.

    gpu: time taken = 0.2357314471155405 time taken = 0.24958523781970143 time taken = 0.20342689706012607 time taken = 0.5490081580355763 time taken = 0.10756197292357683

    onnxt5-gpu time taken = 0.5277913622558117 time taken = 0.6335883080027997 time taken = 0.6975196991115808 time taken = 1.9159171842038631 time taken = 0.7938353712670505

    Did I make mistake in exporting/loading model ? gpu code onnxt5-gpu code

    opened by priyanksonis 1
  • Add progress bar

    Add progress bar

    This adds a progress bar using tqdm.

    The files this library downloads are about 500 MB in size, so I'd like to have some feedback on what's happening. Originally I wasn't clear what was the cause of the delay when running get_encoder_decoder_tokenizer.

    opened by brymck 0
  • Add download progress bar

    Add download progress bar

    This adds a progress bar using tqdm.

    The files this library downloads are about 500 MB in size, so I'd like to have some feedback on what's happening. Originally I wasn't clear what was the cause of the delay when running get_encoder_decoder_tokenizer.

    opened by brymck 0
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Add dtype to new_tokens tensor to avoid an error when decoding

    Add dtype to new_tokens tensor to avoid an error when decoding

    Thanks for the repo!

    I was having an error message come up when running the code after my initial install.

    Small code example:

    import os
    
    import torch
    from onnxt5 import GenerativeT5
    from onnxt5.api import get_sess
    from transformers import AutoTokenizer
    
    model_dir = <path-to-tokenizer-and-onnx-files>
    model_name = <name-of-model>
    
    tokenizer = AutoTokenizer.from_pretrained(
        model_dir,
    )
    
    decoder_sess, encoder_sess = get_sess(
        os.path.join(model_dir, model_name)
    )
    
    model = GenerativeT5(
        encoder_sess,
        decoder_sess,
        tokenizer,
        onnx=True,
        cuda=torch.cuda.is_available(),
    )
    
    sentences = [
        "I has good grammar.",
        "I have bettr grammur."
    ]
    
    corrected_sentences = [
        model(f"grammar: {sentence}",
              max_length=512,
              temperature=1,
              )[0]
        for sentence in sentences
    ]
    
    
    

    The error

    Traceback (most recent call last):
      File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/code/inference.py", line 133, in <module>
        main()
      File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/code/inference.py", line 125, in main
        prediction_output = predict_fn(input_data=input_tokens,
      File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/code/inference.py", line 95, in predict_fn
        corrected_sentences = [model(f"grammar: {sentence}",
      File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/code/inference.py", line 95, in <listcomp>
        corrected_sentences = [model(f"grammar: {sentence}",
      File "/Users/jamiebrandon/Code/inferentia-test/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/onnxt5/onnxt5/models.py", line 154, in forward
        return self.tokenizer.decode(new_tokens), new_logits
      File "/Users/jamiebrandon/Code/inferentia-test/venv/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3367, in decode
        return self._decode(
      File "/Users/jamiebrandon/Code/inferentia-test/venv/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 548, in _decode
        text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
    TypeError: 'float' object cannot be interpreted as an integer
    

    It seems the tensor for new tokens is of type float instead of long. Adding dtype=torch.long to the instantiation of the tensor resolved my issue, so I thought I'd share.

    opened by jambran 0
  • Running example

    Running example "export_pretrained_model.py" as-is fails (See details)

    86%|████████▌ | 18/21 [00:00<00:00, 44.29it/s]
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-4-f543e3365977> in <module>()
         27 # Generating text
         28 generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
    ---> 29 generative_t5('translate English to French: I was a victim of a series of accidents.', 21, temperature=0.)[0]
    
    3 frames
    /usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in _decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
        505         if isinstance(token_ids, int):
        506             token_ids = [token_ids]
    --> 507         text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
        508 
        509         if clean_up_tokenization_spaces:
    
    TypeError: 'float' object cannot be interpreted as an integer
    

    Any possible version conflicts that you know of?

    opened by PrithivirajDamodaran 2
  • How to suppress output

    How to suppress output

    How to suppress output? Setting verbosity logging level does nothing 5%|█████████▊ | 16/300 [00:01<00:18, 15.65it/s]

    opened by 127 0
  • Can this model suitable for multilingual-t5 accelerate?

    Can this model suitable for multilingual-t5 accelerate?

    Recently, I use the chinese function of multilingual-t5 model to accomplish the Chinese NLG tasks. However, the inference speed might be slow, could this model be used for multilingual-t5? How can I do?

    opened by williamwong91 2
Releases(0.1.9)
Owner
Abel
Repentant portfolio manager, turned data scientist. I'm one Vonnegut quote away from figuring out this whole life thing.
Abel
PyTorch Implementation of "Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging" (Findings of ACL 2022)

Feature_CRF_AE Feature_CRF_AE provides a implementation of Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging

Jacob Zhou 6 Apr 29, 2022
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

ESPnet 5.9k Jan 03, 2023
Transformers and related deep network architectures are summarized and implemented here.

Transformers: from NLP to CV This is a practical introduction to Transformers from Natural Language Processing (NLP) to Computer Vision (CV) Introduct

Ibrahim Sobh 138 Dec 27, 2022
Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

BP-Transformer This repo contains the code for our paper BP-Transformer: Modeling Long-Range Context via Binary Partition Zihao Ye, Qipeng Guo, Quan G

Zihao Ye 119 Nov 14, 2022
Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

背景 安装教程 快速上手 (一)预训练模型 (二)机器翻译 (三)文本分类 TenTrans 进阶 1. 多语言机器翻译 2. 跨语言预训练 背景 TrenTrans是一个统一的端到端的多语言多任务预训练平台,支持多种预训练方式,以及序列生成和自然语言理解任务。 安装教程 git clone git

Tencent Minority-Mandarin Translation Team 42 Dec 20, 2022
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
Codes for processing meeting summarization datasets AMI and ICSI.

Meeting Summarization Dataset Meeting plays an essential part in our daily life, which allows us to share information and collaborate with others. Wit

xcfeng 39 Dec 14, 2022
Codename generator using WordNet parts of speech database

codenames Codename generator using WordNet parts of speech database References: https://possiblywrong.wordpress.com/2021/09/13/code-name-generator/ ht

possiblywrong 27 Oct 30, 2022
KoBERTopic은 BERTopic을 한국어 데이터에 적용할 수 있도록 토크나이저와 BERT를 수정한 코드입니다.

KoBERTopic 모델 소개 KoBERTopic은 BERTopic을 한국어 데이터에 적용할 수 있도록 토크나이저와 BERT를 수정했습니다. 기존 BERTopic : https://github.com/MaartenGr/BERTopic/tree/05a6790b21009d

Won Joon Yoo 26 Jan 03, 2023
A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk.

Simple-Vosk A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk. Check out the official Vosk G

2 Jun 19, 2022
This is a MD5 password/passphrase brute force tool

CROWES-PASS-CRACK-TOOl This is a MD5 password/passphrase brute force tool How to install: Do 'git clone https://github.com/CROW31/CROWES-PASS-CRACK-TO

9 Mar 02, 2022
A minimal code for fairseq vq-wav2vec model inference.

vq-wav2vec inference A minimal code for fairseq vq-wav2vec model inference. Runs without installing the fairseq toolkit and its dependencies. Usage ex

Vladimir Larin 7 Nov 15, 2022
Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)

Neural Network Models for Joint POS Tagging and Dependency Parsing Implementations of joint models for POS tagging and dependency parsing, as describe

Dat Quoc Nguyen 152 Sep 02, 2022
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)

Language Models are Few-shot Multilingual Learners Paper This is the source code of the paper [Arxiv] [ACL Anthology]: This code has been written usin

Genta Indra Winata 45 Nov 21, 2022
Code-autocomplete, a code completion plugin for Python

Code AutoComplete code-autocomplete, a code completion plugin for Python.

xuming 13 Jan 07, 2023
Entity Disambiguation as text extraction (ACL 2022)

ExtEnD: Extractive Entity Disambiguation This repository contains the code of ExtEnD: Extractive Entity Disambiguation, a novel approach to Entity Dis

Sapienza NLP group 121 Jan 03, 2023
Telegram bot to auto post messages of one channel in another channel as soon as it is posted, without the forwarded tag.

Channel Auto-Post Bot This bot can send all new messages from one channel, directly to another channel (or group, just in case), without the forwarded

Aditya 128 Dec 29, 2022
NLP-Project - Used an API to scrape 2000 reddit posts, then used NLP analysis and created a classification model to mixed succcess

Project 3: Web APIs & NLP Problem Statement How do r/Libertarian and r/Neoliberal differ on Biden post-inaguration? The goal of the project is to see

Adam Muhammad Klesc 2 Mar 29, 2022
A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Chimera: Learning Shared Semantic Space for Speech-to-Text Translation This is a Pytorch implementation for the "Chimera" paper Learning Shared Semant

Chi Han 43 Dec 28, 2022
Yet Another Neural Machine Translation Toolkit

YANMTT YANMTT is short for Yet Another Neural Machine Translation Toolkit. For a backstory how I ended up creating this toolkit scroll to the bottom o

Raj Dabre 121 Jan 05, 2023