Trained T5-base and T5-large models for generating keywords from text

Last update: Nov 24, 2022

Overview

text to keywords

Trained T5-base and T5-large models for generating keywords from text. Supported languages: ru (Russian).

Pretrained large version | Pretrained base version

Habr article

Usage

Example usage (the code returns a list of keywords; duplicates are possible):

pip install transformers sentencepiece

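from itertools import groupby

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "0x7194633/keyt5-large"  # or "0x7194633/keyt5-base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def generate(text, **kwargs):
    """Return the list of keywords generated for `text` (Russian only)."""
    inputs = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        # Beam search; extra kwargs (e.g. max_length) go to model.generate().
        hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
    s = tokenizer.decode(hypotheses[0], skip_special_tokens=True)
    # Keywords are separated by ';': normalize the separators, lowercase,
    # split, and drop empty items (e.g. the one after a trailing ';').
    s = s.replace('; ', ';').replace(' ;', ';').lower().split(';')
    s = [el.strip() for el in s if el.strip()]
    # groupby collapses only *consecutive* repeats, so non-adjacent
    # duplicates stay in the result.
    s = [el for el, _ in groupby(s)]
    return s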

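# Sample Russian news text. English gloss: "Reuters reported 3,600 flights
# cancelled due to Omicron and the weather. The largest number of cancellations
# on January 2 fell on the US airlines SkyWest and Southwest, each with more than
# 400 cancelled flights. Of the flights cancelled on January 2, more than 2,100
# were in the US. In addition, over 6,400 flights were delayed."
article = """Reuters сообщил об отмене 3,6 тыс. авиарейсов из-за «омикрона» и погоды
Наибольшее число отмен авиарейсов 2 января пришлось на американские авиакомпании 
SkyWest и Southwest, у каждой — более 400 отмененных рейсов. При этом среди 
отмененных 2 января авиарейсов — более 2,1 тыс. рейсов в США. Также свыше 6400 
рейсов были задержаны."""

# top_p only affects sampling (do_sample=True); with plain beam search it has no effect.
print(generate(article, top_p=1.0, max_length=64))
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов']
# (air transportation, flight cancellations, cancellation of flights, ...)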
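Because `groupby` only removes consecutive repeats, the example output still lists 'отмена авиарейсов' three times. If you need each keyword once, a post-processing step like the following keeps the first occurrence of every keyword in order. `parse_keywords` is a hypothetical helper, not part of keyT5; it assumes keywords are separated by `;`, as in the parsing code of `generate`, and works on the decoded string, so it can be tried without loading the model.

```python
from typing import List


def parse_keywords(decoded: str) -> List[str]:
    """Split a decoded keyT5 output into unique, lowercase keywords.

    Hypothetical helper: assumes ';' separates keywords, matching the
    parsing code in `generate`.
    """
    parts = (part.strip().lower() for part in decoded.split(";"))
    # dict keeps insertion order (Python 3.7+), so this drops repeats anywhere
    # in the list while keeping the first occurrence of each keyword.
    return list(dict.fromkeys(part for part in parts if part))


raw = "авиаперевозки; отмена авиарейсов; отмена рейсов; отмена авиарейсов;"
print(parse_keywords(raw))
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов']
```

Using `dict.fromkeys` instead of `set` keeps the model's ordering, which roughly reflects the order in which the keywords were generated.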

Training

To train the keyT5-base and keyT5-large models, you need a table in CSV format with an input column X and a target column Y, like this:

X	Y
Some text that is fed to the input	The text that should come out
Some text that is fed to the input	The text that should come out

The KeyT5 models were trained on ~7,000 compressed habr.com articles (see data.csv and collect.py). Only Russian is supported!
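A minimal sketch of preparing such a file with Python's standard `csv` module. The column names `X` and `Y` come from the table above. Joining the keywords with `"; "` in the target column is an assumption, inferred from the fact that the model's output is split on `;`; the exact target format (including any trailing separator) should be checked against the training notebook. The helper names, the file name `train.csv`, and the sample rows are hypothetical.

```python
import csv


def write_training_csv(rows, path):
    """Write (text, keywords) pairs to a CSV file with columns X and Y.

    Assumption: column Y holds the keywords joined by "; ", mirroring the
    ';'-separated output the model produces.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["X", "Y"])
        writer.writeheader()
        for text, keywords in rows:
            writer.writerow({"X": text, "Y": "; ".join(keywords)})


def read_training_csv(path):
    """Read the file back as a list of (input_text, target_text) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["X"], row["Y"]) for row in csv.DictReader(f)]


# Hypothetical sample rows; real data would be Russian article texts.
rows = [
    ("Some text that is fed to the input", ["keyword one", "keyword two"]),
    # The csv module quotes fields containing commas, quotes, or newlines.
    ('A text with commas, "quotes"\nand a line break', ["keyword three"]),
]
write_training_csv(rows, "train.csv")
print(read_training_csv("train.csv"))
```

Letting `csv` handle quoting matters here because article texts routinely contain commas, quotation marks, and line breaks that would otherwise corrupt a hand-built CSV.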

For more details, see the training notebook.

Owner

Danil
