The idea is to build a model that takes keywords as input and generates sentences as output.

Last update: Jan 03, 2023

Overview

keytotext

Potential use cases include:

Marketing
Search engine optimization
Topic generation
Fine-tuning of topic modeling models

Model:

Keytotext is based on the amazing T5 model:

k2t: Model
k2t-base: Model
mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training notebooks can be found in the Training Notebooks folder.

Note: To add your own model to keytotext, please read the Models Documentation.

Usage:

Example usage:

Example notebooks can be found in the Notebooks folder.

pip install keytotext
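A minimal usage sketch once the package is installed. It assumes the package exposes a `pipeline` factory (the import `from keytotext import pipeline` appears in the issues further down) that returns a callable taking a keyword list plus Hugging Face style decoding arguments. The model name `k2t-base` comes from the model list above; the decoding values and the `generate_sentence` helper are illustrative assumptions, not documented defaults.

```python
# Decoding arguments forwarded to text generation.
# The specific values are illustrative assumptions, not documented defaults.
DECODING_PARAMS = {
    "do_sample": True,
    "num_beams": 4,
    "no_repeat_ngram_size": 3,
    "early_stopping": True,
}


def generate_sentence(keywords, model_name="k2t-base", **overrides):
    """Hypothetical helper: load a keytotext model and turn keywords into text."""
    # Imported lazily so this module can be imported without keytotext installed.
    from keytotext import pipeline

    # Assumed behavior: pipeline(name) loads the pretrained model and returns a callable.
    nlp = pipeline(model_name)
    params = {**DECODING_PARAMS, **overrides}
    return nlp(list(keywords), **params)
```

Calling `generate_sentence(["India", "Capital", "New Delhi"])` would then load the model on first use and return the generated text, provided the assumptions above hold for the installed version.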

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs here: Docs

Trainer example here:

from keytotext import trainer

UI:

pip install streamlit-tags

This uses a custom Streamlit component built by me: GitHub

API:

The API is packaged as a Docker image and can be started quickly. Follow the instructions below to get started.

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get the results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]
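The same request can be made from a script. The sketch below assumes the `data` parameter is a JSON array of keywords, as the URL above suggests, and that the container is running locally on port 8000. The helper names `build_api_url` and `fetch_sentence` are hypothetical, and the response is returned as raw text because its format is not documented here.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "http://localhost:8000/api"  # port published by the docker run command


def build_api_url(keywords, root=API_ROOT):
    """Encode a keyword list as a JSON array in the `data` query parameter."""
    payload = json.dumps(list(keywords), separators=(",", ":"))
    # Percent-encode quotes, brackets, commas, and spaces so the URL is valid.
    return f"{root}?data={urllib.parse.quote(payload)}"


def fetch_sentence(keywords, timeout=30):
    """Call the running container and return the raw response body as text."""
    with urllib.request.urlopen(build_api_url(keywords), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Browsers percent-encode the query string automatically; doing it explicitly matters when the URL is built in code, since raw quotes and spaces are not valid in a URL.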

Note: The Hosted API is only available on demand

BibTex:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

References

https://github.com/Shivanandroy/simpleT5 (Shivanand Roy)
https://github.com/patil-suraj/question_generation (Suraj Patil)
https://github.com/MathewAlexander/T5_nlg (Mathew Alexander)

Articles about keytotext:

https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45 (Mathew Alexander)
Amazing Video by 1LittleCoder here: https://www.youtube.com/watch?v=I0iBzP-SxFY about keytotext
https://medium.com/mlearning-ai/generating-sentences-from-keywords-using-transformers-in-nlp-e89f4de5cf6b (Prakhar Mishra)

Comments

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
Hi,

I tried to install keytotext via pip install keytotext --upgrade on my local machine,

but came across the following:

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
ERROR: No matching distribution found for keytotext

My pip version is the latest. However, the above works just fine in Colab. Could you guide me through the fix?
opened by abhijithneilabraham 6
Add finetuning model to keytotext

Is your feature request related to a problem? Please describe. It's difficult to use without fine-tuning on a new corpus, so we need to build a script to fine-tune it on a new corpus.
enhancement good first issue

opened by gagan3012 2
"Oh no." ?

"Error running app. If this keeps happening, please file an issue."

Ok,...sure? I know nothing about this app.

Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

Chrome browser, Linux.

opened by drscotthawley 2
Add Citations

Is your feature request related to a problem? Please describe. Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

opened by gagan3012 1
Adding new models to keytotext

Is your feature request related to a problem? Please describe. Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

enhancement good first issue

opened by gagan3012 1
Inference API for Keytotext

Is your feature request related to a problem? Please describe. It is difficult to host the UI on streamlit without API

Describe the solution you'd like Inference API
enhancement good first issue

opened by gagan3012 1
Create Better UI

Is your feature request related to a problem? Please describe. The current UI is not functional. It needs to be fixed.

Describe the solution you'd like Better UI with a nicer design
enhancement

opened by gagan3012 1
Add `st.cache` to load model

Hi @gagan3012,

Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

Thought I'd send you a small PR which should fix this. You've already been on a good way with using st.cache but it gets even better if you use it once more to load the model. This makes sure the model and tokenizer are only loaded once, which should make the app consume less memory (and not run into resource limits again! Plus, I've seen that it also works a bit faster now ;).

Hope this works for you and let me know if you have any other questions! 🎈

Cheers, Johannes

opened by jrieke 1
ValueError: transformers.models.auto.__spec__ is None

'from keytotext import pipeline'

While running the above line, it shows this error: "ValueError: transformers.models.auto.__spec__ is None"

opened by varunakk 0
Update README.md
opened by gagan3012 0
Update trainer.py
opened by gagan3012 0
Pipeline error on fresh install

Hi I'm getting this on a first run and fresh install

Global seed set to 42
Traceback (most recent call last):
  File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
    from keytotext import pipeline
  File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
    from .dataset import make_dataset
  File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
    from cv2 import randShuffle
ModuleNotFoundError: No module named 'cv2'

opened by skintflickz 0
New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'
I have imported the model and the necessary libraries, and I am getting the error below in Google Colab. I used this model a few months ago and it worked fine; this is a new issue with the same code.

TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

Imported libraries:

!pip install keytotext --upgrade
!sudo apt-get install git-lfs

from keytotext import trainer

Training Model:

model = trainer()
model.from_pretrained(model_name="t5-small")
model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5, use_gpu=True)
model.save_model()

Have attached error screenshot

OS: Windows

Browser Chrome
opened by aishwaryapisal9 2
Update trainer.py
Delete progress_bar_refresh_rate in trainer.py

Description

Delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the latest version (1.7.0) of PyTorch Lightning's Trainer.

Motivation and Context

Keeping this argument makes the training process fail.

How Has This Been Tested?

Ran keytotext on a custom dataset before and after August 2nd, 2022. The new version of PyTorch Lightning's Trainer, which removed the above argument, took effect on that date, and custom training has failed since then.

Screenshots (if appropriate):

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[x] My code follows the code style of this project.

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[ ] I have read the CONTRIBUTING document.
opened by anath2110benten 0
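The PR above fixes the failure by deleting the argument. A more defensive alternative, sketched here under stated assumptions, is to drop any keyword that the target constructor no longer declares. `supported_kwargs` and `FakeTrainer` are hypothetical names used for illustration; `FakeTrainer` stands in for a trainer class whose signature lost `progress_bar_refresh_rate`, and whether the approach works on a given library class depends on how that class declares its constructor.

```python
import inspect


def supported_kwargs(factory, **kwargs):
    """Keep only the keyword arguments that `factory` declares in its signature."""
    params = inspect.signature(factory).parameters
    # A **kwargs catch-all means every keyword is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class FakeTrainer:
    """Hypothetical stand-in for a trainer whose signature dropped an argument."""

    def __init__(self, max_epochs=1, logger=None):
        self.max_epochs = max_epochs
        self.logger = logger


# progress_bar_refresh_rate is filtered out because FakeTrainer does not declare it.
options = supported_kwargs(FakeTrainer, max_epochs=5, progress_bar_refresh_rate=5)
trainer = FakeTrainer(**options)
```

Silently dropping arguments trades an early error for tolerance across versions, so it suits cosmetic options such as a progress bar refresh rate better than options that change training behavior.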
Why is cv2 required?

https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps throwing a "cv2 not installed" exception. It looks like the pip package doesn't list cv2 as a dependency. I tried deleting this line in the source code, and the model works well. Is this line necessary for this project? If so, could OpenCV be added as a dependency of the pip package? Thanks for your concern.

opened by ChunxuYang 0
Hi, I notice that given the same input keywords, the generated text is the same across different runs, even when setting different seeds with 'pl.seed_everything(..)'.

opened by RuiFeiHe 6

Releases(v1.5.0)

v1.5.0(Jul 9, 2021)

Trainer tool finalized and completed!

v1.4.1(Jul 2, 2021)

Val acc added

v1.3.9(Jul 2, 2021)

Bug fixes

v1.3.8(Jul 2, 2021)

New Upload to hf hub module

v1.3.1(Jun 16, 2021)

Documentation updated along with semantic versioning

v0.3.1(Jun 15, 2021)

This version features a tested trainer which can be used in 4 lines of code:

from keytotext import KeytotextTrainer

model = KeytotextTrainer()
model.from_pretrained(model_name="t5-small")
model.train(data_df=df,batch_size=4, max_epochs=3, use_gpu=True)
model.save_model()


v0.2.9(Jun 15, 2021)

This release features the new Trainer module. More details coming soon.

v0.2.5(May 12, 2021)

Changes:

Bug fixes

Maintaining new models

v0.2.4(May 11, 2021)

Changes:

Refactoring of code

Ability to add new models

v0.2.3(May 10, 2021)

Changes:

Bug fixes

New models added

v0.2.2(May 10, 2021)

Changes:

keytotext now supports new models trained by other people

A new fine-tuning script

v0.2.1(May 5, 2021)

Bug fixes

v0.2.0(May 4, 2021)

Latest release:

Completed API

Completed testing

Completed all evals

UI improvements

v0.1.6(May 2, 2021)

Changes:

Updates to Eval pipeline

v0.1.5(May 2, 2021)

Changes:

Added Trainer API

Added Eval pipeline

v0.1.4(Apr 30, 2021)

Latest release

v0.1.3(Apr 27, 2021)

Updates

0.1.1(Apr 26, 2021)

0.1.0(Apr 26, 2021)

Production release 0.1.0

Owner

Gagan Bhatia

Software Developer | Machine Learning Enthusiast

GitHub Repository https://share.streamlit.io/gagan3012/keytotext/UI/app.py

