The idea is to build a model that takes keywords as input and generates sentences as output.

Overview

keytotext

Potential use cases include:

  • Marketing
  • Search engine optimization
  • Topic generation
  • Fine-tuning of topic modeling models

Model:

Keytotext is based on the T5 model, available through HuggingFace. The following models are provided:

  • k2t: Model
  • k2t-base: Model
  • mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training notebooks can be found in the Training Notebooks folder.

Note: To add your own model to keytotext, please read the Models Documentation.

Usage:

Example usage: Open In Colab

Example Notebooks can be found in the Notebooks Folder

pip install keytotext
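
Once the package is installed, generating a sentence might look like the sketch below. It assumes that keytotext exposes a pipeline function (the import that appears in user reports for this project) and that pipeline("k2t") returns an object that can be called with a list of keywords plus decoding parameters. The generate wrapper and its nlp argument are hypothetical helpers added for illustration; they are not part of keytotext.

```python
from typing import Callable, Optional, Sequence


def generate(
    keywords: Sequence[str],
    model_name: str = "k2t",
    nlp: Optional[Callable[..., str]] = None,
    **decoding_params,
) -> str:
    """Turn a list of keywords into a sentence.

    `nlp` is a hypothetical hook for passing in an already loaded pipeline
    (or a stub in tests); when omitted, the model is loaded from keytotext.
    """
    if nlp is None:
        # Imported lazily so this module can be imported without keytotext.
        # Assumption: pipeline(model_name) returns a callable model wrapper.
        from keytotext import pipeline

        nlp = pipeline(model_name)
    # Extra keyword arguments are forwarded unchanged as decoding parameters.
    return nlp(list(keywords), **decoding_params)


# Example call (loads the model on first use; parameter values are illustrative):
# print(generate(["India", "capital", "New Delhi"], num_beams=4, no_repeat_ngram_size=3))
```

Passing a preloaded pipeline through nlp avoids reloading the model on every call, which matters when many keyword sets are processed in a loop.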

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs here: Docs

Trainer example here: Open In Colab

from keytotext import trainer
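
The sketch below follows the call sequence that users report in this project's issue tracker: trainer(), from_pretrained, train, save_model. The finetune wrapper and its trainer_factory hook are hypothetical. The expectation that both data frames carry "keywords" and "text" columns is an assumption that should be checked against the Trainer docs.

```python
from typing import Any, Callable, Optional


def finetune(
    train_df: Any,
    test_df: Any,
    base_model: str = "t5-small",
    batch_size: int = 3,
    max_epochs: int = 5,
    use_gpu: bool = False,
    trainer_factory: Optional[Callable[[], Any]] = None,
) -> Any:
    """Fine-tune a T5-based model and save it.

    Assumption: train_df and test_df are pandas DataFrames with
    "keywords" and "text" columns.
    """
    if trainer_factory is None:
        # Imported lazily so this module can be imported without keytotext.
        from keytotext import trainer as trainer_factory

    model = trainer_factory()
    model.from_pretrained(model_name=base_model)
    model.train(
        train_df=train_df,
        test_df=test_df,
        batch_size=batch_size,
        max_epochs=max_epochs,
        use_gpu=use_gpu,
    )
    model.save_model()
    return model
```

The trainer_factory hook exists only so the call sequence can be exercised with a stub, without downloading a model.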

UI:

UI: Streamlit App

pip install streamlit-tags

This uses a custom streamlit component built by me: GitHub

API:

The API is packaged as a Docker container and can be started quickly. Follow the instructions below to get started:

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get the results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]
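
A client could build and call that URL from Python with only the standard library, roughly as sketched below. The helper names are illustrative, and it is an assumption that the server accepts the percent-encoded form of the query. The response format is not specified here, so the body is returned as raw text rather than parsed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_api_url(keywords, base_url="http://localhost:8000/api"):
    # Serialize the keywords as a compact JSON array, matching the example URL,
    # then percent-encode it so quotes, brackets and spaces are URL-safe.
    data = json.dumps(list(keywords), separators=(",", ":"))
    query = urlencode({"data": data})
    return base_url + "?" + query


def query_api(keywords, base_url="http://localhost:8000/api", timeout=30):
    # Requires the container started by the docker run command to be listening.
    with urlopen(build_api_url(keywords, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (needs the running container):
# print(query_api(["India", "Capital", "New Delhi"]))
```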

Note: The Hosted API is only available on demand

BibTex:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

Comments
  • ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)

    Hi,

    I tried to install keytotext on my local machine with pip install keytotext --upgrade, but got the following error:

    ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
    ERROR: No matching distribution found for keytotext

    My pip version is the latest, and the same command works fine in Colab. How can I fix this?

    opened by abhijithneilabraham 6
  • Add finetuning model to keytotext

    Is your feature request related to a problem? Please describe. It is difficult to use keytotext without fine-tuning on a new corpus, so we need to build a script to fine-tune it on a new corpus.

    enhancement good first issue 
    opened by gagan3012 2
  • "Oh no." ?

    "Error running app. If this keeps happening, please file an issue."

    Ok,...sure? I know nothing about this app.

    Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

    Chrome browser, Linux.

    opened by drscotthawley 2
  • Add Citations

    Is your feature request related to a problem? Please describe. Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

    opened by gagan3012 1
  • Adding new models to keytotext

    Is your feature request related to a problem? Please describe. Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

    enhancement good first issue 
    opened by gagan3012 1
  • Inference API for Keytotext

    Is your feature request related to a problem? Please describe. It is difficult to host the UI on streamlit without API

    Describe the solution you'd like Inference API

    enhancement good first issue 
    opened by gagan3012 1
  • Create Better UI

    Is your feature request related to a problem? Please describe. The current UI is not functional and needs to be fixed.

    Describe the solution you'd like Better UI with a nicer design

    enhancement 
    opened by gagan3012 1
  • Add `st.cache` to load model

    Hi @gagan3012,

    Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

    Thought I'd send you a small PR which should fix this. You're already on the right track with st.cache, but it gets even better if you use it once more to load the model. This makes sure the model and tokenizer are only loaded once, which should make the app consume less memory (and not run into resource limits again!). Plus, I've seen that it also works a bit faster now ;)

    Hope this works for you and let me know if you have any other questions! 🎈

    Cheers, Johannes

    opened by jrieke 1
  • ValueError: transformers.models.auto.__spec__ is None

    from keytotext import pipeline

    Running the line above raises this error: "ValueError: transformers.models.auto.__spec__ is None"

    opened by varunakk 0
  • Update README.md

    opened by gagan3012 0
  • Update trainer.py

    opened by gagan3012 0
  • Pipeline error on fresh install

    Hi I'm getting this on a first run and fresh install

    Global seed set to 42
    Traceback (most recent call last):
      File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
        from keytotext import pipeline
      File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
        from .dataset import make_dataset
      File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
        from cv2 import randShuffle
    ModuleNotFoundError: No module named 'cv2'

    opened by skintflickz 0
  • New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

    I have imported the model and the necessary libraries, and I am getting the error below in Google Colab. I used this model a few months ago and it worked fine; the issue appeared recently with the same code.

    TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

    Imported libraries:

    !pip install keytotext --upgrade
    !sudo apt-get install git-lfs

    from keytotext import trainer

    Training Model:

    model = trainer()
    model.from_pretrained(model_name="t5-small")
    model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5, use_gpu=True)
    model.save_model()

    Have attached error screenshot

    • OS: Windows
    • Browser Chrome Error
    opened by aishwaryapisal9 2
  • Update trainer.py

    Delete progress_bar_refresh_rate in trainer.py

    Description

    Delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the Trainer class in the latest version (1.7.0) of PyTorch Lightning.

    Motivation and Context

    Keeping this argument causes the training process to fail.

    How Has This Been Tested?

    Ran keytotext on a custom dataset before and after August 2nd, 2022. The new version of PyTorch Lightning's Trainer, which removed the argument above, took effect on that date, and custom training has failed since then.

    Screenshots (if appropriate):

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    opened by anath2110benten 0
  • Why is cv2 required?

    https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

    I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps throwing a "cv2 not installed" exception. It looks like the pip package doesn't list cv2 as a dependency. When I deleted this line from the source code, the model worked well. Is this line necessary for the project? Would you consider adding opencv to the pip package's dependencies? Thanks for your attention.

    opened by ChunxuYang 0
  • Given the same input keywords, the generated text is the same across different runs, even when setting different seeds with 'pl.seed_everything(..)'.

    opened by RuiFeiHe 6
Releases(v1.5.0)
Owner
Gagan Bhatia
Software Developer | Machine Learning Enthusiast