The idea is to build a model that takes keywords as input and generates sentences as output.

Overview

keytotext

Potential use cases include:

  • Marketing
  • Search Engine Optimization
  • Topic generation etc.
  • Fine tuning of topic modeling models

Model:

Keytotext is based on the amazing T5 model: HuggingFace

  • k2t: Model
  • k2t-base: Model
  • mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training notebooks can be found in the Training Notebooks folder.

Note: To add your own model to keytotext, please read the Models Documentation.

Usage:

Example usage: Open In Colab

Example Notebooks can be found in the Notebooks Folder

pip install keytotext
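A minimal inference sketch follows. It assumes that the `pipeline` entry point seen elsewhere on this page (`from keytotext import pipeline`) returns a callable that accepts a keyword list plus generation keyword arguments. The helper name `generate_sentence` is hypothetical, and the decoding options are ordinary HuggingFace generation arguments chosen for illustration, not settings confirmed by this README.

```python
# Illustrative decoding options (assumed; common HuggingFace generate() arguments).
DECODING_PARAMS = {
    "do_sample": True,
    "num_beams": 4,
    "no_repeat_ngram_size": 3,
    "early_stopping": True,
}


def generate_sentence(keywords, model_name="k2t", **overrides):
    """Generate a sentence from a list of keywords.

    Hypothetical helper: assumes keytotext is installed and that
    pipeline(model_name) returns a callable taking (keywords, **params).
    """
    # Imported lazily so this module still loads without keytotext installed.
    from keytotext import pipeline

    nlp = pipeline(model_name)  # loads the pretrained model by name
    params = {**DECODING_PARAMS, **overrides}
    return nlp(keywords, **params)


# Example call (needs keytotext installed and access to the model weights):
# print(generate_sentence(["India", "Capital", "New Delhi"]))
```

With `do_sample` enabled the output can differ between runs; passing `do_sample=False` as an override should make generation deterministic for a given model.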

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs here: Docs

Trainer example here: Open In Colab

from keytotext import trainer
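The sketch below shows how a fine-tuning run might look, using the same trainer calls that appear in the issue reports further down this page (`from_pretrained`, `train`, `save_model`). The helper names `build_rows` and `finetune` are hypothetical, and the column names `keywords` and `text` are an assumption about the expected DataFrame layout, so check the Trainer docs before relying on them.

```python
def build_rows(pairs):
    """Turn (keywords, sentence) pairs into row dicts.

    The "keywords" and "text" column names are an assumption about what
    the trainer expects, not something this README confirms.
    """
    return [{"keywords": keywords, "text": text} for keywords, text in pairs]


def finetune(train_rows, test_rows, model_name="t5-small"):
    """Fine-tune a T5 checkpoint (assumes keytotext and pandas are installed)."""
    # Lazy imports keep build_rows usable without the heavy dependencies.
    import pandas as pd
    from keytotext import trainer

    model = trainer()
    model.from_pretrained(model_name=model_name)
    model.train(
        train_df=pd.DataFrame(train_rows),
        test_df=pd.DataFrame(test_rows),
        batch_size=3,
        max_epochs=5,
        use_gpu=False,  # switch to True when a GPU is available
    )
    model.save_model()
    return model


# A single illustrative training pair; real runs need far more data.
rows = build_rows(
    [("India capital New Delhi", "New Delhi is the capital of India.")]
)
```

Keeping the data preparation separate from the training call makes it easy to inspect the rows before starting a long run.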

UI:

UI: Streamlit App

pip install streamlit-tags

This uses a custom streamlit component built by me: GitHub

API:

API: API Call Docker Call

The API is packaged in a Docker container and can be started quickly. Follow the instructions below to get started:

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get the results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]
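The same request can be issued from Python. This is a minimal sketch using only the standard library: the helper names `build_api_url` and `fetch_sentence` are hypothetical, and treating the response body as JSON is an assumption rather than something this README documents.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Matches the port mapping used in the docker run command above.
API_BASE = "http://localhost:8000/api"


def build_api_url(keywords, base=API_BASE):
    """Return the request URL with the keyword list JSON-encoded in `data`."""
    # Compact separators reproduce the ["a","b"] shape of the example URL.
    payload = json.dumps(keywords, separators=(",", ":"))
    query = urlencode({"data": payload})  # percent-encodes quotes and brackets
    return base + "?" + query


def fetch_sentence(keywords, timeout=30):
    """Call the running container and decode the body (assumed to be JSON)."""
    with urlopen(build_api_url(keywords), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs the container to be running):
# print(fetch_sentence(["India", "Capital", "New Delhi"]))
```

Encoding the query string explicitly avoids problems with the quotes, brackets, and spaces in the raw URL when it is sent by a client that does not escape them automatically.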

Note: The Hosted API is only available on demand

BibTex:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

Comments
  • ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)

    Hi,

    I tried to install keytotext via pip install keytotext --upgrade on my local machine, but came across the following:

    ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
    ERROR: No matching distribution found for keytotext
    

    My pip version is the latest. However, the above works just fine in Colab. Could you please guide me through a fix?

    opened by abhijithneilabraham 6
  • Add finetuning model to keytotext

    Is your feature request related to a problem? Please describe. It's difficult to use the model without fine-tuning on a new corpus, so we need to build a script to fine-tune it on a new corpus.

    enhancement good first issue 
    opened by gagan3012 2
  • "Oh no." ?

    "Error running app. If this keeps happening, please file an issue."

    Ok,...sure? I know nothing about this app.

    Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

    Chrome browser, Linux.

    opened by drscotthawley 2
  • Add Citations

    Is your feature request related to a problem? Please describe. Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

    opened by gagan3012 1
  • Adding new models to keytotext

    Is your feature request related to a problem? Please describe. Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

    enhancement good first issue 
    opened by gagan3012 1
  • Inference API for Keytotext

    Is your feature request related to a problem? Please describe. It is difficult to host the UI on streamlit without API

    Describe the solution you'd like Inference API

    enhancement good first issue 
    opened by gagan3012 1
  • Create Better UI

    Is your feature request related to a problem? Please describe. The current UI is not functional. It needs to be fixed.

    Describe the solution you'd like Better UI with a nicer design

    enhancement 
    opened by gagan3012 1
  • Add `st.cache` to load model

    Hi @gagan3012,

    Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

    Thought I'd send you a small PR which should fix this. You were already on the right track with st.cache, but it gets even better if you use it once more to load the model. This makes sure the model and tokenizer are only loaded once, which should make the app consume less memory (and not run into resource limits again!). Plus, I've seen that it also works a bit faster now ;)

    Hope this works for you and let me know if you have any other questions! 🎈

    Cheers, Johannes

    opened by jrieke 1
  • ValueError: transformers.models.auto.__spec__ is None

    from keytotext import pipeline

    Running the above line raises this error: "ValueError: transformers.models.auto.__spec__ is None"

    opened by varunakk 0
  • Update README.md

    opened by gagan3012 0
  • Update trainer.py

    opened by gagan3012 0
  • Pipeline error on fresh install

    Hi, I'm getting this on a first run after a fresh install:

    Global seed set to 42
    Traceback (most recent call last):
      File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
        from keytotext import pipeline
      File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
        from .dataset import make_dataset
      File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
        from cv2 import randShuffle
    ModuleNotFoundError: No module named 'cv2'

    opened by skintflickz 0
  • New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

    I have imported the model and the necessary libraries, but I am getting the error below in Google Colab. I used this model a few months back and it was working fine. This is a new issue that I am facing with the same code.

    TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

    Imported libraries:

    !pip install keytotext --upgrade
    !sudo apt-get install git-lfs

    from keytotext import trainer

    Training Model:

    model = trainer()
    model.from_pretrained(model_name="t5-small")
    model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5, use_gpu=True)
    model.save_model()

    I have attached an error screenshot.

    • OS: Windows
    • Browser Chrome Error
    opened by aishwaryapisal9 2
  • Update trainer.py

    Delete progress_bar_refresh_rate in trainer.py

    Description

    Delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the Trainer class in the latest version (1.7.0) of PyTorch Lightning.

    Motivation and Context

    Passing this argument makes the training process fail.

    How Has This Been Tested?

    Ran keytotext on a custom dataset before and after August 2nd, 2022. The new version of PyTorch Lightning, which removed the above argument from Trainer, took effect on that date, and custom training has failed since then.

    Screenshots (if appropriate):

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    opened by anath2110benten 0
  • Why is cv2 required?

    https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

    I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps throwing a "cv2 not installed" exception. It looks like the pip package doesn't declare cv2 as a dependency. I tried deleting this line in the source code, and the model works well. Is this line necessary for this project? Would you consider adding opencv as a dependency of the pip package? Thanks for your attention.

    opened by ChunxuYang 0
  • Hi, I noticed that given the same input keywords, the generated text is the same across different runs, even when setting different seeds with 'pl.seed_everything(..)'.

    opened by RuiFeiHe 6
Releases(v1.5.0)
Owner
Gagan Bhatia
Software Developer | Machine Learning Enthusiast