Topic Inference with Zeroshot models

Overview

zeroshot_topics

Installation

zeroshot_topics is distributed on PyPI as a universal wheel. It is available on Linux, macOS, and Windows, and supports Python 3.7+ and PyPy.

$ pip install zeroshot_topics

Usage

from zeroshot_topics import ZeroShotTopicFinder

# Create the topic finder with its default zero-shot model.
zsmodel = ZeroShotTopicFinder()

# Sample conversational text (kept verbatim, including its factual slips).
text = """can you tell me anything else okay great tell me everything you know about George_Washington.
he was the first president he was well he I'm trying to well he fought in the Civil_War he was a general
in the Civil_War and chopped down his father's cherry tree when he was a little boy he that's it."""

# Infer the topic of the text and display the result.
print(zsmodel.find_topic(text))

License

zeroshot_topics is distributed under the terms of

Comments
  • Error when I run the sample code

    I get this when I try to run the sample code:

    Traceback (most recent call last):
      File "zerotopics.py", line 1, in <module>
        from zeroshot_topics import ZeroShotTopicFinder
      File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/__init__.py", line 3, in <module>
        from .zeroshot_tm import ZeroShotTopicFinder
      File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/zeroshot_tm.py", line 3, in <module>
        from .utils import load_zeroshot_model
      File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/utils.py", line 6, in <module>
        def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"):
      File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/functools.py", line 490, in lru_cache
        raise TypeError('Expected maxsize to be an integer or None')
    TypeError: Expected maxsize to be an integer or None

    Specifics: Python version 3.7.9

    pip freeze output, trimmed to the packages relevant to this error (the full virtualenv is large):

    zeroshot-topics==0.1.0 transformers==4.11.0 keybert==0.5.0 nltk==3.6.5 sentence-transformers==1.0.4 torch==1.10.1

    opened by sdcharle 1
  • Add size to lru_cache

    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/__init__.py in <module>()
          1 __version__ = '0.1.0'
          2 
    ----> 3 from .zeroshot_tm import ZeroShotTopicFinder
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/zeroshot_tm.py in <module>()
          1 import attr
          2 from keybert import KeyBERT
    ----> 3 from .utils import load_zeroshot_model
          4 from nltk.corpus import wordnet as wn
          5 
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/utils.py in <module>()
          4 
          5 @lru_cache
    ----> 6 def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"):
          7     classifier = pipeline("zero-shot-classification", model=model_name)
          8     return classifier
    
    /usr/lib/python3.7/functools.py in lru_cache(maxsize, typed)
        488             maxsize = 0
        489     elif maxsize is not None:
    --> 490         raise TypeError('Expected maxsize to be an integer or None')
        491 
        492     def decorating_function(user_function):
    
    TypeError: Expected maxsize to be an integer or None
    

    I assume you have to pass a maxsize parameter to lru_cache. It worked for me once I provided the parameter.

    opened by gsasikiran 6
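
Both tracebacks above come from Python 3.7 and point at `@lru_cache` applied without parentheses. That bare form is only accepted by `functools.lru_cache` from Python 3.8 onward, which would explain the `TypeError`. The following is a minimal sketch of the fix the commenter describes, not the library's actual code: `load_model` and its return value are hypothetical stand-ins for an expensive model load.

```python
from functools import lru_cache


# Calling lru_cache with an explicit maxsize works on Python 3.7 and later;
# the bare form `@lru_cache` is only accepted from Python 3.8.
@lru_cache(maxsize=None)
def load_model(model_name="valhalla/distilbart-mnli-12-6"):
    # Hypothetical stand-in for an expensive model load (assumption,
    # not the zeroshot_topics implementation).
    return {"model": model_name}


first = load_model()
second = load_model()

# The second call is served from the cache, so both names refer to one object.
print(first is second)
print(load_model.cache_info())
```

Using `maxsize=None` keeps every distinct `model_name` cached for the life of the process. A small integer would bound memory if many different models were loaded.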
Releases(v.0.0.1)
Owner
Rita Anjana
ML engineer