An easy-to-use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.

Overview

Welcome to AdaptNLP

A high-level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models for end-to-end tasks.


What is AdaptNLP?

AdaptNLP is a Python package that allows users ranging from beginner Python coders to experienced Machine Learning Engineers to leverage state-of-the-art Natural Language Processing (NLP) models and training techniques in one easy-to-use package.

Utilizing fastai with HuggingFace's Transformers library and Humboldt University of Berlin's Flair library, AdaptNLP provides Machine Learning Researchers and Scientists with a modular and adaptive approach to a variety of NLP tasks, simplifying what it takes to train, perform inference with, and deploy NLP-based models and microservices.

What is the Benefit of AdaptNLP Rather Than Just Using Transformers?

Despite quick inference functionality such as the pipeline API in transformers, that interface is still not quite as flexible or as fast as it could be. AdaptNLP's Easy* inference modules tend to be slightly faster than the pipeline interface (at bare minimum the same speed), while also providing the user with simple, intuitive returns that leave out any unneeded junk.
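For example, here is a minimal sketch of Easy* inference; the sentiment model named here is just an example, and tag_text with its arguments follows the usage shown in the issues further down this page:

    from adaptnlp import EasySequenceClassifier

    classifier = EasySequenceClassifier()

    # Tag a piece of text with an example sequence classification model
    result = classifier.tag_text(
        text="AdaptNLP makes state-of-the-art inference simple.",
        model_name_or_path="nlptown/bert-base-multilingual-uncased-sentiment",
        mini_batch_size=1,
    )
    print(result)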

Along with this, thanks to the integration of the fastai library, the code needed to train or run inference on your models has a completely modular API through the fastai Callback system. Rather than needing to write your entire torch loop, if a model needs anything special, a Callback can be written in fewer than 10 lines of code to achieve your specific functionality.
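Here is a hedged sketch of such a Callback; GradClipCallback is a hypothetical example (fastai also ships its own gradient-clipping callback), but before_step is a real fastai Callback event:

    import torch
    from fastai.callback.core import Callback

    class GradClipCallback(Callback):
        "Clip gradients to `max_norm` right before each optimizer step."
        def __init__(self, max_norm: float = 1.0):
            self.max_norm = max_norm

        def before_step(self):
            # `self.model` is delegated to the attached Learner by fastai
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.max_norm)

Attach it by passing cbs=[GradClipCallback(1.0)] to a Learner or to a fit call.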

Finally, when training your model, fastai is at the forefront of being a library that constantly brings in the best practices for achieving state-of-the-art training, with new research methodologies heavily tested before integration. As such, AdaptNLP fully supports training with the One-Cycle policy, and using new optimizer combinations such as the Ranger optimizer with Cosine Annealing training, through simple one-line fitting functions (fit_one_cycle and fit_flat_cos).
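As a sketch of those one-line fitting functions on a plain fastai Learner (the tiny synthetic task below exists only to make the snippet runnable; the epoch counts and learning rates are arbitrary examples):

    import torch
    from torch.utils.data import TensorDataset
    from fastai.data.core import DataLoaders
    from fastai.learner import Learner
    import fastai.callback.schedule  # patches Learner with fit_one_cycle / fit_flat_cos

    # A tiny synthetic regression task so the snippet runs end to end
    x, y = torch.randn(64, 4), torch.randn(64, 1)
    dls = DataLoaders.from_dsets(TensorDataset(x, y), TensorDataset(x, y), bs=8)
    learn = Learner(dls, torch.nn.Linear(4, 1), loss_func=torch.nn.functional.mse_loss)

    learn.fit_one_cycle(3, lr_max=1e-3)  # the One-Cycle policy
    learn.fit_flat_cos(3, lr=1e-3)       # flat LR followed by cosine annealing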

Installation Directions

PyPi

To install from PyPI, please use:

pip install adaptnlp

Or if you have pip3:

pip3 install adaptnlp

Conda (Coming Soon)

Developmental Builds

To install any developmental builds, please follow the directions below to install directly from git:

Stable Master Branch The master branch is generally not updated except for hotfixes and new releases. To install it, please use:

pip install git+https://github.com/Novetta/adaptnlp

Developmental Branch Note: this branch can become unstable, and it is only recommended for contributors or those that really want to test out new technology. Please make sure the latest tests are passing (a green checkmark on the commit message) before trying this branch out. You can install the developmental builds with:

pip install git+https://github.com/Novetta/adaptnlp@dev

Docker Images

There are actively updated Docker images hosted on Novetta's DockerHub

The guide to each tag is as follows:

  • latest: This is the latest PyPI release and installs a complete package that is CUDA capable
  • dev: These are occasionally built developmental builds at certain stages. They are built from the dev branch and are generally stable
  • *api: The API builds are for the REST-API

To pull and run any AdaptNLP image immediately, you can run:

docker run -itp 8888:8888 novetta/adaptnlp:TAG

Replace TAG with any of the aforementioned tags.

Afterwards, check localhost:8888 or localhost:8888/lab to access the notebook containers.

Navigating the Documentation

The AdaptNLP library is built with nbdev, so any documentation page you find (including this one!) can be directly run as a Jupyter Notebook. Each page also includes an "Open in Colab" button at the top that will open the notebook in Google Colaboratory, allowing immediate access to the code.

The documentation is split into six sections, each with a specific purpose:

Getting Started

This group contains quick access to the homepage, an explanation of the AdaptNLP Cookbooks, and how to contribute

Models and Model Hubs

These contain any relevant documentation for the AdaptiveModel class, the HuggingFace Hub model search integration, and the Result class that various inference APIs return

Class API

This section contains the module documentation for the inference framework, the tuning framework, as well as the utilities and foundations for the AdaptNLP library.

Inference and Training Cookbooks

These two sections provide quick access to single-use recipes for starting any AdaptNLP project for a particular task, with easy-to-use code designed for that specific use case. There are currently over 13 different tutorials available, with more coming soon.

NLP Services with FastAPI

This section provides directions on how to use the AdaptNLP REST API for deploying your models quickly with FastAPI.

Contributing

There is a contribution guide available here.

Testing

AdaptNLP is built on the nbdev framework. To run all tests, please do the following:

  1. pip install nbverbose
  2. git clone https://github.com/Novetta/adaptnlp
  3. cd adaptnlp
  4. pip install -e .
  5. nbdev_test_nbs

This will run every notebook and ensure that all tests have passed. Please see the nbdev documentation for more information about it.

Contact

Please contact Zachary Mueller at [email protected] with questions or comments regarding AdaptNLP.

Follow us on Twitter at @TheZachMueller and @AdaptNLP for updates and NLP dialogue.

License

This project is licensed under the terms of the Apache 2.0 license.

Comments
  • multi-label classification / paperswithcode dataset

    Hi guys,

    Hope you are all well!

    I was wondering if adaptnlp can handle multi-label classification with 1560 labels.

    More precisely, I would like to apply it to paperswithcode dataset where labels are called tasks.

    Refs:

    Thanks for any insights or inputs on that.

    Cheers, X

    opened by ghost 7
  • cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'

    Describe the bug: Your demo Colab Notebook "Custom Fine-Tuning and Training with Transformer Models" doesn't work and generates the following error: [screenshot of the import error named in the title]


    bug 
    opened by lematmat 5
  • Significant slowdown in EasyTokenTagger release 0.2.0

    I'm experiencing a slowdown in NER performance using EasyTokenTagger and 'ner-ontonotes' after updating to release 0.2.0. Have there been any underlying changes to how the tagger object works?

    Specifically, I am dealing with a very large chunk of text. Prior to this release, the NER tagging took around 15 seconds for this particular text. Now, it's taking 15+ minutes the first time but subsequent calls on that text are very quick. Is there some sort of caching or indexing that's being done now? I'd imagine this could create a lot of overhead for large chunks of text.

    opened by mkongsiri-Novetta 5
  • Can't load big dataset

    Describe the bug: It happens when I run learning_rate = finetuner.find_learning_rate(**learning_rate_finder_configs) in the tutorial. I have a big dataset with 200k rows, and each row has a text of around 200 words.

    In your code, when you instantiate the TextDataset, the line tokenized_text = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text)) takes an eternity for a text of 20 million words. Do you think this could be done in a better/faster way, e.g. by keeping the rows as they are?

    For the record:

    • Time for 100 characters: 0.0003399848937988281 s
    • Time for 1,000 characters: 0.00124359130859375 s
    • Time for 10,000 characters: 0.012135982513427734 s
    • Time for 100,000 characters: 0.2131056785583496 s
    • Time for 1,000,000 characters: 8.782422542572021 s
    • Time for 10,000,000 characters: 734.5397665500641 s

    Can't reach the end of the full TextDataset (109 610 928 characters).

    To Reproduce Tutorial with a big dataset

    opened by NicoLivesey 5
  • AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

    Describe the bug: Trying to freeze an LMFineTuner based on Camembert weights, I get:


    AttributeError                            Traceback (most recent call last)
    <ipython-input> in <module>
          6 }
          7 finetuner = LMFineTuner(**ft_configs)
    ----> 8 finetuner.freeze()

    ~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/adaptnlp/transformers/finetuning.py in freeze(self)
       1630         """Freeze last classification layer group only
       1631         """
    -> 1632         layers_len = len(list(self.model.cls.parameters()))
       1633         self.freeze_to(-layers_len)
       1634

    ~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
        573         if name in modules:
        574             return modules[name]
    --> 575         raise AttributeError("'{}' object has no attribute '{}'".format(
        576             type(self).__name__, name))
        577

    AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

    To Reproduce

    from adaptnlp import LMFineTuner
    train_file = "path/to/train" 
    valid_file = "path/to/valid"
    ft_configs = {
                  "train_data_file": train_file,
                  "eval_data_file": valid_file,
                  "model_type": "camembert",
                  "model_name_or_path": "camembert-base",
                 }
    finetuner = LMFineTuner(**ft_configs)
    finetuner.freeze()
    

    Expected behavior: No error

    Desktop (please complete the following information):

    • OS: Amazon Linux
    • Browser Chrome
    opened by NicoLivesey 4
  • AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'

    Cannot use pool option to generate embeddings (instead of the default rnn).

    A snippet for the problem:

    embedding_type='albert-xxlarge-v2'
    embedding_methods=["pool"]
    doc_embeddings = EasyDocumentEmbeddings(embedding_type, methods = embedding_methods)
    
    

    This is the error I get:

      File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 91, in __init__
       self._initial_setup(self.label_dict, **kwargs)
     File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 97, in _initial_setup
       document_embeddings: DocumentRNNEmbeddings = self.encoder.rnn_embeddings
    AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'
    

    Expected behavior would be to successfully obtain an easy document embeddings object with no errors

    Running on debian buster, python3.7

    If someone could give me a fix or a workaround or if I'm using this incorrectly, then please let me know

    opened by blerstpub 3
  • EasySequenceClassifier tag_text function returns None for FlairSequenceClassifier model

    Hi! I tried to follow the tutorial for training a custom sequence classifier: https://novetta.github.io/adaptnlp/tutorial/training-sequence-classification.html The last step returns empty sentences where labels were expected: sentences = classifier.tag_text(example_text, model_name_or_path=OUTPUT_DIR)

    To Reproduce the behavior:

    from adaptnlp import EasySequenceClassifier
    from flair.data import Sentence
    
    OUTPUT_DIR = "…/best-model.pt"    # my custom model
    classifier = EasySequenceClassifier()
    
    ex_text = "This is a good text example"
    example_text=[Sentence(ex_text)]
    
    sentences = classifier.tag_text(text=example_text, model_name_or_path=OUTPUT_DIR, mini_batch_size=1)
    print("Label output:\n")
    print(sentences)
    

    Returns

    2020-12-28 17:44:31,111 loading file .../best-model.pt
    Label output:
    
    None
    

    Surprisingly, labels did get added to example_text. print(example_text) returns:

    [Sentence: " This is a good text example " [− Tokens: 17 − Sentence-Labels: {'label': [0 (0.8812)]}]]

    Proposed explanation/contribution: I think I know the reason for the unexpected behavior and will be happy to help. classifier.tag_text creates a FlairSequenceClassifier. FlairSequenceClassifier initiates a flair.models.TextClassifier and uses the TextClassifier predict method within its own predict method. But flair.models.TextClassifier's predict method returns None, because the labels are added directly to the sentences. I can re-write the FlairSequenceClassifier predict method to return the Sentences with labels instead of None, as sketched below.
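    A minimal sketch of that fix, assuming FlairSequenceClassifier stores its flair TextClassifier in a classifier attribute (that attribute name is an assumption):

    from typing import List
    from flair.data import Sentence

    def predict(self, text: List[Sentence], mini_batch_size: int = 32, **kwargs) -> List[Sentence]:
        # flair's TextClassifier.predict labels the sentences in place and
        # returns None, so return the (now labeled) sentences ourselves
        self.classifier.predict(text, mini_batch_size=mini_batch_size, **kwargs)
        return text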

    opened by DinaraN 3
  • Sequence classification using REST API fails with models except en-sentiment

    Sequence classification over REST API using any model except for en-sentiment fails with:

    File "/usr/local/lib/python3.6/dist-packages/starlette/routing.py", line 41, in app response = await func(request) File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 197, in app dependant=dependant, values=values, is_coroutine=is_coroutine File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 147, in run_endpoint_function return await dependant.call(**values) File "./app/main.py", line 87, in sequence_classifier text=text, mini_batch_size=1, model_name_or_path=_SEQUENCE_CLASSIFICATION_MODEL File "/adaptnlp/adaptnlp/sequence_classification.py", line 285, in tag_text return classifier.predict(text=text, mini_batch_size=mini_batch_size, **kwargs,) File "/adaptnlp/adaptnlp/sequence_classification.py", line 140, in predict text_sent.add_label(label) TypeError: add_label() missing 1 required positional argument: 'value'

    Reproducible with:

    docker run -itp 5000:5000 \
      -e TOKEN_TAGGING_MODE='ner' \
      -e TOKEN_TAGGING_MODEL='ner-ontonotes-fast' \
      -e SEQUENCE_CLASSIFICATION_MODEL='nlptown/bert-base-multilingual-uncased-sentiment' \
      achangnovetta/adaptnlp-rest:latest \
      bash

    opened by VogtAI 3
  • AdaptNLP v0.2.x Additional Features Discussion

    There are a lot of ideas floating around for feature implementations, so this thread provides a mini roadmap and an environment to think about adaptnlp's progression.

    Ideas can be stated freely in this thread and do not replace feature-request issue posts.

    • [x] Tokenizer: Start integrating tokenizers all across adaptnlp for speed and performance enhancements for training and inference.
    • [x] Summarization: Add the NLP task of summarization using a document-level encoder based on transformer language models
    • [x] GPU: Multi-GPU and mixed-precision is prevalent in AdaptNLP, but its implementation can be improved and debugged
    • ~~FastAPI Batch-Serving: Improve on the concurrent calls with batch processing from the NLP models (maybe try to make it CPU and GPU agnostic for ease-of-use)~~
    • ~~Model Downloading: Start structuring a way to download and potentially upload pre-trained NLP-task models~~
    enhancement 
    opened by aychang95 3
  • Data API

    We should probably have a data API of some form that ties into https://github.com/Novetta/adaptnlp/issues/128

    Ideally it should simply prep a dataset for tokenization of a model, or tokenize the data itself.

    For now we cover two inputs:

    1. Individual texts
    2. CSV

    We should support something akin to fastai's get_y, but with decent defaults so that customization is available, but not needed.

    Ideally something like:

    dset = TaskDataset.from_df(
      df,  # Can be fname or dataframe
      get_x = ColReader('text'),
      get_y = ColReader('label'),
      splitter = RandomSplitter(),
      model = 'bert-base-uncased', # The name/type of downstream model
      task = "ner" # Or use a `Task.NER` namespace class
    )
    

    And further:

    dset.dataloaders(bs=8, collate_fn=data_collator)
    

    It reads extremely similarly to the fastai API, but we do not use the fastai API, as for text, doing it this way is a bit easier.

    The highest level API would look like so:

    dls = TaskDataLoaders.from_df(df, 'text', 'label', model='bert-base-uncased')
    

    We should note the model used, and when integrating it with the tuning API, if something is off with the model entered, we make note of that.

    enhancement 
    opened by muellerzr 2
  • ImportError: cannot import name 'EasyTokenTagger'

    Describe the bug: I tried to run the code in the tutorial

    from adaptnlp import EasyTokenTagger
    
    
    ## Example Text
    example_text = "Novetta's headquarters is located in Mclean, Virginia."
    
    ## Load the token tagger module and tag text with the NER model 
    tagger = EasyTokenTagger()
    sentences = tagger.tag_text(text=example_text, model_name_or_path="ner")
    
    ## Output tagged token span results in Flair's Sentence object model
    for sentence in sentences:
        for entity in sentence.get_spans("ner"):
            print(entity)
    

    and it gave me the error:

    ...
      File "/home/rajiv/Documents/dev/python/nltk-trial/adaptnlp.py", line 2, in <module>
        from adaptnlp import EasyTokenTagger
    ImportError: cannot import name 'EasyTokenTagger'
    

    Desktop (please complete the following information):

    • OS: Ubuntu
    • Version: 20.04
    • Python: 3.6.9
    opened by RAbraham 2
  • classifier.tag_text on GPU!

    hi, I want to classify texts:

    classifier = EasySequenceClassifier()
    hub = HFModelHub()
    hub.search_model_by_task('text-classification')
    model = hub.search_model_by_name('nlptown/bert-base', user_uploaded=True)[0]
    sentence = classifier.tag_text(text=inputs, model_name_or_path=model, mini_batch_size=1)

    Q1: How do I force it to run on the CPU?
    Q2: I have a GPU but can't get it to run successfully. My errors:

    ...
    FileNotFoundError: [Errno 2] No such file or directory: 'nlptown/bert-base-multilingual-uncased-sentiment'

    During handling of the above exception, another exception occurred:

    ...
    RuntimeError: CUDA error: out of memory
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by topliftarm 0
  • Unified Training API

    Training API will use fastai under the hood, and we'll make a few functions to build general datasets.

    Tasks and sample datasets to use:

    Other Information

    Task APIs should have a simple user interface, i.e. the high-level API can only input specific options, while the mid-level has access to the full fastai Learner params.

    Example mid-level API I'm thinking about:

    dls = some_build_data_thing()
    tuner = QAFineTuner(dls, 'bert-base-cased')
    tuner.tune(
      scheduler = 'fit_flat_cos',
      n_epochs = 3,
      lr = None,
      suggest_method = 'valley', # Triggers if lr is None
      additional_callbacks = []
    )
    

    And its high-level:

    tuner = QAFineTuner.from_csv(
      question_column_name = "question",
      answer_column_name = "answer",
      model = "bert-base-cased"
    )
    tuner.tune(...)
    

    We should automatically pull in proper metrics for each task, but users have the option to bring in their own as well and pass it to QAFineTuner (good defaults)

    Tuners should also have a func like QAFineTuner.from_csv() to build the dataset in-house

    enhancement 
    opened by muellerzr 2
  • Save context in QuestionAnswering and re-use it

    I noticed that when we run any code snippet, it converts the text to vectors or some similar thing. For example, in this code snippet:

    from adaptnlp import EasyQuestionAnswering 
    from pprint import pprint
    
    ## Example Query and Context 
    query = "What is the meaning of life?"
    context = "Machine Learning is the meaning of life."
    top_n = 5
    
    ## Load the QA module and run inference on results 
    qa = EasyQuestionAnswering()
    best_answer, best_n_answers = qa.predict_qa(query=query, context=context, n_best_size=top_n, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
    
    ## Output top answer as well as top 5 answers
    print(best_answer)
    pprint(best_n_answers)
    

    It converts both the query and the context to vectors first. What if we have a very long context and a lot of queries? Each time, it will convert the context to vectors again. I think there should be a way to save the context vectors and re-use them instead of creating them again and again; a sketch of the idea follows.
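    A hypothetical sketch of the idea at the tokenization level: tokenize the long context once, cache its ids, and rebuild model inputs per query from the cache (encode_query is an invented helper, not an existing AdaptNLP API, and the model's forward pass would still run per query/context pair):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")

    context = "Machine Learning is the meaning of life."
    # Tokenize the (long) context once and cache the ids
    context_ids = tokenizer(context, add_special_tokens=False)["input_ids"]

    def encode_query(query: str):
        # Build [CLS] query [SEP] context [SEP] from the cached context ids
        # instead of re-tokenizing the context for every query
        query_ids = tokenizer(query, add_special_tokens=False)["input_ids"]
        return tokenizer.build_inputs_with_special_tokens(query_ids, context_ids)

    print(encode_query("What is the meaning of life?"))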

    enhancement 
    opened by talhaanwarch 1
  • Stretch Goals

    • [x] HuggingFace raw embeddings over Flair
    • [x] Try and integrate Callbacks for text generation and other classes that aren't using it

      Note: Didn't do this for text generation; it's more complex than it's worth

    • [x] Use fastrelease (with conda)
    • [x] Improve test coverage
    • [x] GH CI for testing Mac, Windows, and Linux, similar to how fastai has it setup
    • [x] nbdev?
    • [x] Windows support
    • [x] Use Pipeline for inference

      Note: Pipeline is slower on many tasks that AdaptNLP covers, tests are in place to ensure that this is always true

    • [ ] 1.0.0: Unified training framework for at least 4 NLP tasks
    enhancement 
    opened by muellerzr 0
Releases (v0.3.7)
  • v0.3.7(Nov 10, 2021)

  • v0.3.6(Nov 9, 2021)

  • v0.3.3(Sep 3, 2021)

    Bugs Squashed

    • Embeddings were conjoined rather than separated out by word
    • Question Answering Results would only return the first instance, rather than top n instances
    • AdaptiveTuner can accept a label_names parameter for where the labels in a batch are present
  • v0.3.0(Aug 11, 2021)

    New Features

    • A new Data API that integrates with HuggingFace's Dataset class

    • A new Tuner API for training and fine-tuning Transformer models

    • Full integration of the latest fastai library for full access to state-of-the-art practices when training and fine-tuning a model. As improvements are made to the library, AdaptNLP will update to accommodate them

    • A new Result API that most inference modules return. This is a filterable result, ensuring that you only get the most relevant information when returning a prediction from the Easy* modules

    Breaking Changes

    • The train and eval capabilities in the Easy* modules no longer exist, and all training related functionalities have migrated to the Tuner API
    • LanguageModelFineTuner no longer exists, and the same tuning functionality is in LanguageModelTuner

    Bugs Squashed

    • max_len Attribute Error (#127)
    • Integrate a complete Data API (milestone) (#129)
    • Use the latest fastcore (#132)
    • Fix unused kwarg arguments in text generation (#134)
    • Fix "name 'df' is not defined" (#135)
  • 0.2.3(May 5, 2021)

    Breaking Changes:

    • New versions of AdaptNLP will require a minimum torch version of 1.7 and a flair version of 0.9 (currently we install via git until 0.9/0.8.1 is released)

    New Features

    Bugs Squashed

    • Fix accessing bart-large-cnn (110)
    • Fix SAVE_STATE_WARNING (114)
  • v0.2.2(Jan 11, 2021)

    Official AdaptNLP Docker Images updated

    • Using NVIDIA NGC Container Registry CUDA base images #101
    • All images should be deployable via Kubeflow Jupyter Servers
    • Cleaner Python virtualenv setup #101
    • Official readme can be found at https://github.com/Novetta/adaptnlp/blob/master/docker/README.md

    Minor Bug Fixes

    • Fix token tagging REST application type check #92
    • Semantic fixes in readme #94
    • Standalone microservice REST application images #93
    • Python 3.7+ is now an official requirement #97
  • v0.2.1(Sep 17, 2020)

    Updated from nlp 0.4 to datasets 1.0+, plus multi-label training fixes for sequence classification.

    EasySequenceClassifier.train() Updates

    • Integrates datasets.Dataset now
    • Swapped order of formatting and label column renaming due to labels not showing up from torch data batches #87

    Tutorials and Documentation

    • Documentation and sequence classification tutorials have been updated to address nlp->datasets name change
    • Broken links also updated

    ODSC Europe Workshop 2020: Notebooks and Colab

    • ODSC Europe 2020 workshop materials now available in repository "/tutorials/Workshop"
    • Easy to run notebooks and colab links aligned with the tutorials are available
  • v0.2.0(Sep 1, 2020)

    Updated to transformers 3+, nlp 0.4+, flair 0.6+, pandas 1+

    New Features!

    New and "easier" training framework with easy modules: EasySequenceClassifier.train() and EasySequenceClassifier.evaluate()

    • Integrates nlp.Dataset and transformers.Trainer for a streamlined training workflow
    • Tutorials, notebooks, and colab links available
    • Sequence Classification task has been implemented, other NLP tasks are in the works
    • SequenceClassifierTrainer is still available, but will be transitioned into the EasySequenceClassifier and deprecated

    New and "easier" LMFineTuner

    • Integrates transformers.Trainer for a streamlined training workflow
    • Older LMFineTuner is still available as LMFineTunerManual, but will be deprecated in later releases
    • Tutorials, notebooks, and colab links available

    EasyTextGenerator

    • New module for text generation. GPT models are currently supported, other models may work but still experimental
    • Tutorials, notebooks, and colab links available

    Tutorials and Documentation

    • Documentation has been edited and updated to include additional features like the change in training frameworks and fine-tuning
    • The sequence classification tutorial is a good indicator of the direction we are going with the training and fine-tuning framework

    Notebooks and Colab

    • Easy to run notebooks and colab links aligned with the tutorials are available

    Bug fixes

    • Minor bug and implementation error fixes from flair upgrades
  • v0.1.6(May 1, 2020)

  • v0.1.5(Apr 17, 2020)

    Updated to Transformers 2.8.0 which now includes the ELECTRA language model

    EasySummarizer and EasyTranslator Bug Fix #63

    • Address mini batch output format issue for language model heads for the summarization and translation task

    Tutorials and Workshop #64

    • Add the ODSC Timeline Generator notebooks along with colab links
    • Small touch ups in tutorial notebooks

    Documentation

    • Address missing model_name_or_path param in some easy modules
  • v0.1.4(Apr 2, 2020)

    Updated to Transformers 2.7.0 which includes the Bart and T5 Language Models!

    EasySummarizer #47

    • New module for summarizing documents. These support both the T5 and Bart pre-trained models provided by Hugging Face.
    • Helper objects for the easy module that can be run as standalone instances TransformersSummarizer

    EasyTranslator #49

    • New module for translating documents with T5 pre-trained models provided by Hugging Face.
    • Helper objects for the easy module that can be run as standalone instances TransformersTranslator

    Documentation and Tutorials #52

    • New Class API documentation for EasySummarizer and EasyTranslator
    • New tutorial guides, initial notebooks, and links to colab for the above as well
    • Readme provides quickstart samples that show examples from the notebooks #53

    Other

    • Dockerhub repo for adaptnlp-rest added here https://hub.docker.com/r/achangnovetta/adaptnlp-rest
    • Upgraded CircleCI allowing us to run #40
    • Added Nightly build #39
  • v0.1.3(Mar 6, 2020)

    Sequence Classification and Question Answering updates to integrate Hugging Face's public models.

    EasySequenceClassifier

    • Can now take Flair and Transformers pre-trained sequence classification models as input in the model_name_or_path param
    • Helper objects for the easy module that can be run as standalone instances TransformersSequenceClassifier FlairSequenceClassifier

    EasyQuestionAnswering

    • Can now take Transformers pre-trained sequence classification models as input in the model_name_or_path param
    • Helper objects for the easy module that can be run as standalone instances TransformersQuestionAnswering

    Documentation and Tutorials

    Documentation has been updated with the above implementations

    • Tutorials updated with better examples to convey changes
    • Class API docs updated
    • Tutorial notebooks updated
    • Colab notebooks better displayed on readme

    FastAPI Rest

    FastAPI updated to latest (0.52.0). FastAPI endpoints can now be stood up and deployed with any HuggingFace sequence classification or question answering model specified as an env var arg.

    Dependencies

    Transformers pinned for stable updates

  • v0.1.2(Feb 19, 2020)

    AdaptNLP's first published release on github.

    Easy API:

    • EasyTokenTagger
    • EasySequenceClassifier
    • EasyWordEmbeddings
    • EasyStackedEmbeddings
    • EasyDocumentEmbeddings

    Training and Fine-tuning Interface

    • SequenceClassifierTrainer
    • LMFineTuner

    FastAPI AdaptNLP App for Streamlined Rapid NLP-Model Deployment

    • adaptnlp/rest
    • configured to run any pretrained and custom trained flair/adaptnlp models
    • compatible with nvidia-docker for GPU use
    • AdaptNLP integration but loosely coupled

    Documentation

    • Documentation release with walk-through guides, tutorials, and Class API docs of the above
    • Built with mkdocs, material for mkdocs, and mkautodoc

    Tutorials

    • IPython/Colab Notebooks provided and updated to showcase AdaptNLP Modules

    Continuous Integration

    • CircleCI build and tests running successfully and minimally
    • Github workflow for pypi publishing added

    Formatting

    • Flake8 and Black adherence