Augmenty is an augmentation library based on spaCy for augmenting texts.

Overview

Augmenty: The cherry on top of your NLP pipeline

Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of highly flexible augmenters, Augmenty provides a series of tools for working with augmenters, including combining and moderating augmenters. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the assigned labels under the augmentation, thus making many of the augmenters valid for training more than simply sentence classification.
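To illustrate what correcting labels under augmentation involves, here is a minimal plain-Python sketch. It is not Augmenty's implementation: the `replace_entity` helper and the `(start, end, label)` span format are assumptions made for this example. It replaces the text of one labelled entity and shifts the character offsets of the later spans so that they still point at the right text.

```python
from typing import List, Tuple

# Hypothetical span format for this sketch: (start_char, end_char, label)
Span = Tuple[int, int, str]


def replace_entity(
    text: str, spans: List[Span], index: int, replacement: str
) -> Tuple[str, List[Span]]:
    """Replace the text of spans[index] and shift all later spans accordingly."""
    start, end, label = spans[index]
    augmented = text[:start] + replacement + text[end:]
    shift = len(replacement) - (end - start)

    corrected = []
    for i, (s, e, lab) in enumerate(spans):
        if i == index:
            # The replaced entity keeps its label but gets a new end offset
            corrected.append((start, start + len(replacement), label))
        elif s >= end:
            # Spans after the replaced entity move by the length difference
            corrected.append((s + shift, e + shift, lab))
        else:
            # Spans before the replaced entity are unaffected
            corrected.append((s, e, lab))
    return augmented, corrected


text = "Augmenty is built on spaCy"
spans = [(0, 8, "ORG"), (21, 26, "ORG")]
augmented_text, augmented_spans = replace_entity(text, spans, 0, "spaCy Universe")

for s, e, lab in augmented_spans:
    print(augmented_text[s:e], lab)
```

Without the offset correction, the second span would point at the wrong characters after the replacement, which is why uncorrected augmentation is only safe for document-level labels.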

🔧 Installation

To get started using augmenty simply install it using pip by running the following line in your terminal:

pip install augmenty

Note that this is a minimal installation. Some augmenters require additional packages; to install all dependencies, run the following line:

pip install augmenty[all]

For more detailed instructions on installing augmenty, including specific language support, see the installation instructions.

🍒 Simple Example

The following shows a simple example of how you can quickly augment text using Augmenty. For more on using augmenty see the usage guides.

import spacy
import augmenty

nlp = spacy.load("en_core_web_sm")

docs = nlp.pipe(["Augmenty is a great tool for text augmentation"])

entity_augmenter = augmenty.load(
    "ents_replace.v1",
    ent_dict={"ORG": [["spaCy"], ["spaCy", "Universe"]]},
    level=1,
)

for doc in augmenty.docs(docs, augmenter=entity_augmenter, nlp=nlp):
    print(doc)
spaCy Universe is a great tool for text augmentation.

📖 Documentation

Documentation
📚 Usage Guides: Guides and instructions on how to use augmenty and its features.
📰 News and changelog: New additions, changes and version history.
🎛 API References: The detailed reference for augmenty's API, including function documentation.
🍒 Augmenters: A full list of the current augmenters in augmenty.
😎 Demo: A simple Streamlit demo for trying out the augmenters.

💬 Where to ask questions

Type
🚨 Bug Reports: GitHub Issue Tracker
🎁 Feature Requests & Ideas: GitHub Issue Tracker
👩‍💻 Usage Questions: GitHub Discussions
🗯 General Discussion: GitHub Discussions
🍒 Adding an Augmenter: Adding an augmenter

🤔 FAQ

How do I test the code and run the test suite?

augmenty comes with an extensive test suite. In order to run the tests, you'll usually want to clone the repository and build augmenty from the source. This will also install the required development dependencies and test utilities defined in the requirements.txt.

pip install -r requirements.txt
pip install pytest

python -m pytest

This runs all the tests in the augmenty/tests folder.

Specific tests can be run using:

python -m pytest augmenty/tests/test_docs.py

Code coverage: to check code coverage, run the following:

pip install pytest-cov

python -m pytest --cov=.

Does augmenty run on X?

augmenty is intended to run on all major operating systems: Windows (latest version), macOS (Catalina) and Linux (latest version of Ubuntu). Below you can see whether augmenty passes its test suite on each system. Note that these are only the systems augmenty is actively tested on. If you run a similar system (e.g. an earlier version of Linux), augmenty will likely run there as well; if it does not, please create an issue.

Operating System: Status
Ubuntu/Linux (Latest): github actions pytest ubuntu (status badge)
MacOS (Catalina): github actions pytest catalina (status badge)
Windows (Latest): github actions pytest windows (status badge)

How is the documentation generated?

augmenty uses Sphinx to generate documentation. It uses the Furo theme with custom styling.

To build the documentation, run:

# install sphinx, themes and extensions
pip install sphinx furo sphinx-copybutton sphinxext-opengraph

# generate HTML from the documentation

make -C docs html

Many of these augmenters are completely useless for training?

That is true: some of the augmenters are rarely something you would use during training, for instance randomly adding or removing spacing. However, augmentation can just as well be used to test whether a model is robust to certain variations.
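As a rough illustration of using augmentation as a robustness test, the sketch below uses plain Python only. Both `remove_spacing` and `keyword_classifier` are hypothetical stand-ins written for this example and are not part of augmenty: the first drops spaces at random, the second plays the role of a model under test.

```python
import random


def remove_spacing(text: str, level: float, rng: random.Random) -> str:
    """Drop each space character with probability `level`."""
    return "".join(
        ch for ch in text if not (ch == " " and rng.random() < level)
    )


def keyword_classifier(text: str) -> bool:
    """Toy stand-in for a model: flags texts containing the token 'great'."""
    return "great" in text.lower().split()


rng = random.Random(0)
texts = ["Augmenty is a great tool", "This is a great library"]
augmented = [remove_spacing(t, level=0.5, rng=rng) for t in texts]

# Fraction of predictions that are unchanged by the augmentation
agreement = sum(
    keyword_classifier(a) == keyword_classifier(t)
    for a, t in zip(augmented, texts)
) / len(texts)

print(augmented)
print(f"agreement under spacing noise: {agreement:.2f}")
```

A low agreement score suggests the model depends on exact whitespace, which is useful to know even if you would never train on such noisy data.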


Can I use augmenty without using spacy?

Yes. augmenty contains convenience functions for applying augmentation directly to raw texts. Check out the getting started guide to learn how.


🎓 Citing this work

If you use this library in your research, please cite:

@inproceedings{augmenty2021,
    title={Augmenty, the cherry on top of your NLP pipeline},
    author={Enevoldsen, Kenneth and Hansen, Lasse},
    year={2021}
}
Comments
  • Use of augmenty with spacy config files for training

    I didn't see any documentation on how to import these augmenters when using spacy 3.0's config and command line system when training. Is it possible to use it in this sense? If so, how?

    Upon further review, for the command line to register new augmenters, the flag --code <code.py> needs to be set when calling the training command. I tried pointing it at the specific file that contains the keystroke augmenter I wanted, but it complains about not knowing a parent for relative imports. I also tried the various __init__.py files, but it complained about those as well. It seems to work when you take the code out, place it in a new file without relative imports, and point to that.

    Which page or section is this issue related to?

    https://spacy.io/usage/training#data-augmentation-custom

    https://kennethenevoldsen.github.io/augmenty/tutorials/introduction.html#Applying-the-augmentation

    documentation 
    opened by Giles-Billenness 3
  • Added sententence_subset.v1 augmenter following #48

    Following #48, added the sententence_subset.v1 augmenter, which subsamples sentences from a document:

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    text = """Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool
    for obtaining higher performance on limited data. You can also use it to see how
    robust your model is to changes. It will sample subset of the paragraf."""
    # augmenty.texts expects an iterable of raw texts
    texts = [text]

    augmenter = augmenty.load("sententence_subset.v1", respect_sentences=True)

    list(augmenty.texts(texts, augmenter, nlp))
    

    Missing:

    • [ ] Add tests
    • [ ] Add documentation
    opened by KennethEnevoldsen 3
  • Paragraf subset augmenter

    A paragraph subset augmenter that can work at token or sentence level. It samples a random percentage of coherent (contiguous) tokens/sentences to include and a random token/sentence start position chosen so that the former constraint is maintained. The augmenter needs to handle annotated entities and avoid breaking them.

    Input arguments:

    • level: how often to apply the augmenter.
    • min_paragraf: minimum percentage of tokens or sentences to include, i.e. 4 sentences with min_paragraf=0.5 means that at least 2 sentences are included.
    • sentence_level: Boolean defining whether to sample at token or sentence level.

    Example - sentence level

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    texts = [
        "Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool "
        "for obtaining higher performance on limited data. You can also use it to see how "
        "robust your model is to changes. It will sample subset of the paragraf.",
    ]
    docs = nlp.pipe(texts)
    
    augmenter = augmenty.load("paragraf_subset.v1", level=1.0, min_paragraf=0.5, sentence_level=True)
    
    list(augmenty.texts(texts, augmenter, nlp))
    

    Example outputs:

    The first section:

    Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool 
    for obtaining higher performance on limited data.
    

    The middle section:

    Augmentation is a wonderful tool for obtaining higher performance on limited data. 
    You can also use it to see how robust your model is to changes.
    

    The last section:

    You can also use it to see how robust your model is to changes. It will sample subset 
    of the paragraf.
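The sentence-level sampling proposed in this issue can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the augmenter itself: `sample_sentence_subset` is a hypothetical helper, sentence splitting is assumed to have happened already (for example with spaCy), and `min_paragraf` simply mirrors the parameter name suggested above. Because whole sentences are kept, entity annotations inside a sentence are never split.

```python
import math
import random
from typing import List, Optional


def sample_sentence_subset(
    sentences: List[str],
    min_paragraf: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a random contiguous run of sentences.

    At least ceil(min_paragraf * len(sentences)) sentences are kept.
    """
    rng = rng or random.Random()
    n = len(sentences)
    if n == 0:
        return []
    min_len = max(1, math.ceil(min_paragraf * n))
    length = rng.randint(min_len, n)      # how many sentences to keep
    start = rng.randint(0, n - length)    # start position so the window fits
    return sentences[start:start + length]


sentences = [
    "Augmenty is a wonderful tool for augmentation.",
    "Augmentation is a wonderful tool for obtaining higher performance on limited data.",
    "You can also use it to see how robust your model is to changes.",
    "It will sample a subset of the paragraph.",
]

subset = sample_sentence_subset(sentences, min_paragraf=0.5, rng=random.Random(1))
print(" ".join(subset))
```

Drawing the length first and the start position second guarantees that the minimum percentage is always satisfied, whichever start position is drawn.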
    

    Additional thoughts:

    Possibly the addition of a reverse augmenter, e.g. removing a coherent section of tokens/sentences.

    additional augmenter 
    opened by martincjespersen 3
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26

    Bumps MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26.

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 2
  • :arrow_up: Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0

    Updates the requirements on pydantic to permit the latest version.

    Release notes

    Sourced from pydantic's releases.

    v1.9.0 (2021-12-31)

    Thank you to pydantic's sponsors: @​sthagen, @​timdrijvers, @​toinbis, @​koxudaxi, @​ginomempin, @​primer-io, @​and-semakin, @​westonsteimel, @​reillysiemens, @​es3n1n, @​jokull, @​JonasKs, @​Rehket, @​corleyma, @​daddycocoaman, @​hardbyte, @​datarootsio, @​jodal, @​aminalaee, @​rafsaf, @​jqueguiner, @​chdsbd, @​kevinalh, @​Mazyod, @​grillazz, @​JonasKs, @​simw, @​leynier, @​xfenix for their kind support.

    Highlights

    v1.9.0 (2021-12-31) Changes

    v1.9.0a2 (2021-12-24) Changes

    v1.9.0a1 (2021-12-18) Changes

    • Add support for Decimal-specific validation configurations in Field(), additionally to using condecimal(), to allow better support from editors and tooling, #3507 by @​tiangolo
    • Add arm64 binaries suitable for MacOS with an M1 CPU to PyPI, #3498 by @​samuelcolvin
    • Fix issue where None was considered invalid when using a Union type containing Any or object, #3444 by @​tharradine
    • When generating field schema, pass optional field argument (of type pydantic.fields.ModelField) to __modify_schema__() if present, #3434 by @​jasujm
    • Fix issue when pydantic fail to parse typing.ClassVar string type annotation, #3401 by @​uriyyo
    • Mention Python >= 3.9.2 as an alternative to typing_extensions.TypedDict, #3374 by @​BvB93
    • Changed the validator method name in the Custom Errors example to more accurately describe what the validator is doing; changed from name_must_contain_space to value_must_equal_bar, #3327 by @​michaelrios28
    • Add AmqpDsn class, #3254 by @​kludex
    • Always use Enum value as default in generated JSON schema, #3190 by @​joaommartins
    • Add support for Mypy 0.920, #3175 by @​christianbundy
    • validate_arguments now supports extra customization (used to always be Extra.forbid), #3161 by @​PrettyWood

    ... (truncated)

    Commits
    • fbf8002 prepare for v1.9.0 release, extra change
    • 5406423 prepare for v1.9.0 release
    • 87da9ac apply update_forward_refs to json_encoders (#3595)
    • 6f26a1c Support mypy 0.910 to 0.930 including CI tests (#3594)
    • 8ef492b build(deps): bump mypy from 0.920 to 0.930 (#3573)
    • 2d3d266 remove failing release step
    • ef46789 add step to upload pypi files to release
    • 5d6f48c prepare for v1.9.0a2
    • e882277 fix: support generic models with discriminated union (#3551)
    • edad0db fix: keep old behaviour of json() by default (#3542)
    • Additional commits viewable in compare view

    dependencies python 
    opened by dependabot[bot] 2
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40

    Bumps MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40.

    Release notes

    Sourced from MishaKav/pytest-coverage-comment's releases.

    Support GitHub enterprise urls

    What's Changed

    New Contributors

    Full Changelog: https://github.com/MishaKav/pytest-coverage-comment/compare/v1.1.39...v1.1.40

    Changelog

    Sourced from MishaKav/pytest-coverage-comment's changelog.

    Pytest Coverage Comment 1.1.40

    Release Date: 2022-12-03

    Changes

    • Support for url for github enterprise repositories, thanks to @​jbcumming for contribution
    • Minor readme improvements, thanks to @​AlexanderLanin for contribution
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31

    Bumps MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31.

    Release notes

    Sourced from MishaKav/pytest-coverage-comment's releases.

    Remove link on badge

    add option to remove link on badge

    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0

    Updates the requirements on streamlit to permit the latest version.

    dependencies python 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4.1.0

    Bumps actions/setup-python from 3 to 4.1.0.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.1.0

    In scope of this pull request we updated actions/cache package as the new version contains fixes for caching error handling. Moreover, we added a new input update-environment. This option allows to specify if the action shall update environment variables (default) or not.

    Update-environment input

        - name: setup-python 3.9
          uses: actions/setup-python@v4
          with:
            python-version: 3.9
            update-environment: false
    

    Besides, we added such changes as:

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405

    ... (truncated)

    Commits
    • c4e89fa Improve readme for 3.x and 3.11-dev style python-version (#441)
    • 0ad0f6a Merge pull request #452 from mayeut/fix-env
    • f0bcf8b Merge pull request #456 from akx/patch-1
    • af97157 doc: Add multiple wildcards example to readme
    • 364e819 Merge pull request #394 from akv-platform/v-sedoli/set-env-by-default
    • 782f81b Merge pull request #450 from IvanZosimov/ResolveVersionFix
    • 2c9de4e Remove duplicate code introduced in #440
    • 412091c Fix tests for update-environment==false
    • 78a2330 Merge pull request #451 from dmitry-shibanov/fx-pipenv-python-version
    • 96f494e trigger checks
    • Additional commits viewable in compare view

    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4

    Bumps actions/setup-python from 3 to 4.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405 python-path output contains Python executable path.

    • Updated zeit/ncc to vercel/ncc package: #393

    • Bugfix: fixed output for prerelease version of poetry: #409

    • Made pythonLocation environment variable consistent for Python and PyPy: #418

    • Bugfix for 3.x-dev syntax: #417

    • Other improvements: #318 #396 #384 #387 #388

    Update actions/cache version to 2.0.2

    In scope of this release we updated actions/cache package as the new version contains fixes related to GHES 3.5 (actions/setup-python#382)

    Add "cache-hit" output and fix "python-version" output for PyPy

    This release introduces new output cache-hit (actions/setup-python#373) and fix python-version output for PyPy (actions/setup-python#365)

    The cache-hit output contains boolean value indicating that an exact match was found for the key. It shows that the action uses already existing cache or not. The output is available only if cache is enabled.

    ... (truncated)

    Commits
    • d09bd5e fix: 3.x-dev can install a 3.y version (#417)
    • f72db17 Made env.var pythonLocation consistent for Python and PyPy (#418)
    • 53e1529 add support for python-version-file (#336)
    • 3f82819 Fix output for prerelease version of poetry (#409)
    • 397252c Update zeit/ncc to vercel/ncc (#393)
    • de977ad Merge pull request #412 from vsafonkin/v-vsafonkin/fix-poetry-cache-test
    • 22c6af9 Change PyPy version to rebuild cache
    • 081a3cf Merge pull request #405 from mayeut/interpreter-path
    • ff70656 feature: add a python-path output
    • fff15a2 Use pypyX.Y for PyPy python-version input (#349)
    • Additional commits viewable in compare view

    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0

    Updates the requirements on streamlit to permit the latest version.

    dependencies python 
    opened by dependabot[bot] 1
  • Sample fake entities for entity augmenter using Faker package

    Add sampling of entities (such as names or addresses) using Faker, e.g. the Danish locale: https://faker.readthedocs.io/en/master/locales/da_DK.html. Faker supports random sampling of entities for numerous languages.

    enhancement help wanted 
    opened by martincjespersen 1
  • implement an oversampling function

    Augmentation can be used to oversample a category.

    Imagined usage would look something like this:

    import augmenty

    aug = augmenty.load(...)

    def is_positive(example):
        """Return True if the example is labelled with the "positive" category."""
        return example.y.cats["positive"] == 1

    # Proposed API, not implemented yet.
    upsampled_corpus = augmenty.oversample(corpus, augmenter=aug, conditional=is_positive, n=1000)
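
The behaviour imagined above could be sketched roughly as follows. This is a minimal illustration and not part of augmenty: the function name, its parameters, the `seed` argument, and the reading of `n` as "number of augmented examples to add" are all assumptions. The toy corpus uses `(text, label)` tuples in place of spaCy `Example` objects so the sketch runs without any model.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def oversample(
    corpus: Iterable[T],
    augment: Callable[[T], T],
    conditional: Callable[[T], bool],
    n: int,
    seed: Optional[int] = None,
) -> List[T]:
    """Return the corpus plus n augmented copies of examples matching `conditional`.

    Hypothetical helper: `n` is assumed to be the number of examples to add.
    """
    rng = random.Random(seed)
    examples = list(corpus)
    candidates = [ex for ex in examples if conditional(ex)]
    if not candidates:
        # Nothing matches the condition, so return the corpus unchanged.
        return examples
    # Sample with replacement so n may exceed the number of candidates.
    extra = [augment(rng.choice(candidates)) for _ in range(n)]
    return examples + extra


# Toy corpus of (text, label) pairs standing in for spaCy Examples.
corpus = [("good", 1), ("bad", 0), ("fine", 1)]
upsampled = oversample(
    corpus,
    augment=lambda ex: (ex[0].upper(), ex[1]),  # stand-in for a real augmenter
    conditional=lambda ex: ex[1] == 1,
    n=4,
    seed=0,
)
print(len(upsampled))  # 7: three originals plus four augmented positives
```

Sampling with replacement keeps the sketch simple; a real implementation might instead cycle through the matching examples so each is augmented about equally often.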
    
    enhancement 
    opened by KennethEnevoldsen 0
  • Back translation augmentation

    Augment a document by translating it into another language and back again, e.g., using Hugging Face translation models: https://huggingface.co/models?pipeline_tag=translation.

    Example blog: https://dzlab.github.io/dltips/en/pytorch/text-augmentation/

    Example sentence: Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels under the augmentation.

    English -> Danish (Google): Augmenty er et udvidelsesbibliotek baseret på spaCy til forstørrelse af tekster. Augmenty adskiller sig fra andre augmentationsbiblioteker ved, at den korrigerer (så vidt muligt) token-, sætnings- og dokumentetiketterne under augmentationen.

    Danish -> English (Google): Augmenty is an extension library based on spaCy for enlarging texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence, and document labels during augmentation.
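
The round trip shown above can be sketched as a small function that takes two translation callables. This is only an illustration under stated assumptions: `back_translate` and `word_map_translator` are hypothetical names, and the word-for-word "translators" are toy stand-ins so the example runs without downloading a model. In practice the two callables would wrap real translation models.

```python
from typing import Callable, Dict, Iterable, Iterator

Translator = Callable[[str], str]


def back_translate(
    texts: Iterable[str], forward: Translator, backward: Translator
) -> Iterator[str]:
    """Translate each text to a pivot language and back again."""
    for text in texts:
        yield backward(forward(text))


def word_map_translator(mapping: Dict[str, str]) -> Translator:
    """Toy word-for-word 'translator', used only to keep the example runnable."""

    def translate(text: str) -> str:
        return " ".join(mapping.get(word, word) for word in text.split())

    return translate


# Hypothetical stand-ins for English -> Danish and Danish -> English models.
to_danish = word_map_translator({"great": "fantastisk", "tool": "værktøj"})
to_english = word_map_translator({"fantastisk": "fantastic", "værktøj": "tool"})

augmented = list(back_translate(["a great tool"], to_danish, to_english))
print(augmented)  # ['a fantastic tool']
```

Because the returned text can differ in wording and length from the input, token-level annotations would not carry over directly; document-level labels are the simplest to keep.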

    additional augmenter 
    opened by martincjespersen 1
  • List of potentially new augmenters

    The following is a list of potential new augmenters. If you want a specific augmenter to be added before the others, please update the issue corresponding to that augmenter (if it doesn't have one, feel free to create one).

    A variation of existing augmenters:

    New augmenters

    Batch augmenters

    A combination of existing augmenters

    • [ ] EDA augmenter following the EDA paper
    additional augmenter 
    opened by KennethEnevoldsen 0
Releases(v1.0.1)
  • v1.0.1(Jun 21, 2022)

    Version

    What's Changed

    • Version 1.0.0 by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/50
    • Update replace.py by @koaning in https://github.com/KennethEnevoldsen/augmenty/pull/51

    Documentation updates

    • added faker based on PR by @martincjespersen by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/85
    • Added pre-config workflows by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/86

    New Contributors

    • @dependabot made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/46
    • @koaning made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/51
    • @martincjespersen

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/compare/v.0.0.12...v1.0.1

  • v.0.0.12(Feb 7, 2022)

    0.0.12 (03/08/21)

    • Many bugfixes
    • Added a few more augmenters
    • Notable updates to the documentation of the package

    0.0.1 (03/08/21)

    • First version of augmenty launches 🎉
      • with more than 15 highly customizable augmenters,
      • a high-quality code base (96% coverage and a CodeFactor A rating),
      • utilities for easy application of augmenters to strings and spaCy Docs,
      • and a series of convenience functions for combining and moderating augmentations.

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/commits/v.0.0.12

Owner
Kenneth Enevoldsen
Interdisciplinary PhD Student on representation learning in Clinical NLP and Genetics at Aarhus University and Interacting Minds Centre