Augmenty is an augmentation library based on spaCy for augmenting texts.

Last update: Dec 29, 2022

Overview

Augmenty: The cherry on top of your NLP pipeline

Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of highly flexible augmenters, Augmenty provides a series of tools for working with augmenters, including combining and moderating augmenters. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the assigned labels under the augmentation, thus making many of the augmenters valid for training more than simply sentence classification.
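To make the idea of correcting labels under augmentation concrete, here is a minimal, library-independent sketch. It does not use Augmenty's API: the `replace_entity` helper and the `(start, end, label)` span format are assumptions made for illustration. It replaces the text of one entity and shifts the character offsets of the later spans so that every label still points at the right text.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label); hypothetical format


def replace_entity(
    text: str, spans: List[Span], index: int, replacement: str
) -> Tuple[str, List[Span]]:
    """Replace the text of spans[index] and shift the spans that come after it."""
    start, end, label = spans[index]
    shift = len(replacement) - (end - start)
    new_text = text[:start] + replacement + text[end:]

    new_spans = []
    for i, (s, e, lab) in enumerate(spans):
        if i == index:
            # the replaced entity keeps its label but gets a new end offset
            new_spans.append((start, start + len(replacement), label))
        elif s >= end:
            # spans after the replacement move by the change in length
            new_spans.append((s + shift, e + shift, lab))
        else:
            # spans before the replacement are untouched
            new_spans.append((s, e, lab))
    return new_text, new_spans


text = "Augmenty and spaCy work together"
spans = [(0, 8, "ORG"), (13, 18, "ORG")]

new_text, new_spans = replace_entity(text, spans, 0, "spaCy Universe")
for s, e, lab in new_spans:
    print(lab, repr(new_text[s:e]))
```

Without the offset shift, the second label would point at the wrong characters after the replacement, which is why naive text-only augmentation is usually limited to sentence-level labels.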

🔧 Installation

To get started, install augmenty with pip by running the following line in your terminal:

pip install augmenty

Do note that this is a minimal installation. As some augmenters require additional packages, run the following line to install all dependencies.

pip install augmenty[all]

For more detailed instructions on installing augmenty, including specific language support, see the installation instructions.

🍒 Simple Example

The following shows a simple example of how you can quickly augment text using Augmenty. For more on using augmenty see the usage guides.

import spacy
import augmenty

nlp = spacy.load("en_core_web_sm")

docs = nlp.pipe(["Augmenty is a great tool for text augmentation"])

entity_augmenter = augmenty.load(
    "ents_replace.v1",
    ent_dict={"ORG": [["spaCy"], ["spaCy", "Universe"]]},
    level=1,  # apply the replacement to every matching entity
)

for doc in augmenty.docs(docs, augmenter=entity_augmenter, nlp=nlp):
    print(doc)

spaCy Universe is a great tool for text augmentation.

📖 Documentation

Documentation
📚 Usage Guides	Guides and instructions on how to use augmenty and its features.
📰 News and changelog	New additions, changes and version history.
🎛 API References	The detailed reference for augmenty's API, including function documentation.
🍒 Augmenters	Contains a full list of current augmenters in augmenty.
😎 Demo	A simple streamlit demo to try out the augmenters.

💬 Where to ask questions

Type
🚨 Bug Reports	GitHub Issue Tracker
🎁 Feature Requests & Ideas	GitHub Issue Tracker
👩‍💻 Usage Questions	GitHub Discussions
🗯 General Discussion	GitHub Discussions
🍒 Adding an Augmenter	Adding an augmenter

🤔 FAQ

How do I test the code and run the test suite?

augmenty comes with an extensive test suite. In order to run the tests, you'll usually want to clone the repository and build augmenty from the source. This will also install the required development dependencies and test utilities defined in the requirements.txt.

pip install -r requirements.txt
pip install pytest

python -m pytest

This runs all the tests in the augmenty/tests folder.

Specific tests can be run using:

python -m pytest augmenty/tests/test_docs.py

Code coverage

If you want to check code coverage, you can run the following:

pip install pytest-cov

python -m pytest --cov=.

Does augmenty run on X?

augmenty is intended to run on all major operating systems: Windows (latest version), macOS (Catalina) and Linux (latest version of Ubuntu). Below you can see whether augmenty passes its test suite on the system of interest. Please note that these are only the systems augmenty is actively tested on. If you run a similar system (e.g. an earlier version of Linux), augmenty will likely run there as well; if not, please create an issue.

Operating System	Status
Ubuntu/Linux (Latest)
MacOS (Catalina)
Windows (Latest)

How is the documentation generated?

augmenty uses sphinx to generate documentation. It uses the Furo theme with custom styling.

To make the documentation you can run:

# install sphinx, themes and extensions
pip install sphinx furo sphinx-copybutton sphinxext-opengraph

# generate the HTML documentation

make -C docs html

Many of these augmenters are completely useless for training?

That is true: some of the augmenters are rarely something you would augment with during training, for instance randomly adding or removing spacing. However, augmentation can just as well be used to test whether a model is robust to certain variations.
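A robustness check of this kind can be sketched in plain Python. Nothing below is Augmenty's API: `perturb_spacing`, `toy_model` and `robustness` are hypothetical names, and the "model" is a stand-in that only looks for one token. The sketch perturbs the spacing of each text several times and reports how often the prediction matches the prediction on the clean text.

```python
import random


def perturb_spacing(text: str, rate: float, rng: random.Random) -> str:
    """With probability `rate`, drop each space or replace it with a double space."""
    out = []
    for ch in text:
        if ch == " " and rng.random() < rate:
            out.append("" if rng.random() < 0.5 else "  ")
        else:
            out.append(ch)
    return "".join(out)


def toy_model(text: str) -> str:
    """Hypothetical stand-in classifier: 'positive' if the token 'great' occurs."""
    return "positive" if "great" in text.split() else "negative"


def robustness(model, texts, rate=0.3, n_variants=20, seed=0) -> float:
    """Share of perturbed variants whose prediction matches the clean prediction."""
    rng = random.Random(seed)
    agree = total = 0
    for text in texts:
        expected = model(text)
        for _ in range(n_variants):
            agree += model(perturb_spacing(text, rate, rng)) == expected
            total += 1
    return agree / total


score = robustness(toy_model, ["Augmenty is a great tool for text augmentation"])
print(f"agreement under spacing noise: {score:.2f}")
```

A score well below 1.0 suggests the model depends on exact whitespace, which is the kind of weakness such an augmenter is meant to reveal.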

Can I use augmenty without using spacy?

Yes, augmenty contains convenience functions for applying augmentation directly to raw texts. Check out the getting started guide to learn how.

🎓 Citing this work

If you use this library in your research, please cite:

@inproceedings{augmenty2021,
    title={Augmenty, the cherry on top of your NLP pipeline},
    author={Enevoldsen, Kenneth and Hansen, Lasse},
    year={2021}
}

Comments

Use of augmenty with spacy config files for training

I didn't see any documentation on how to import these augmenters when using spacy 3.0's config and command line system when training. Is it possible to use it in this sense? If so, how?

Upon further review: for the command line to register new augmenters, the flag --code <code.py> needs to be set when calling the training command. I tried pointing it to the specific file containing the keystroke augmenter I wanted, but it complained about not knowing a parent package for relative imports. I also tried the various __init__.py files, but it complained about those as well. It does seem to work when you copy the code into a new file without relative imports and point to that.

Which page or section is this issue related to?

https://spacy.io/usage/training#data-augmentation-custom

https://kennethenevoldsen.github.io/augmenty/tutorials/introduction.html#Applying-the-augmentation
documentation

opened by Giles-Billenness 3

Added sententence_subset.v1 augmenter following #48

Following #48, added the sententence_subset.v1 augmenter, which subsamples sentences from a document:

import augmenty
import spacy
nlp = spacy.load("en_core_web_sm")

# four sentences
text = """Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool
for obtaining higher performance on limited data. You can also use it to see how
robust your model is to changes. It will sample subset of the paragraf."""
texts = [text]  # augmenty.texts expects an iterable of strings

augmenter = augmenty.load("sententence_subset.v1", respect_sentences=True)

list(augmenty.texts(texts, augmenter, nlp))

Missing:

[ ] Add tests
[ ] Add documentation

opened by KennethEnevoldsen 3

Paragraf subset augmenter

A paragraph subset augmentation which can work on token and sentence level. It samples a random percentage of coherent (contiguous) tokens/sentences to include and a random token/sentence start position, ensuring the former constraint is maintained. The augmenter needs to handle annotated entities and avoid breaking them.

Input arguments:

level: how often to apply the augmenter.
min_paragraf: minimum percentage of tokens or sentences to include. E.g. 4 sentences with min_paragraf=0.5 means that at least 2 sentences are included.
sentence_level: Boolean defining whether to sample at the sentence level (True) or the token level (False).
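The sampling step described above could look roughly like the following sketch. It covers only the sentence level and assumes the sentences are already split; in a spaCy pipeline they would come from the parsed document. The function name `sample_sentence_subset` is hypothetical, and entity handling is left out.

```python
import math
import random
from typing import List, Optional


def sample_sentence_subset(
    sentences: List[str],
    min_paragraf: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a random contiguous run of sentences covering at least
    `min_paragraf` (a fraction between 0 and 1) of the input."""
    rng = rng or random.Random()
    n = len(sentences)
    if n == 0:
        return []
    # minimum number of sentences to keep: at least one, at most all of them
    min_len = min(n, max(1, math.ceil(min_paragraf * n)))
    # first sample how many sentences to keep, then where the run starts
    length = rng.randint(min_len, n)
    start = rng.randint(0, n - length)
    return sentences[start:start + length]


sentences = [
    "Augmenty is a wonderful tool for augmentation.",
    "Augmentation is a wonderful tool for obtaining higher performance on limited data.",
    "You can also use it to see how robust your model is to changes.",
    "It will sample subset of the paragraf.",
]

subset = sample_sentence_subset(sentences, min_paragraf=0.5, rng=random.Random(0))
print(" ".join(subset))
```

Sampling the length first and the start position second guarantees that the run fits inside the document, so the minimum-size constraint never has to be checked after the fact. A token-level variant would additionally need to move the boundaries so that they do not fall inside an annotated entity.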

Example - sentence level

import augmenty
import spacy
nlp = spacy.load("en_core_web_sm")

# four sentences
texts = [
    "Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool "
    "for obtaining higher performance on limited data. You can also use it to see how "
    "robust your model is to changes. It will sample subset of the paragraf.",
]
docs = list(nlp.pipe(texts))  # nlp() takes a single string; use nlp.pipe for a list

augmenter = augmenty.load("paragraf_subset.v1", level=1.0, min_paragraf=0.5, sentence_level=True)

list(augmenty.texts(texts, augmenter, nlp))

Example outputs:

The first section:

Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool 
for obtaining higher performance on limited data.

The middle section:

Augmentation is a wonderful tool for obtaining higher performance on limited data. 
You can also use it to see how robust your model is to changes.

The last section:

You can also use it to see how robust your model is to changes. It will sample subset 
of the paragraf.

Additional thoughts:

Possibly add a reverse augmenter, e.g. one that removes a coherent section of tokens/sentences.

additional augmenter

opened by martincjespersen 3

:arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26
Bumps MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26.

Commits

8856b4a Elapsed time in minutes (#63)

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies github_actions
opened by dependabot[bot] 2
:arrow_up: Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0
Updates the requirements on pydantic to permit the latest version.

Release notes

Sourced from pydantic's releases.

v1.9.0 (2021-12-31)

Thank you to pydantic's sponsors: @sthagen, @timdrijvers, @toinbis, @koxudaxi, @ginomempin, @primer-io, @and-semakin, @westonsteimel, @reillysiemens, @es3n1n, @jokull, @JonasKs, @Rehket, @corleyma, @daddycocoaman, @hardbyte, @datarootsio, @jodal, @aminalaee, @rafsaf, @jqueguiner, @chdsbd, @kevinalh, @Mazyod, @grillazz, @JonasKs, @simw, @leynier, @xfenix for their kind support.

Highlights

add python 3.10 support, #2885 by @PrettyWood

Discriminated unions, #619 by @PrettyWood

Config.smart_union for better union logic, #2092 by @PrettyWood

Binaries for Macos M1 CPUs, #3498 by @samuelcolvin

Complex types can be set via nested environment variables, e.g. foo___bar, #3159 by @Air-Mark

add a dark mode to pydantic documentation, #2913 by @gbdlin

Add support for autocomplete in VS Code via __dataclass_transform__, #2721 by @tiangolo

Add "exclude" as a field parameter so that it can be configured using model config, #660 by @daviskirk

v1.9.0 (2021-12-31) Changes

Apply update_forward_refs to Config.json_encodes prevent name clashes in types defined via strings, #3583 by @samuelcolvin

Extend pydantic's mypy plugin to support mypy versions 0.910, 0.920, 0.921 & 0.930, #3573 & #3594 by @PrettyWood, @christianbundy, @samuelcolvin

v1.9.0a2 (2021-12-24) Changes

support generic models with discriminated union, #3551 by @PrettyWood

keep old behaviour of json() by default, #3542 by @PrettyWood

Removed typing-only __root__ attribute from BaseModel, #3540 by @layday

Build Python 3.10 wheels, #3539 by @mbachry

Fix display of extra fields with model __repr__, #3234 by @cocolman

models copied via Config.copy_on_model_validation always have all fields, #3201 by @PrettyWood

nested ORM from nested dictionaries, #3182 by @PrettyWood

fix link to discriminated union section by @PrettyWood

v1.9.0a1 (2021-12-18) Changes

Add support for Decimal-specific validation configurations in Field(), additionally to using condecimal(), to allow better support from editors and tooling, #3507 by @tiangolo

Add arm64 binaries suitable for MacOS with an M1 CPU to PyPI, #3498 by @samuelcolvin

Fix issue where None was considered invalid when using a Union type containing Any or object, #3444 by @tharradine

When generating field schema, pass optional field argument (of type pydantic.fields.ModelField) to __modify_schema__() if present, #3434 by @jasujm

Fix issue when pydantic fail to parse typing.ClassVar string type annotation, #3401 by @uriyyo

Mention Python >= 3.9.2 as an alternative to typing_extensions.TypedDict, #3374 by @BvB93

Changed the validator method name in the Custom Errors example to more accurately describe what the validator is doing; changed from name_must_contain_space to value_must_equal_bar, #3327 by @michaelrios28

Add AmqpDsn class, #3254 by @kludex

Always use Enum value as default in generated JSON schema, #3190 by @joaommartins

Add support for Mypy 0.920, #3175 by @christianbundy

validate_arguments now supports extra customization (used to always be Extra.forbid), #3161 by @PrettyWood

... (truncated)

Commits

fbf8002 prepare for v1.9.0 release, extra change

5406423 prepare for v1.9.0 release

87da9ac apply update_forward_refs to json_encoders (#3595)

6f26a1c Support mypy 0.910 to 0.930 including CI tests (#3594)

8ef492b build(deps): bump mypy from 0.920 to 0.930 (#3573)

2d3d266 remove failing release step

ef46789 add step to upload pypi files to release

5d6f48c prepare for v1.9.0a2

e882277 fix: support generic models with discriminated union (#3551)

edad0db fix: keep old behaviour of json() by default (#3542)

Additional commits viewable in compare view

dependencies python
opened by dependabot[bot] 2
:arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40
Bumps MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40.

Release notes

Sourced from MishaKav/pytest-coverage-comment's releases.

Support GitHub enterprise urls

What's Changed

Minor readme improvements by @AlexanderLanin in MishaKav/pytest-coverage-comment#100

Support GitHub enterprise urls by @jbcumming in MishaKav/pytest-coverage-comment#101

New Contributors

@AlexanderLanin made their first contribution in MishaKav/pytest-coverage-comment#100

Full Changelog: https://github.com/MishaKav/pytest-coverage-comment/compare/v1.1.39...v1.1.40

Changelog

Sourced from MishaKav/pytest-coverage-comment's changelog.

Pytest Coverage Comment 1.1.40

Release Date: 2022-12-03

Changes

Support for url for github enterprise repositories, thanks to @jbcumming for contribution

Minor readme improvements, thanks to @AlexanderLanin for contribution

Commits

b2577f1 Support GitHub enterprise urls (#102)

072a74d Minor readme improvements (#100)

See full diff in compare view

dependencies github_actions
opened by dependabot[bot] 1
:arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31
Bumps MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31.

Release notes

Sourced from MishaKav/pytest-coverage-comment's releases.

Remove link on badge

add option to remove link on badge

Commits

7c2f420 Remove link on badge (#76)

See full diff in compare view

dependencies github_actions
opened by dependabot[bot] 1
:arrow_up: Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0
Updates the requirements on streamlit to permit the latest version.

Commits

b6429b6 Fix linting

f4b2051 Up version to 1.11.1

80d9979 Ignore component requests outside of the component root

4a04eef Replace legacy app URLs in docs with custom subdomains (#4959)

27c29ac Up version to 1.11.0

03babac Test that GitRepo can handle import failures (#4942)

4c39606 Fix table overflow styling (#4934)

26de600 Fix issue with wrongly applied colors with Pandas styler (#4940)

ad4547f Fix widgets overwrites from short to long-hand props (#4935)

3809637 Add gap param to st.columns (#4887)

Additional commits viewable in compare view

dependencies python
opened by dependabot[bot] 1
:arrow_up: Bump actions/setup-python from 3 to 4.1.0
Bumps actions/setup-python from 3 to 4.1.0.

Release notes

Sourced from actions/setup-python's releases.

v4.1.0

In scope of this pull request we updated actions/cache package as the new version contains fixes for caching error handling. Moreover, we added a new input update-environment. This option allows to specify if the action shall update environment variables (default) or not.

Update-environment input

- name: setup-python 3.9
  uses: actions/setup-python@v4
  with:
    python-version: 3.9
    update-environment: false

Besides, we added such changes as:

Allow python-version-file to be a relative path: actions/setup-python#431

Added new environment variables for Cmake: actions/setup-python#440

Updated error message for resolveVersion: actions/setup-python#450

Assign default value of AGENT_TOOLSDIRECTORY if not set: actions/setup-python#394

v4.0.0

What's Changed

Support for python-version-file input: #336

Example of usage:

- uses: actions/setup-python@v4
  with:
    python-version-file: '.python-version' # Read python version from a file
- run: python my_script.py

There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

Use pypyX.Y for PyPy python-version input: #349

Example of usage:

- uses: actions/setup-python@v4
  with:
    python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
- run: python my_script.py

RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

Bugfix: create missing pypyX.Y symlinks: #347

PKG_CONFIG_PATH environment variable: #400

Added python-path output: #405

... (truncated)

Commits

c4e89fa Improve readme for 3.x and 3.11-dev style python-version (#441)

0ad0f6a Merge pull request #452 from mayeut/fix-env

f0bcf8b Merge pull request #456 from akx/patch-1

af97157 doc: Add multiple wildcards example to readme

364e819 Merge pull request #394 from akv-platform/v-sedoli/set-env-by-default

782f81b Merge pull request #450 from IvanZosimov/ResolveVersionFix

2c9de4e Remove duplicate code introduced in #440

412091c Fix tests for update-environment==false

78a2330 Merge pull request #451 from dmitry-shibanov/fx-pipenv-python-version

96f494e trigger checks

Additional commits viewable in compare view

dependencies github_actions
opened by dependabot[bot] 1
:arrow_up: Bump actions/setup-python from 3 to 4
Bumps actions/setup-python from 3 to 4.

Release notes

Sourced from actions/setup-python's releases.

v4.0.0

What's Changed

Support for python-version-file input: #336

Example of usage:

- uses: actions/setup-python@v4
  with:
    python-version-file: '.python-version' # Read python version from a file
- run: python my_script.py

There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

Use pypyX.Y for PyPy python-version input: #349

Example of usage:

- uses: actions/setup-python@v4
  with:
    python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
- run: python my_script.py

RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

Bugfix: create missing pypyX.Y symlinks: #347

PKG_CONFIG_PATH environment variable: #400

Added python-path output: #405 python-path output contains Python executable path.

Updated zeit/ncc to vercel/ncc package: #393

Bugfix: fixed output for prerelease version of poetry: #409

Made pythonLocation environment variable consistent for Python and PyPy: #418

Bugfix for 3.x-dev syntax: #417

Other improvements: #318 #396 #384 #387 #388

Update actions/cache version to 2.0.2

In scope of this release we updated actions/cache package as the new version contains fixes related to GHES 3.5 (actions/setup-python#382)

Add "cache-hit" output and fix "python-version" output for PyPy

This release introduces new output cache-hit (actions/setup-python#373) and fix python-version output for PyPy (actions/setup-python#365)

The cache-hit output contains boolean value indicating that an exact match was found for the key. It shows that the action uses already existing cache or not. The output is available only if cache is enabled.

... (truncated)

Commits

d09bd5e fix: 3.x-dev can install a 3.y version (#417)

f72db17 Made env.var pythonLocation consistent for Python and PyPy (#418)

53e1529 add support for python-version-file (#336)

3f82819 Fix output for prerelease version of poetry (#409)

397252c Update zeit/ncc to vercel/ncc (#393)

de977ad Merge pull request #412 from vsafonkin/v-vsafonkin/fix-poetry-cache-test

22c6af9 Change PyPy version to rebuild cache

081a3cf Merge pull request #405 from mayeut/interpreter-path

ff70656 feature: add a python-path output

fff15a2 Use pypyX.Y for PyPy python-version input (#349)

Additional commits viewable in compare view

dependencies github_actions
opened by dependabot[bot] 1
:arrow_up: Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0
Updates the requirements on streamlit to permit the latest version.

Commits

ecd5428 Up version to 1.9.2

c02c7c1 Make typing-extensions and unconditional dependency (#4697)

5c065b2 Strip surrounding quotes on RC version

1c7a366 Fix shell quoting

a7bc838 Subshell with no output returns "null" not ""

88ebeae Up version to 1.9.1

03958a1 Pin lower version of protobuf (#4783)

f9cef45 Release process fixes (#4753)

c4bea5d Release 1.9.0 (#4673)

27ff5c2 Add more type annotations (#4657)

Additional commits viewable in compare view


dependencies python
opened by dependabot[bot] 1
Sample fake entities for entity augmenter using Faker package

Add sampling of entities (such as names or addresses) from https://faker.readthedocs.io/en/master/locales/da_DK.html. This tool supports random sampling of entities for numerous languages.
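A minimal sketch of how sampled entities could be turned into the `ent_dict` format shown in the simple example (a label mapped to a list of token lists). The helper `build_ent_dict`, the stand-in samplers, and the whitespace tokenization are assumptions made for illustration; they are not part of augmenty.

```python
import random
from typing import Callable, Dict, List


def build_ent_dict(
    samplers: Dict[str, Callable[[], str]], n: int = 5
) -> Dict[str, List[List[str]]]:
    """Sample n fake entities per label and split each into tokens."""
    ent_dict = {}
    for label, sample in samplers.items():
        # Each entity is stored as a list of tokens, e.g. ["spaCy", "Universe"].
        # Splitting on whitespace is a simplification of real tokenization.
        ent_dict[label] = [sample().split() for _ in range(n)]
    return ent_dict


# Stand-in samplers so the sketch runs without extra packages.
# With Faker installed, one could instead pass, for example:
#   fake = Faker("da_DK"); samplers = {"PER": fake.name, "LOC": fake.city}
rng = random.Random(0)
first = ["Anna", "Lars", "Mette"]
last = ["Jensen", "Nielsen", "Hansen"]
samplers = {"PER": lambda: f"{rng.choice(first)} {rng.choice(last)}"}

ent_dict = build_ent_dict(samplers, n=3)
print(ent_dict)
```

The resulting dictionary could then be passed as `ent_dict` to an entity-replacing augmenter, assuming the labels match those produced by the pipeline.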
enhancement help wanted

opened by martincjespersen 1

implement an oversampling function

Augmentation can be used to oversample a category.

Imagined usage would look something like this:

import augmenty

aug = augmenty.load(...)

def is_positive(example):
    """Return True if the example is labelled as positive."""
    return example.y.cats["positive"] == 1

# Proposed API: augmenty.oversample does not exist yet
upsampled_corpus = augmenty.oversample(corpus, augmenter=aug, conditional=is_positive, n=1000)
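One way the proposed function might behave, sketched in plain Python. This is not an augmenty API: the `oversample` signature simply mirrors the imagined usage, the augmenter is assumed to be a plain callable that maps one example to one augmented example (real spaCy-style augmenters take more arguments, so an adapter would be needed), and tuples stand in for spaCy `Example` objects.

```python
import random
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def oversample(
    corpus: Iterable[T],
    augmenter: Callable[[T], T],
    conditional: Callable[[T], bool],
    n: int,
    seed: int = 0,
) -> List[T]:
    """Return the corpus plus n augmented copies of examples matching conditional."""
    examples = list(corpus)
    matches = [ex for ex in examples if conditional(ex)]
    if not matches:
        # Nothing to oversample from; return the corpus unchanged.
        return examples
    rng = random.Random(seed)
    # Draw matching examples with replacement and augment each draw.
    extra = [augmenter(rng.choice(matches)) for _ in range(n)]
    return examples + extra


# Toy data: (text, label) pairs stand in for spaCy Example objects.
corpus = [("good", "positive"), ("bad", "negative"), ("fine", "positive")]
upsampled = oversample(
    corpus,
    augmenter=lambda ex: (ex[0].upper(), ex[1]),
    conditional=lambda ex: ex[1] == "positive",
    n=4,
)
print(len(upsampled))  # 3 original + 4 augmented
```

Sampling with replacement keeps `n` independent of how many examples match the condition, which seems to fit the `n=1000` in the imagined usage.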

enhancement

opened by KennethEnevoldsen 0

Back translation augmentation

Augment a document using back translation through various languages, e.g., using Hugging Face models: https://huggingface.co/models?pipeline_tag=translation.

Example blog: https://dzlab.github.io/dltips/en/pytorch/text-augmentation/

Example sentence: Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels under the augmentation.

English -> Danish (Google): Augmenty er et udvidelsesbibliotek baseret på spaCy til forstørrelse af tekster. Augmenty adskiller sig fra andre augmentationsbiblioteker ved, at den korrigerer (så vidt muligt) token-, sætnings- og dokumentetiketterne under augmentationen.

Danish -> English (Google): Augmenty is an extension library based on spaCy for enlarging texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence, and document labels during augmentation.
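The round trip above can be sketched as a composition of two translation functions. The `back_translate` helper and the toy lookup tables are hypothetical stand-ins; a real augmenter would plug in translation models in place of the dictionaries.

```python
from typing import Callable, Dict


def back_translate(
    text: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> str:
    """Translate text to a pivot language and back again."""
    return backward(forward(text))


def lookup(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a toy word-by-word translator from a lookup table."""
    return lambda s: " ".join(table.get(word, word) for word in s.split())


# Toy tables mimicking the drift seen in the example sentence.
en_to_da = {"augmenting": "forstørrelse", "texts": "tekster"}
da_to_en = {"forstørrelse": "enlarging", "tekster": "texts"}

augmented = back_translate("augmenting texts", lookup(en_to_da), lookup(da_to_en))
print(augmented)
```

Because the two directions are passed in as callables, the same function would work with any pivot language or translation backend.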
additional augmenter

opened by martincjespersen 1
List of potential new augmenters
The following is a list of potential new augmenters. If you would like a specific augmenter to be added before the others, please update the corresponding issue (if it doesn't have one, feel free to create one).

A variation of existing augmenters:

[ ] #9

[ ] #8

[ ] Common Danish spelling errors lookups

[ ] Danish synonym augmenters using lookups

[ ] Close Homophones Swap

[ ] Geonames augmentation

[ ] leet augmentation

[ ] Multilingual Lexicon Perturbation

[ ] Character duplication augmentation

[ ] American/British augmentation

New augmenters

[x] #5

[x] #6

[ ] Butter finger augmentation

[ ] Date format

[ ] Contractions and Expansions Perturbation

[ ] gender swap

[ ] Appended word soup: adds a random sequence to the end of a sentence

[ ] sentence shuffle augmenter

[ ] conditional token replace augmenter

[ ] replace numerical

[ ] Emojis --> Emoticons augmentation

[ ] conditional string replace

[ ] German ss -> ß

[ ] punct augmentation

[ ] Causal Negation & Strengthen

[ ] Emojify augmentation

[ ] tense augmentation

[ ] OCR Augmentation

[ ] Antonym augmentation

[ ] tfidf augmentation

[x] #25

[ ] sentence swap

Batch augmenters

[ ] Backtranslation e.g. based on this

[ ] Neural paraphraser

[ ] MLM augmentation

[ ] Summarize article by abstractive summarization augmentation

A combination of existing augmenters

[ ] EDA augmenter following the EDA paper

additional augmenter
opened by KennethEnevoldsen 0

Releases(v1.0.1)

v1.0.1(Jun 21, 2022)
Version

What's Changed

Version 1.0.0 by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/50

Update replace.py by @koaning in https://github.com/KennethEnevoldsen/augmenty/pull/51

Documentation updates

added faker based on PR by @martincjespersen by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/85

Added pre-config workflows by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/86

New Contributors

@dependabot made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/46

@koaning made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/51

@martincjespersen

Full Changelog: https://github.com/KennethEnevoldsen/augmenty/compare/v.0.0.12...v1.0.1
Source code(tar.gz)
Source code(zip)
v.0.0.12(Feb 7, 2022)
0.0.12 (03/08/21)

Many bugfixes

Added a few more augmenters

Notable updates to the documentation of the package

0.0.1 (03/08/21)

First version of augmenty launches 🎉

with more than 15 highly customizable augmenters,

A high-quality code-base (coverage of 96% and a codefactor A),

and utilities for easy application of augmenters to strings and spaCy Docs.

Furthermore, it also includes a series of convenience functions for combining and moderating augmentations.

Full Changelog: https://github.com/KennethEnevoldsen/augmenty/commits/v.0.0.12
Source code(tar.gz)
Source code(zip)

Owner

Kenneth Enevoldsen

Interdisciplinary PhD Student on representation learning in Clinical NLP and Genetics at Aarhus University and Interacting Minds Centre

