NLPretext packages in a unique library all the text preprocessing functions you need to ease your NLP project.

Last update: Dec 15, 2022

NLPretext

Working on an NLP project and tired of always looking for the same silly preprocessing functions on the web? 😫

Need to efficiently extract email addresses from a document? Hashtags from tweets? Remove accents from a French post? 😥

NLPretext has got you covered! 🚀

🔍 Quickly explore our preprocessing pipelines and the reference of individual functions below.

Default preprocessing pipeline
Custom preprocessing pipeline
Replacing phone numbers
Removing hashtags
Extracting emojis
Data augmentation

Cannot find what you were looking for? Feel free to open an issue.

Installation

This package has been tested on Python 3.6, 3.7 and 3.8.

We strongly advise you to do the remaining steps in a virtual environment.

To install this library you just have to run the following command:

pip install nlpretext

This library uses spaCy as its tokenizer. The currently supported models are en_core_web_sm and fr_core_news_sm. If they are not installed, run the following commands:

pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz

pip install https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.3.0/fr_core_news_sm-2.3.0.tar.gz
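
Before running a pipeline, it can help to confirm that both model packages are importable. The snippet below is a minimal sketch using only the standard library; the helper name missing_models is an assumption made for illustration and is not part of NLPretext or spaCy.

```python
import importlib.util

# Model package names taken from the install commands above.
SPACY_MODELS = ["en_core_web_sm", "fr_core_news_sm"]


def missing_models(names):
    """Return the names that cannot be found as importable packages.

    Hypothetical helper for illustration; not part of NLPretext or spaCy.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models(SPACY_MODELS)
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All spaCy models are installed.")
```

If a model is reported as missing, the matching pip command above should install it.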

Preprocessing pipeline

Default pipeline

Need to preprocess your text data but have no clue which functions to use or in which order? The default preprocessing pipeline has got you covered:

from nlpretext import Preprocessor
text = "I just got the best dinner in my life @latourdargent !!! I  recommend 😀 #food #paris \n"
preprocessor = Preprocessor()
text = preprocessor.run(text)
print(text)
# "I just got the best dinner in my life !!! I recommend"

Create your custom pipeline

If you know exactly which functions to apply to your data, you can create a custom pipeline instead. Here is an example:

from nlpretext import Preprocessor
from nlpretext.basic.preprocess import (
    normalize_whitespace, remove_punct, remove_eol_characters,
    remove_stopwords, lower_text,
)
from nlpretext.social.preprocess import remove_mentions, remove_hashtag, remove_emoji
text = "I just got the best dinner in my life @latourdargent !!! I  recommend 😀 #food #paris \n"
preprocessor = Preprocessor()
preprocessor.pipe(lower_text)
preprocessor.pipe(remove_mentions)
preprocessor.pipe(remove_hashtag)
preprocessor.pipe(remove_emoji)
preprocessor.pipe(remove_eol_characters)
preprocessor.pipe(remove_stopwords, args={'lang': 'en'})
preprocessor.pipe(remove_punct)
preprocessor.pipe(normalize_whitespace)
text = preprocessor.run(text)
print(text)
# "dinner life recommend"

All available functions are defined in the preprocess.py scripts of the basic, social and token folders.
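
The pipe/run pattern can be pictured as a list of functions applied in order. The class below is a hypothetical stand-in written for illustration only; MiniPipeline, lower, drop_words and squeeze_spaces are assumed names, and this is not how Preprocessor is actually implemented.

```python
import re


class MiniPipeline:
    """Hypothetical stand-in for a pipe()/run() style preprocessor."""

    def __init__(self):
        self._steps = []

    def pipe(self, func, args=None):
        # Store the function together with its keyword arguments.
        self._steps.append((func, args or {}))

    def run(self, text):
        # Apply every registered step in insertion order.
        for func, kwargs in self._steps:
            text = func(text, **kwargs)
        return text


def lower(text):
    return text.lower()


def drop_words(text, words):
    # Remove whole tokens listed in `words` (illustrative stop-word filter).
    return " ".join(tok for tok in text.split() if tok not in words)


def squeeze_spaces(text):
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


pipeline = MiniPipeline()
pipeline.pipe(lower)
pipeline.pipe(drop_words, args={"words": {"the", "in", "my"}})
pipeline.pipe(squeeze_spaces)
print(pipeline.run("The best  dinner in my life"))
# best dinner life
```

Order matters in such a chain: lowercasing first lets the stop-word filter match "The" as "the".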

Individual Functions

Replacing emails

from nlpretext.basic.preprocess import replace_emails
example = "I have forwarded this email to user@example.com"
example = replace_emails(example, replace_with="*EMAIL*")
print(example)
# "I have forwarded this email to *EMAIL*"
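
To show the idea behind this kind of replacement, here is a minimal regex-based sketch. The pattern is deliberately simple and the name mask_emails is an assumption; replace_emails may well rely on a stricter pattern.

```python
import re

# Deliberately simple pattern: local part, "@", then a domain with at least one dot.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def mask_emails(text, replace_with="*EMAIL*"):
    """Replace every substring that looks like an email address (illustrative helper)."""
    return EMAIL_PATTERN.sub(replace_with, text)


print(mask_emails("Contact jane.doe@example.com or bob@mail.example.org."))
# Contact *EMAIL* or *EMAIL*.
```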

Replacing phone numbers

from nlpretext.basic.preprocess import replace_phone_numbers
example = "My phone number is 0606060606"
example = replace_phone_numbers(example, country_to_detect=["FR"], replace_with="*PHONE*")
print(example)
# "My phone number is *PHONE*"
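
For intuition, the sketch below masks French-style numbers with a single regular expression. It is a simplified illustration under the assumption that numbers start with 0 or +33 followed by nine digits; the name mask_fr_phone_numbers is hypothetical, and the country_to_detect argument above suggests the library's own detection is broader than this.

```python
import re

# Assumed format: "0" or "+33", one digit 1-9, then four pairs of digits,
# optionally separated by a space, dot or hyphen.
PHONE_FR_PATTERN = re.compile(r"(?<!\d)(?:\+33\s?|0)[1-9](?:[\s.-]?\d{2}){4}(?!\d)")


def mask_fr_phone_numbers(text, replace_with="*PHONE*"):
    """Replace French-looking phone numbers (illustrative helper)."""
    return PHONE_FR_PATTERN.sub(replace_with, text)


print(mask_fr_phone_numbers("My phone number is 0606060606"))
# My phone number is *PHONE*
print(mask_fr_phone_numbers("call +33 6 12 34 56 78 now"))
# call *PHONE* now
```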

Removing hashtags

from nlpretext.social.preprocess import remove_hashtag
example = "This restaurant was amazing #food #foodie #foodstagram #dinner"
example = remove_hashtag(example)
print(example)
# "This restaurant was amazing"
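
A comparable result can be sketched with the standard library alone. The helper strip_hashtags is an assumed name, and the whitespace cleanup at the end is a choice made for this sketch rather than documented behaviour of remove_hashtag.

```python
import re

# A hashtag is taken to be "#" followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def strip_hashtags(text):
    """Drop hashtags, then collapse the whitespace they leave behind."""
    without_tags = HASHTAG_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


print(strip_hashtags("This restaurant was amazing #food #foodie #dinner"))
# This restaurant was amazing
```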

Extracting emojis

from nlpretext.social.preprocess import extract_emojis
example = "I take care of my skin 😀"
example = extract_emojis(example)
print(example)
# [':grinning_face:']
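
The output format above, a name wrapped in colons, can be approximated from Unicode character names. This is a rough sketch: the code point ranges in is_emoji are an assumption covering only some common emoji blocks, and extract_emoji_names is a hypothetical helper, not the library's function.

```python
import unicodedata


def is_emoji(char):
    # Rough check: code points in two common emoji blocks (not exhaustive).
    code = ord(char)
    return 0x1F300 <= code <= 0x1FAFF or 0x2600 <= code <= 0x27BF


def extract_emoji_names(text):
    """Return ':name:' strings built from the Unicode names of emoji in text."""
    names = []
    for char in text:
        if is_emoji(char):
            name = unicodedata.name(char, "unknown").lower().replace(" ", "_")
            names.append(f":{name}:")
    return names


print(extract_emoji_names("I take care of my skin \U0001F600"))
# [':grinning_face:']
```

Emoji made of several code points, such as flags or skin-tone variants, are not handled by this character-by-character approach.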

Data augmentation

The augmentation module generates new texts from your examples by modifying some of their words. For NER tasks, it keeps any associated entities unchanged. To keep other words unchanged as well, pass them in the stopwords argument. The modifications depend on the chosen method; the methods currently supported are synonym substitutions using WordNet or BERT, both from the nlpaug library.

from nlpretext.augmentation.text_augmentation import augment_text
example = "I want to buy a small black handbag please."
entities = [{'entity': 'Color', 'word': 'black', 'startCharIndex': 22, 'endCharIndex': 27}]
example = augment_text(example, method="wordnet_synonym", entities=entities)
print(example)
# "I need to buy a small black pocketbook please."
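
The idea of substituting words while leaving entities untouched can be sketched without any model. Everything below is an assumption made for illustration: the SYNONYMS table is hand-written, augment_keep_entities is a hypothetical helper, and entities are protected by word rather than by character span, which is simpler than what a real augmenter needs.

```python
# Hypothetical synonym table; a real setup would query WordNet or a language model.
SYNONYMS = {"want": "need", "handbag": "pocketbook"}


def augment_keep_entities(text, entities, synonyms=SYNONYMS):
    """Swap words for synonyms, skipping entity words, and refresh entity offsets."""
    protected = {ent["word"] for ent in entities}
    out_tokens = []
    for token in text.split(" "):
        core = token.rstrip(".,!?")   # word without trailing punctuation
        tail = token[len(core):]      # the punctuation that was stripped
        if core not in protected and core in synonyms:
            core = synonyms[core]
        out_tokens.append(core + tail)
    new_text = " ".join(out_tokens)

    # Replacements may shift positions, so recompute each entity's offsets.
    # Simplification: the first occurrence of the entity word is used.
    new_entities = []
    for ent in entities:
        start = new_text.find(ent["word"])
        new_entities.append(
            {**ent, "startCharIndex": start, "endCharIndex": start + len(ent["word"])}
        )
    return new_text, new_entities


entities = [{"entity": "Color", "word": "black", "startCharIndex": 22, "endCharIndex": 27}]
new_text, new_entities = augment_keep_entities(
    "I want to buy a small black handbag please.", entities
)
print(new_text)
# I need to buy a small black pocketbook please.
```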

Make HTML documentation

To build the Sphinx HTML documentation, run the following command from the nlpretext root path:

sphinx-apidoc -f nlpretext -o docs/

This will generate the .rst files. You can then build the documentation with:

cd docs && make html

You can now open the file index.html located in the build folder.

Project Organization

├── LICENSE
├── VERSION
├── CONTRIBUTING.md     <- Contribution guidelines
├── README.md           <- The top-level README for developers using this project.
├── .github/workflows   <- Where the CI lives
├── datasets/external   <- Bash scripts to download external datasets
├── docs                <- Sphinx HTML documentation
├── nlpretext           <- Main Package. This is where the code lives
│   ├── preprocessor.py <- Main preprocessing script
│   ├── augmentation    <- Text augmentation script
│   ├── basic           <- Basic text preprocessing 
│   ├── social          <- Social text preprocessing
│   ├── token           <- Token text preprocessing
│   ├── _config         <- Where the configuration and constants live
│   └── _utils          <- Where the preprocessing utility scripts live
├── tests               <- Where the tests live
├── setup.py            <- makes project pip installable (pip install -e .) so the package can be imported
├── requirements.txt    <- The requirements file for reproducing the analysis environment, e.g.
│                          generated with `pip freeze > requirements.txt`
└── pylintrc            <- The linting configuration file

Comments

The current release is not functional as emoji lib has changed
🐛 Bug Report

🔬 How To Reproduce

Steps to reproduce the behavior:

install nlpretext from pip (1.1.0)

run from nlpretext._config import constants

Code sample

Environment

OS: macOS Silicon

Python version: 3.7, 3.8, 3.9

📈 Expected behavior

EMOJI_PATTERN = _emoji.get_emoji_regexp()

AttributeError: module 'emoji' has no attribute 'get_emoji_regexp'

bug
opened by Guillaume6606 1

Releases

1.1.0 (Sep 16, 2021)

What’s Changed

[FIX] Removed direct dependency and changed docker registry (#163) @Cedric-Magnan
[DOC] Updated method for spacy tokenizer installation (#159) @Cedric-Magnan
Feature/ignore stopwords (#157) @Guillaume6606
fix: display explicit error message when model not downloaded (#156) @benoitgoujon
Feature/dataloader (#152) @sachalasry-artefact
Hotfix/pylint (#151) @amaleelhamri
Fix/credits (#150) @rafaelleaygalenq

👥 List of contributors

@Cedric-Magnan, @Guillaume6606, @amaleelhamri, @benoitgoujon, @hugovasselin, @rafaelleaygalenq and @sachalasry-artefact

1.0.3 (Feb 18, 2021)

Update license MIT to Apache in PyPI

1.0.1 (Feb 18, 2021)

Readme fix
Long description add
Augmentation sphinx documentation fix

1.0.0 (Feb 18, 2021)

First release
Easy pipelines to clean text efficiently
Catalogue of preprocessing functions for different needs

Owner

Artefact

GitHub Repository https://nlpretext.readthedocs.io/en/latest/
