Yaspeller Dictionary (Auto)builder

Usage

# this sample command generates `./yaspeller_report.json`
# yaspeller --report json --ignore-digits --ignore-text "'.*" --ignore-latin --only-errors --file-extensions ".md" --lang ru

python -m venv env
source env/bin/activate
pip install 
python src/dictionary.py yaspeller_report.json

Why

Yaspeller is nice, but typical documentation contains too many anglicisms. Normally you just want to ignore them, but the only way to do that is to supply an array of regular expressions for the words to ignore.

This tool generates an array of dictionary words, including all lexemes for all grammatical cases, like

[
    "[бБ]аг(а|ам|ами|ах|е|и|ов|ом|у)?",
    "[дД]ифф(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[кК]оммит(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[пП]атчинг(а|ам|ами|ах|е|и|ов|ом|у)?",
    "[рР]убист(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[сС]амоорганизованн(ого|ом|ому|ую|ые|ый|ым|ыми|ых)",
    "[тТ]икет(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "коммитить"
]

from yaspeller errors, which look like this in text format:

Spelling check:
✗ www.ruby-lang.org/ru/community/ruby-core/index.md 130 ms
-----
Typos: 9
1. патчингом (36:27)
2. коммитить (68:32, suggest: комитет)
3. багах (75:15, suggest: богах, баках, бегах)
4. баги (89:24, suggest: багги)
5. баг (96:25)
6. тикет (107:14, suggest: этикет)
7. дифф (115:18)
8. коммиту (147:24, suggest: комету, комнату)
9. коммита (148:58, suggest: комета)
-----
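The mapping between the two samples above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/dictionary.py`: the names `parse_typos` and `build_pattern` are hypothetical, the sketch reads the text report shown above rather than the JSON report the tool takes, and it assumes the pair in parentheses is a line and column position. Grouping word forms under one lemma and producing the full paradigm needs a morphological analyzer, which is left out here, so the forms are passed in by hand.

```python
import os
import re

# One numbered typo line, e.g. "3. багах (75:15, suggest: богах, баках, бегах)".
# Assumption: the pair in parentheses is line:column.
TYPO_LINE = re.compile(
    r"^\s*\d+\.\s+(\S+)\s+\((\d+):(\d+)(?:,\s*suggest:\s*([^)]*))?\)\s*$"
)


def parse_typos(report_text):
    """Extract misspelled words from a yaspeller text report (hypothetical helper)."""
    typos = []
    for line in report_text.splitlines():
        match = TYPO_LINE.match(line)
        if not match:
            continue  # headers, separators, and file names are skipped
        word, row, col, suggest = match.groups()
        typos.append({
            "word": word,
            "row": int(row),
            "col": int(col),
            "suggestions": suggest.split(", ") if suggest else [],
        })
    return typos


def build_pattern(forms):
    """Collapse the inflected forms of one word into a single regex string."""
    forms = sorted({form.lower() for form in forms})
    if len(forms) == 1:
        return forms[0]  # nothing to collapse, keep the word as-is
    stem = os.path.commonprefix(forms)
    if not stem:
        return "(" + "|".join(forms) + ")"  # no shared stem, plain alternation
    endings = sorted({form[len(stem):] for form in forms})
    optional = "" in endings  # the bare stem is itself a valid form
    endings = [ending for ending in endings if ending]
    head = "[{}{}]{}".format(stem[0], stem[0].upper(), stem[1:])
    return "{}({}){}".format(head, "|".join(endings), "?" if optional else "")


# Forms supplied by hand here; they reproduce the first entry of the array above.
bug_forms = ["баг", "бага", "багам", "багами", "багах",
             "баге", "баги", "багов", "багом", "багу"]
bug_pattern = build_pattern(bug_forms)
```

With only the forms that actually occur in the report (`баг`, `багах`, `баги`) the same function yields a narrower pattern, which is why a complete paradigm per word is worth generating.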

Live example

Initially created for spellchecking the www.ruby-lang.org translations.

🤕 spelling exceptions builder for lazy people

Owner

Vlad Bokov
