Japanese synonym library

Last update: Dec 14, 2022


Overview

chikkarpy

chikkarpy is a Python version of chikkar.

chikkarpy is a library developed to add synonym expansion to the output of SudachiPy, using the Sudachi synonym dictionary.

It can also be used on its own as a search tool for the synonym dictionary.

Usage

TL;DR

$ pip install chikkarpy

$ echo "閉店" | chikkarpy
閉店    クローズ,close,店仕舞い

Step 1. Install chikkarpy

$ pip install chikkarpy

Step 2. How to use

Command line

$ echo "閉店" | chikkarpy
閉店    クローズ,close,店仕舞い

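chikkarpy looks up each input word and returns the list of matching synonyms. A headword whose ambiguity flag is 1 in the synonym dictionary cannot be used as a trigger. The output has the form query\tsynonym list.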
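Because the output is tab-separated, it is easy to consume from another program. The following is a minimal sketch of a parser for one output line. The function name `parse_chikkarpy_line` is hypothetical and not part of chikkarpy, and treating an empty synonym field as "nothing found" is an assumption.

```python
from typing import List, Tuple


def parse_chikkarpy_line(line: str) -> Tuple[str, List[str]]:
    """Split one 'query<TAB>synonym,synonym,...' line into (query, synonyms)."""
    # Strip only the trailing newline so that the tab separator is preserved.
    query, _, synonym_field = line.rstrip("\n").partition("\t")
    # Assumption: an empty or missing synonym field means no synonyms were found.
    synonyms = synonym_field.split(",") if synonym_field else []
    return query, synonyms


print(parse_chikkarpy_line("閉店\tクローズ,close,店仕舞い\n"))
# => ('閉店', ['クローズ', 'close', '店仕舞い'])
```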

$ chikkarpy search -h
usage: chikkarpy search [-h] [-d [file [file ...]]] [-ev] [-o file] [-v]
                        [file [file ...]]

Search synonyms

positional arguments:
  file                  text written in utf-8

optional arguments:
  -h, --help            show this help message and exit
  -d [file [file ...]]  synonym dictionary (default: system synonym
                        dictionary)
  -ev                   Enable verb and adjective synonyms.
  -o file               the output file
  -v, --version         print chikkarpy version

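To use your own user dictionaries, specify the binary dictionaries to load with -d. (See "Build a dictionary" for how to build a binary dictionary.) When loading multiple dictionaries, pay attention to the order. In the example below, synonyms are searched in the order user2 > user > system, and the result is returned as soon as a match is found.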

chikkarpy -d system.dic user.dic user2.dic
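The lookup order can be pictured with plain Python dictionaries. This is only a model of the behavior described above, not chikkarpy's implementation. The function `find_first_match` and the dictionary contents are hypothetical.

```python
from typing import Dict, List, Sequence

SynonymTable = Dict[str, List[str]]


def find_first_match(word: str, dictionaries: Sequence[SynonymTable]) -> List[str]:
    """Search from the last-loaded dictionary to the first and stop at the first hit."""
    for table in reversed(dictionaries):
        if word in table:
            return table[word]
    return []


# Hypothetical contents, listed in the same order as `-d system.dic user.dic user2.dic`.
system = {"閉店": ["クローズ", "close", "店仕舞い"], "開放": ["オープン", "open"]}
user = {"閉店": ["店じまい"]}
user2 = {"開店": ["オープン"]}
loaded = [system, user, user2]

print(find_first_match("閉店", loaded))  # found in user, so system is never consulted
print(find_first_match("開放", loaded))  # not in user2 or user, falls through to system
```

Note that a hit in a higher-priority dictionary hides the entries for the same word in lower-priority ones. The results are not merged.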

By default, only nominals (taigen) are output. To also output verbs and adjectives (yogen), enable -ev.

$ echo "開放" | chikkarpy
開放	オープン,open
$ echo "開放" | chikkarpy -ev
開放	開け放す,開く,オープン,open

Python library

Example

from chikkarpy import Chikkar
from chikkarpy.dictionarylib import Dictionary

chikkar = Chikkar()

system_dic = Dictionary("system.dic", False)
chikkar.add_dictionary(system_dic)

print(chikkar.find("閉店"))
# => ['クローズ', 'close', '店仕舞い']

print(chikkar.find("閉店", group_ids=[5])) # search by group ID
# => ['クローズ', 'close', '店仕舞い']

print(chikkar.find("開放"))
# => ['オープン', 'open']

chikkar.enable_verb() # also output verbs and adjectives (by default only nominals are output)
print(chikkar.find("開放"))
# => ['開け放す', '開く', 'オープン', 'open']

When loading multiple dictionaries with chikkar.add_dictionary(), pay attention to the order. The dictionary loaded last is searched first.
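Since the last dictionary added wins, it can be less error-prone to write the list in priority order and let a small helper reverse it. The sketch below assumes only that the object has an add_dictionary() method, as Chikkar does in the example above. The helper `add_by_priority` and the stand-in class `RecordingChikkar` are hypothetical and exist only to show the call order without installing chikkarpy.

```python
from typing import Iterable, List


def add_by_priority(chikkar, dictionaries_high_to_low: Iterable) -> List:
    """Call add_dictionary() so that the first listed item gets the highest priority."""
    # The lowest-priority dictionary must be added first, so reverse the list.
    load_order = list(dictionaries_high_to_low)[::-1]
    for dictionary in load_order:
        chikkar.add_dictionary(dictionary)
    return load_order


class RecordingChikkar:
    """Stand-in that records the order of add_dictionary() calls."""

    def __init__(self):
        self.added = []

    def add_dictionary(self, dictionary):
        self.added.append(dictionary)


recorder = RecordingChikkar()
add_by_priority(recorder, ["user2.dic", "user.dic", "system.dic"])
print(recorder.added)
# => ['system.dic', 'user.dic', 'user2.dic']
```

With chikkarpy itself, one would presumably pass a Chikkar() instance and Dictionary objects (for example Dictionary("user.dic", False)) instead of the stand-in and the file names.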

Build a dictionary

Before using a new dictionary, you need to create a binary-format dictionary.

$ chikkarpy build -i synonym_dict.csv -o system.dic

$ chikkarpy build -h
usage: chikkarpy build [-h] -i file [-o file] [-d string]

Build Synonym Dictionary

optional arguments:
  -h, --help  show this help message and exit
  -i file     dictionary file (csv)
  -o file     output file (default: synonym.dic)
  -d string   description comment to be embedded on dictionary
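If the build is part of a Python script, the command can be assembled and run with subprocess. This is a sketch that uses only the -i, -o, and -d options shown in the help text above. The function names `build_command` and `build_dictionary` are hypothetical, and running the build requires the chikkarpy command to be on PATH.

```python
import subprocess
from typing import List, Optional


def build_command(csv_path: str, output_path: str = "synonym.dic",
                  description: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `chikkarpy build`."""
    command = ["chikkarpy", "build", "-i", csv_path, "-o", output_path]
    if description is not None:
        # -d embeds a description comment in the dictionary.
        command += ["-d", description]
    return command


def build_dictionary(csv_path: str, output_path: str = "synonym.dic",
                     description: Optional[str] = None) -> None:
    """Run the build and raise CalledProcessError if the command fails."""
    subprocess.run(build_command(csv_path, output_path, description), check=True)


print(build_command("synonym_dict.csv", "system.dic"))
# => ['chikkarpy', 'build', '-i', 'synonym_dict.csv', '-o', 'system.dic']
```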

For developers

Code Format

Run scripts/lint.sh to check that the code is formatted correctly.

flake8, flake8-import-order, and flake8-builtins are required.

Test

Run scripts/test.sh to run the tests.

Contact

chikkarpy is developed by WAP Tokushima Laboratory of AI and NLP.

A Slack workspace is available for developers and users to ask questions and hold discussions.

https://sudachi-dev.slack.com/ (please get an invitation here)


Comments

pip install does not work in a SudachiPy 0.6.x environment
Temporary solution

Install SudachiPy 0.5.4, then chikkarpy, then reinstall the latest version of SudachiPy.

pip install sudachipy==0.5.4 --upgrade
pip install sudachidict_core
pip install chikkarpy
pip install sudachipy --upgrade
opened by Nishihara-Daiki 1

chikkarpy has no attribute 'dictionarylib' in certain cases

Case 1: an AttributeError is raised when chikkarpy.dictionarylib is accessed

$ pip install chikkarpy
$ python
>>> import chikkarpy
>>> chikkarpy.dictionarylib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'chikkarpy' has no attribute 'dictionarylib'

Case 2: it works when using from chikkarpy import dictionarylib

$ pip install chikkarpy
$ python
>>> from chikkarpy import dictionarylib
>>> dictionarylib
<module 'chikkarpy.dictionarylib' from '/usr/local/lib/python3.7/dist-packages/chikkarpy/dictionarylib/__init__.py'>

Case 3: it works when chikkarpy.dictionarylib is accessed AFTER from chikkarpy import dictionarylib

$ pip install chikkarpy
$ python
>>> import chikkarpy
>>> from chikkarpy import dictionarylib
>>> chikkarpy.dictionarylib
<module 'chikkarpy.dictionarylib' from '/usr/local/lib/python3.7/dist-packages/chikkarpy/dictionarylib/__init__.py'>

opened by Nishihara-Daiki 0
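The three cases above match standard Python import behavior: a subpackage becomes an attribute of its parent package only after something has imported it. Whether this is the actual cause in chikkarpy is an assumption (it would mean that the top-level __init__.py of that version does not import dictionarylib itself). The sketch below reproduces the pattern with a throwaway package. The package name `demo_pkg_for_import_note` is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "demo_pkg_for_import_note"  # hypothetical throwaway package name


def demo_submodule_attribute():
    """Return (before, after): is `pkg.sub` an attribute before/after importing it?"""
    with tempfile.TemporaryDirectory() as tmp:
        sub_dir = Path(tmp) / PACKAGE / "sub"
        sub_dir.mkdir(parents=True)
        # Empty __init__.py: the package does not import its subpackage itself.
        (sub_dir.parent / "__init__.py").write_text("")
        (sub_dir / "__init__.py").write_text("VALUE = 1\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(PACKAGE)
            before = hasattr(pkg, "sub")  # like case 1
            importlib.import_module(PACKAGE + ".sub")
            after = hasattr(pkg, "sub")  # like case 3
        finally:
            # Clean up so that the demo can be run repeatedly.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith(PACKAGE)]:
                del sys.modules[name]
        return before, after


print(demo_submodule_attribute())
# => (False, True)
```

Under that assumption, writing from chikkarpy import dictionarylib (or import chikkarpy.dictionarylib) before accessing the attribute, as in cases 2 and 3, avoids the error.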

Releases(v0.1.1)

v0.1.1(Feb 7, 2022)

Fixed https://github.com/WorksApplications/chikkarpy/issues/8
v0.1.0(May 24, 2021)

First release

chikkarpy is a Python version of chikkar. https://github.com/WorksApplications/chikkarpy


Owner

Works Applications
