jel (Japanese Entity Linker) is a bi-encoder-based entity linker for Japanese.

Last update: Jan 06, 2023

Overview

jel: Japanese Entity Linker

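Here is a rough illustration of the bi-encoder idea. The mention (in context) and each candidate entity are encoded independently into vectors, and candidates are ranked by a similarity score such as the dot product. Because entity vectors do not depend on the input, they can be precomputed and searched with a vector index (the faiss note below suggests jel relies on one). The sketch covers only the ranking step and runs on tiny hand-written vectors. The vectors, the softmax normalization and the name `rank_candidates` are hypothetical and are not jel's API. jel's actual encoders and scoring may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_candidates(mention_vec: Sequence[float],
                    entity_vecs: Dict[str, Sequence[float]],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every entity against the mention by dot product, normalize the
    scores with a softmax, and return the top_k (title, probability) pairs."""
    scores = {title: dot(mention_vec, vec) for title, vec in entity_vecs.items()}
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores.values())
    exps = {title: math.exp(s - m) for title, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(((t, e / total) for t, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical precomputed entity embeddings (3-dimensional for readability).
entity_vecs = {
    "東京都": [0.9, 0.1, 0.0],
    "東京都庁": [0.7, 0.3, 0.1],
    "アップル インコーポレイテッド": [0.0, 0.2, 0.9],
}
# Hypothetical encoding of the mention "東京都" in its sentence.
mention_vec = [1.0, 0.2, 0.0]

for title, prob in rank_candidates(mention_vec, entity_vecs, top_k=2):
    print(f"{title}\t{prob:.3f}")
```

Since `entity_vecs` is computed once, each query only requires encoding the mention and scoring it against the stored vectors.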

Usage

Currently, two methods are supported: `link` and `question`.

`el.link`

This returns the named entities found in a sentence, each with its candidate entities drawn from Wikipedia titles.

```python
>>> from jel import EntityLinker
>>> el = EntityLinker()
>>> el.link('今日は東京都のマックにアップルを買いに行き、スティーブジョブスとドナルドに会い、堀田区に引っ越した。')
[
    {
        "text": "東京都",
        "label": "GPE",
        "span": [
            3,
            6
        ],
        "predicted_normalized_entities": [
            [
                "東京都庁",
                0.1084
            ],
            [
                "東京",
                0.0633
            ],
            [
                "国家地方警察東京都本部",
                0.0604
            ],
            [
                "東京都",
                0.0598
            ],
            ...
        ]
    },
    {
        "text": "アップル",
        "label": "ORG",
        "span": [
            11,
            15
        ],
        "predicted_normalized_entities": [
            [
                "アップル",
                0.2986
            ],
            [
                "アップル インコーポレイテッド",
                0.1792
            ],
            ...
        ]
    }
]
```
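The result above can be consumed with plain Python. This sketch assumes only the shape shown in the example: a list of dicts with `text`, `label`, `span` and `predicted_normalized_entities`, where the last key holds a list of `[title, score]` pairs. It also treats `span` as a half-open `[start, end)` character range, which matches the example (`sentence[3:6]` is `東京都`). The helper `top_entities` and the `min_score` cutoff are hypothetical. The sample data is copied from the truncated output above rather than produced by running jel.

```python
from typing import Any, Dict, List, Optional, Tuple


def top_entities(sentence: str, mentions: List[Dict[str, Any]],
                 min_score: float = 0.0) -> List[Tuple[str, str, Optional[str]]]:
    """Return (mention_text, label, best_title) for each mention.

    best_title is None when no candidate reaches min_score.
    """
    results = []
    for m in mentions:
        start, end = m["span"]
        # The span is assumed to be a half-open character range into the sentence.
        if sentence[start:end] != m["text"]:
            raise ValueError(f"span {m['span']} does not match {m['text']!r}")
        best = max(m["predicted_normalized_entities"], key=lambda c: c[1], default=None)
        title = best[0] if best is not None and best[1] >= min_score else None
        results.append((m["text"], m["label"], title))
    return results


sentence = "今日は東京都のマックにアップルを買いに行き、スティーブジョブスとドナルドに会い、堀田区に引っ越した。"
# Truncated sample copied from the example output above.
mentions = [
    {"text": "東京都", "label": "GPE", "span": [3, 6],
     "predicted_normalized_entities": [["東京都庁", 0.1084], ["東京", 0.0633],
                                       ["国家地方警察東京都本部", 0.0604], ["東京都", 0.0598]]},
    {"text": "アップル", "label": "ORG", "span": [11, 15],
     "predicted_normalized_entities": [["アップル", 0.2986],
                                       ["アップル インコーポレイテッド", 0.1792]]},
]

for row in top_entities(sentence, mentions, min_score=0.15):
    print(row)
```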

`el.question`

This returns candidate entities (Wikipedia titles) for a free-form question, each paired with a score.

```python
>>> el.question('日本の総理大臣は？')  # "Who is the Prime Minister of Japan?"
[('菅内閣', 0.05791765857101555), ('枢密院', 0.05592481946602986), ('党', 0.05430194711042564), ('総選挙', 0.052795400668513175)]
```
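`el.question` returns a list of `(title, score)` tuples, as shown above. Below is a minimal sketch for selecting answers from such a list, assuming only that shape. The helpers `answers_above` and `margin` and the threshold value are hypothetical, and the sample data is copied from the example output. In that example the scores sit close together (about 0.053 to 0.058), so checking the gap between the best and second-best candidate may be a useful extra signal before trusting the top answer. jel itself does not document this.

```python
from typing import List, Tuple


def answers_above(candidates: List[Tuple[str, float]], threshold: float,
                  top_k: int = 3) -> List[str]:
    """Return up to top_k titles whose score is at least threshold, best first."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [title for title, score in ranked if score >= threshold][:top_k]


def margin(candidates: List[Tuple[str, float]]) -> float:
    """Score gap between the best and second-best candidate (0.0 if fewer than two)."""
    scores = sorted((s for _, s in candidates), reverse=True)
    return scores[0] - scores[1] if len(scores) >= 2 else 0.0


# Sample copied from the el.question output above.
candidates = [('菅内閣', 0.05791765857101555), ('枢密院', 0.05592481946602986),
              ('党', 0.05430194711042564), ('総選挙', 0.052795400668513175)]

print(answers_above(candidates, threshold=0.055))
print(round(margin(candidates), 4))
```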

Setup

```shell
$ pip install jel
$ python -m spacy download ja_core_news_md
```
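After these two commands, both `jel` and the spaCy model should be importable, because `spacy download` installs `ja_core_news_md` as a regular Python package. Here is a quick check that uses only the standard library. It can help diagnose an error such as `ModuleNotFoundError: No module named 'jel'`. The helper name `missing_packages` is hypothetical.

```python
import importlib.util
from typing import Iterable, List

# Top-level packages the setup steps above are expected to install.
REQUIRED = ("jel", "ja_core_news_md")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level package names that cannot be found on sys.path."""
    # find_spec locates a top-level package without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```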

Run as API

```shell
$ uvicorn jel.api.server:app --reload --port 8000 --host 0.0.0.0 --log-level trace
```

Example

```shell
# link: "The Prime Minister of Japan is Prime Minister Suga."
$ curl localhost:8000/link -X POST -H "Content-Type: application/json" \
    -d '{"sentence": "日本の総理は菅総理だ。"}'

# question: "Who is a famous prime minister of Japan?"
$ curl localhost:8000/question -X POST -H "Content-Type: application/json" \
    -d '{"sentence": "日本で有名な総理は？"}'
```
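The same requests can also be sent from Python with the standard library. This sketch assumes that the server started above is listening on `localhost:8000` and that both endpoints accept a JSON body with a `sentence` field, as in the curl examples. The response is assumed to be JSON. The helper names `build_request` and `call_jel` are hypothetical.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"  # assumes the uvicorn command above


def build_request(endpoint: str, sentence: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl examples."""
    if endpoint not in ("link", "question"):
        raise ValueError(f"unknown endpoint: {endpoint}")
    body = json.dumps({"sentence": sentence}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_jel(endpoint: str, sentence: str, timeout: float = 60.0) -> Any:
    """Send the request and decode the response, assumed to be JSON."""
    with urllib.request.urlopen(build_request(endpoint, sentence), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires the server to be running):
# print(call_jel("link", "日本の総理は菅総理だ。"))
# print(call_jel("question", "日本で有名な総理は？"))
```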

Test

```shell
$ python -m pytest
```

Notes

Installing faiss==1.5.3 from pip causes an error related to `_swigfaiss`.
To solve this, see this issue.

LICENSE

Apache 2.0 License.

CITATION

```bibtex
@INPROCEEDINGS{manabe2019chive,
    author    = {真鍋陽俊, 岡照晃, 海川祥毅, 髙岡一馬, 内田佳孝, 浅原正幸},
    title     = {複数粒度の分割結果に基づく日本語単語分散表現},
    booktitle = "言語処理学会第25回年次大会(NLP2019)",
    year      = "2019",
    pages     = "NLP2019-P8-5",
    publisher = "言語処理学会",
}
```

You might also like...

Japanese synonym library

chikkarpy chikkarpy is a Python version of chikkar. chikkarpy is a library developed to add synonym expansion to the output of SudachiPy, using the Sudachi synonym dictionary.

48 Dec 14, 2022

AllenNLP integration for Shiba: Japanese CANINE model

Allennlp Integration for Shiba allennlp-shiab-model is a Python library that provides AllenNLP integration for shiba-model. SHIBA is an approximate re

12 Feb 16, 2022

Codes to pre-train Japanese T5 models

t5-japanese Codes to pre-train a T5 (Text-to-Text Transfer Transformer) model pre-trained on Japanese web texts. The model is available at https://hug

37 Dec 25, 2022

Auto translate textbox from Japanese to English or Indonesia

priconne-auto-translate Auto translate textbox from Japanese to English or Indonesia How to use Install python first, Anaconda is recommended Install

5 Aug 25, 2022

Code for evaluating Japanese pretrained models provided by NTT Ltd.

japanese-dialog-transformers (Japanese description available here) This repository provides the information necessary to evaluate the Japanese Transformer Encoder-decoder dialo

216 Dec 22, 2022

Script to download some free Japanese lessons in Portuguese from NHK

Nihongo_nhk This is a script to download some free Japanese lessons in Portuguese from NHK. It can be executed by installing the packages with: pip in

2 Jan 6, 2022

An open collection of annotated voices in Japanese language

Koniwa (声庭): An open collection of annotated voices in Japanese language. Overview: Koniwa (声庭) provides open voices and annot… that are free to use, modify, and redistribute

32 Dec 14, 2022

Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers

Japanese-LUW-Tokenizer Japanese Long-Unit-Word (国語研長単位, NINJAL long-unit words) Tokenizer for Transformers based on 青空文庫 (Aozora Bunko) Basic Usage from transformers import RemBertToken

3 Dec 22, 2021

aMLP Transformer Model for Japanese

aMLP-japanese Japanese aMLP Pretrained Model. aMLP is a Transformer model proposed by Liu, Dai et al. Roughly speaking, it can be used in place of BERT and performs better. For a detailed explanation, see articles such as this one. This

13 Aug 11, 2022

Comments

ModuleNotFoundError

```text
Traceback (most recent call last):
  File "scripts/biencoder_training_check.py", line 1, in <module>
    from jel.biencoder.train import biencoder_training
ModuleNotFoundError: No module named 'jel'
```

opened by izuna385 1

Separate Estimation Model and DB

Because the inference model and the knowledge base are currently loaded together, loading the model takes about 30 seconds. To avoid this, we will move the DB into its own container.

opened by izuna385 0

Releases (v0.1.1)

v0.1.1 (May 29, 2021)

First release.


Owner

izuna385

Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

This repository implements a brute-force spellchecker utilizing the Damerau-Levenshtein edit distance.

Official code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning"

Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

This repository contains (not all) code from my project on Named Entity Recognition in philosophical text

Summarization module based on KoBART

DAGAN - Dual Attention GANs for Semantic Image Synthesis

PyTorch implementation of Tacotron speech synthesis model.

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

A list of NLP(Natural Language Processing) tutorials

Stanford CoreNLP provides a set of natural language analysis tools written in Java

Ceaser-Cipher - The Caesar Cipher technique is one of the earliest and simplest encryption techniques

Code for text augmentation method leveraging large-scale language models

Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL 2021.

100+ Chinese Word Vectors: over a hundred kinds of pre-trained Chinese word vectors

VampiresVsWerewolves - Our Implementation of a MiniMax algorithm with alpha beta pruning in the context of an in-class competition

Part of Speech Tagging using Hidden Markov Model (HMM) POS Tagger and Brill Tagger

Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"

Chinese generation tasks with MBart, a multilingual denoising pre-trained model
