Implementation of a legal QA system based on SentenceKoBART

Last update: Dec 27, 2022

Overview

LegalQA using SentenceKoBART

How to train SentenceKoBART
Built on the Jina neural search engine
Provides Korean legal QA data (1,830 pairs)
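
Conceptually, a system like this encodes each stored question into a vector with a sentence encoder and ranks stored entries by their similarity to the encoded query. The sketch below illustrates only that ranking step. It is not the repository's code: the three-dimensional vectors are made-up stand-ins for SentenceKoBART embeddings, the names `cosine_similarity` and `top_k` are hypothetical, and cosine similarity is an assumed metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=1):
    """Return the k (doc_id, score) pairs most similar to query_vec.

    `indexed` maps a document id to its embedding vector.
    """
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in indexed.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings", invented for illustration only.
index = {
    "inheritance": [0.9, 0.1, 0.0],
    "lease": [0.1, 0.9, 0.1],
    "traffic": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

print(top_k(query, index, k=1))
```

In the actual system the encoder and the vector index are handled by the Jina flow, so none of this needs to be written by hand.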

Setup

# Install Git LFS (see https://github.com/git-lfs/git-lfs/wiki/Installation)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git clone https://github.com/haven-jeon/LegalQA.git
cd LegalQA
git lfs pull
pip install -r requirements.txt
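
After setup, the QA data file data/legalqa.jsonlines can be inspected before indexing. The sketch below assumes the file follows the usual JSON Lines convention (one JSON object per line, UTF-8 encoded). The field names inside each record are not documented here, so the code prints them rather than assuming any. `load_jsonlines` is a hypothetical helper, not part of the repository.

```python
import json
import os


def load_jsonlines(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Path taken from the repository layout; run this from the LegalQA directory.
DATA_PATH = "data/legalqa.jsonlines"

if os.path.exists(DATA_PATH):
    records = load_jsonlines(DATA_PATH)
    print("records:", len(records))
    if records and isinstance(records[0], dict):
        # Field names are unknown in advance, so list them.
        print("fields:", sorted(records[0].keys()))
```

If each line holds one QA pair, the record count should match the 1,830 pairs mentioned above.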

Index

python app.py -t index

GPU-based indexing is available as an option: set on_gpu: true in pods/encoder.yml.

Search

With REST API

To start the Jina server for REST API:

python app.py -t query_restful

Then use a client to query:

curl --request POST -d '{"top_k": 1, "mode": "search",  "data": ["상속 관련 문의"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:1234/api/search'

Or use Jinabox with endpoint http://127.0.0.1:1234/api/search
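
The same query can also be sent from Python. The sketch below mirrors the curl command above using only the standard library. The payload keys (`top_k`, `mode`, `data`), the port, and the path are copied from that command; the function names `build_search_request` and `search` are hypothetical, and the shape of the JSON response is not documented here, so it is returned as-is for inspection.

```python
import json
import urllib.request

# Endpoint taken from the README; assumes the server was started with
# `python app.py -t query_restful` on the same machine.
ENDPOINT = "http://127.0.0.1:1234/api/search"


def build_search_request(query, top_k=1, endpoint=ENDPOINT):
    """Build the POST request that the curl example sends."""
    payload = {"top_k": top_k, "mode": "search", "data": [query]}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, top_k=1, endpoint=ENDPOINT, timeout=10):
    """Send the query and return the decoded JSON response."""
    request = build_search_request(query, top_k, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `search("상속 관련 문의")` sends the same request as the curl example (the query text means "inquiry about inheritance").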

From the terminal

python app.py -t query

Demo

http://ec2-3-36-123-253.ap-northeast-2.compute.amazonaws.com:7874/

Citation

Model training, data crawling, and demo system were all supported by the AWS Hero program.

@misc{heewon2021,
author = {Heewon Jeon},
title = {LegalQA using SentenceKoBART},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/LegalQA}}
}

License

The QA data in data/legalqa.jsonlines was crawled from www.freelawfirm.co.kr in accordance with its robots.txt. The data is for academic use only; commercial use is prohibited.
We are not responsible for any legal decisions you make based on the resources provided here.

Owner

Heewon Jeon (gogamza)