Chosung (initial consonant) interpreter based on Ko-BART

Last update: Oct 28, 2022

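Chosung Interpreter

Overview

Given a Korean sentence written only in chosung (the initial consonant of each syllable), this interpreter predicts the full sentence.

Chosung: ㄴㄴ ㄴㄹ ㅈㅇㅎ
Predicted sentence: 나는 너를 좋아해 ("I like you")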

Model

The model is Ko-BART, released by SKT-AI.

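Data

Any corpus made up of individual sentences can be used. Inference quality depends heavily on the domain and size of the training data, so choose a corpus that fits the performance you need. A dummy dataset is included in the ./data directory; prepare your corpus in the same format.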
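Since each training example is a sentence paired with its chosung string, the pairs can be derived from any sentence-level corpus. The following is a minimal sketch, not the repository's actual preprocessing: the names to_chosung and build_pairs are hypothetical, and nothing here assumes the format of the dummy dataset in ./data. It relies only on the Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3), in which each initial consonant covers a run of 588 code points.

```python
# Hypothetical helper names; this is a sketch, not the repository's code.
CHOSUNG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants in Unicode order
HANGUL_BASE = 0xAC00   # code point of "가"
HANGUL_LAST = 0xD7A3   # code point of "힣"
PER_INITIAL = 21 * 28  # 21 vowels x 28 final-consonant slots = 588


def to_chosung(sentence: str) -> str:
    """Replace every precomposed Hangul syllable with its initial consonant.

    Other characters (spaces, digits, punctuation) are kept unchanged.
    """
    out = []
    for ch in sentence:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            out.append(CHOSUNG[(code - HANGUL_BASE) // PER_INITIAL])
        else:
            out.append(ch)
    return "".join(out)


def build_pairs(sentences):
    """Return (chosung, sentence) pairs, skipping blank lines."""
    pairs = []
    for s in sentences:
        s = s.strip()
        if s:
            pairs.append((to_chosung(s), s))
    return pairs


if __name__ == "__main__":
    # Print tab-separated source/target pairs for two sample sentences.
    for src, tgt in build_pairs(["나는 너를 좋아해", "배고프다"]):
        print(f"{src}\t{tgt}")
```

The chosung string serves as the model input and the original sentence as the target. How the pairs should be stored on disk depends on what run_train.py expects, so check the dummy dataset before writing them out.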

Training

python run_train.py

Inference

python run_inference.py --finetuned-model-path $FINETUNED_MODEL_PATH

Examples

These are inference results from a model trained on a publicly available corpus.

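Chosung: ㅂㄱㅍㄷ 	 Predicted sentence: 배고픈데
Chosung: ㅂㄱㅍㄷ 	 Predicted sentence: 배고프다
Chosung: ㅂㄱㅍㄷ 	 Predicted sentence: 배고프대

Chosung: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 너무너무 사랑해요
Chosung: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 너무너무 사랑했어
Chosung: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 나만너무 사랑해요

Chosung: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 나는 너를 좋아해
Chosung: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 누나 나랑 좋아해
Chosung: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 너는 나를 좋아해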
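
Every prediction listed above reduces back to exactly the chosung that was given as input. A sequence-to-sequence model is not guaranteed to respect that constraint, so a post-filter over the generated candidates may be useful. The sketch below is an assumption about how one might do this, not part of run_inference.py; chosung_of and consistent_candidates are hypothetical names, and the third candidate is a made-up counterexample rather than model output.

```python
# Hypothetical post-filter; not taken from the repository.
CHOSUNG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # initial consonants in Unicode order


def chosung_of(text):
    """Map each precomposed Hangul syllable to its initial consonant."""
    return "".join(
        CHOSUNG[(ord(ch) - 0xAC00) // 588] if "가" <= ch <= "힣" else ch
        for ch in text
    )


def consistent_candidates(chosung_input, candidates):
    """Keep only candidates whose chosung equals the input, preserving order."""
    return [c for c in candidates if chosung_of(c) == chosung_input]


if __name__ == "__main__":
    candidates = [
        "나는 너를 좋아해",
        "누나 나랑 좋아해",
        "나는 너를 사랑해",  # made-up counterexample: its chosung is ㄴㄴ ㄴㄹ ㅅㄹㅎ
    ]
    print(consistent_candidates("ㄴㄴ ㄴㄹ ㅈㅇㅎ", candidates))
```

Comparing character by character also rejects candidates with different spacing, which matches the examples above, where word boundaries in the input are preserved in the output.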

Notes

This repository does not include any training data.
This repository follows Ko-BART's modified MIT license.

Todo

Add test code

Owner

Dawoon Jung
