KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark.

Last update: Dec 13, 2022

Overview

KLUE Baseline

KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark. See our paper for more details about KLUE and the baselines.

Dependencies

Make sure you have installed the packages listed in requirements.txt.

pip install -r requirements.txt

All experiments were tested in a Python 3.7 environment.
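Because only Python 3.7 is reported as tested, it can help to check the interpreter version before a run. The sketch below is illustrative only. `check_python_version` is a hypothetical helper and is not part of KLUE-baseline. It warns instead of failing, on the assumption that other versions may still work even though they are untested.

```python
import sys
import warnings

# Python version the README reports as tested (major, minor).
TESTED_VERSION = (3, 7)


def check_python_version(version=None):
    """Return True if the interpreter's major.minor matches TESTED_VERSION.

    Hypothetical helper: on any other version it emits a warning and
    returns False instead of raising, because untested is not the same
    as unsupported.
    """
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    if major_minor != TESTED_VERSION:
        warnings.warn(
            f"Running on Python {major_minor[0]}.{major_minor[1]}; "
            f"experiments were tested on {TESTED_VERSION[0]}.{TESTED_VERSION[1]}."
        )
        return False
    return True


if __name__ == "__main__":
    check_python_version()
```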

KLUE Benchmark Datasets

The train and dev sets of all KLUE tasks are publicly available in this repo as git submodules. To clone the repo together with the datasets:

git clone --recursive https://github.com/KLUE-benchmark/KLUE-Baseline.git

Or, if you have already cloned the repo, download the datasets with:

git submodule update --init --recursive
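To confirm that the datasets were actually fetched, you can inspect the output of `git submodule status`. Git prefixes each line with `-` when a submodule is not initialized, `+` when its checked-out commit differs from the one recorded in the superproject, and `U` when it has merge conflicts. The minimal sketch below assumes `git` is on `PATH`. The helper names are hypothetical, the submodule path in the sample is made up, and paths containing spaces are not handled.

```python
import subprocess

# Meaning of the first character of each `git submodule status` line.
STATUS_CODES = {
    " ": "initialized",
    "-": "not initialized",
    "+": "checked-out commit differs from the recorded commit",
    "U": "merge conflicts",
}


def parse_submodule_status(output):
    """Parse `git submodule status` output into (path, state) pairs."""
    results = []
    for line in output.splitlines():
        if not line.strip():
            continue
        code = line[0]
        # After the status character: "<sha1> <path> [(<describe>)]"
        fields = line[1:].split()
        results.append((fields[1], STATUS_CODES.get(code, "unknown")))
    return results


def missing_submodules(repo_dir="."):
    """Return submodule paths that still need `git submodule update --init`."""
    out = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return [p for p, state in parse_submodule_status(out) if state == "not initialized"]
```

For example, if `missing_submodules()` returns a non-empty list, run the `git submodule update --init --recursive` command above again.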

None of the test sets are publicly available. To measure your model's performance on a test set, first train the model on the train set and then submit it to our submission system. Alternatively, you can compare your dev set performance with that of our baseline models, whose dev set results are also reported in our paper.

Train

To reproduce our baselines, run run_all.sh.

NOTE: klue/roberta models accept inputs of at most 510 tokens. Details are explained here.
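One way to respect this limit is to truncate token id sequences before they reach the model. With Hugging Face `transformers`, the tokenizer call accepts `truncation=True` and `max_length=...` for this purpose. The dependency-free sketch below shows the same idea. It assumes the 510-token limit applies to the whole sequence, including special tokens, and that the sequence begins and ends with a special token (such as `[CLS]` and `[SEP]`). Both ends are kept and the content between them is cut. The README does not state either assumption, and `truncate_ids` is a hypothetical helper.

```python
# Limit taken from the README note; assumed to include special tokens.
MAX_LEN = 510


def truncate_ids(input_ids, max_len=MAX_LEN):
    """Truncate a token id sequence to at most `max_len` ids.

    Assumes input_ids[0] is a start token (e.g. [CLS]) and input_ids[-1]
    is an end token (e.g. [SEP]). Both are preserved and the content
    tokens between them are cut from the right.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for the two special tokens")
    if len(input_ids) <= max_len:
        return list(input_ids)
    # Keep the first max_len - 1 ids (start token plus content), then the end token.
    return list(input_ids[: max_len - 1]) + [input_ids[-1]]
```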

Reference

If you use this code or KLUE, please cite:

@misc{park2021klue,
      title={KLUE: Korean Language Understanding Evaluation}, 
      author={Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jung-Woo Ha and Kyunghyun Cho},
      year={2021},
      eprint={2105.09680},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contribution

Feel free to open an issue if you have any questions or comments. To contribute, please run make style before creating a pull request.

Production First and Production Ready End-to-End Keyword Spotting Toolkit

Understand Text Summarization and create your own summarizer in Python

Klexikon: A German Dataset for Joint Summarization and Simplification

PyTorch original implementation of Cross-lingual Language Model Pretraining.

Random Directed Acyclic Graph Generator

Subtitle Workshop (subshop): tools to download and synchronize subtitles

Fixes mojibake and other glitches in Unicode text, after the fact.

Code that automatically trades Bitcoin on upbit using the pyupbit library. Detailed lecture videos are available on the JoCoding (조코딩) YouTube channel.

The first online catalogue for Arabic NLP datasets.

Big Bird: Transformers for Longer Sequences

Open-source offline translation library written in Python. Uses OpenNMT for translations

DeepSpeech - Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.

Data pipelines, 2021.

Official code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning"

Turn clang-tidy warnings and fixes into comments in your pull request

Library for Russian imprecise rhymes generation

UniSpeech - Large Scale Self-Supervised Learning for Speech

This chatbot is a project designed for Konya Bilim Merkezi Yen.

A PyTorch implementation of the ACL 2019 paper "Simple and Effective Text Matching with Richer Alignment Features".

Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP