Fine-tuning scripts for evaluating transformer-based models on the KLEJ benchmark.

Last update: Oct 18, 2022

The KLEJ Benchmark Baselines

The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for Polish language understanding.

This repository contains example scripts to easily fine-tune models from the transformers library on the KLEJ benchmark.

Installation

Install the Python package using the following commands:

$ git clone https://github.com/allegro/klejbenchmark-baselines
$ pip install klejbenchmark-baselines/
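
One way to confirm that the installation worked is to check that Python can locate the package. The sketch below assumes the package is importable as `klejbenchmark_baselines` (inferred from the `klejbenchmark_baselines/main.py` path mentioned in this README). `is_installed` is a hypothetical helper, not part of the repository.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the current Python path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Assumption: the import name matches the directory that holds main.py.
    name = "klejbenchmark_baselines"
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

If the package is reported as not found, re-running the `pip install` command inside the active virtual environment is the usual first thing to try.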

Quick Start

To fine-tune your model on KLEJ tasks using the default settings, you can use the provided example scripts.

First, download the KLEJ benchmark datasets:

$ bash scripts/download_klej.sh

After downloading KLEJ, customize training parameters inside the scripts/run_training.sh script and train the models using:

$ bash scripts/run_training.sh

The script creates:

TensorBoard logs with training and validation metrics,
checkpoints of the best models,
a zip file with predictions for the test sets, which is a valid submission for the KLEJ benchmark.

The zip file can be submitted on the klejbenchmark.com website for evaluation on the test sets.
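
Before uploading, it can be useful to check that the archive is intact and to see which prediction files it contains. The following is a minimal sketch using only the Python standard library. `list_submission_files` is a hypothetical helper, and the archive name in the usage note is an assumption, since this README does not state where the training script writes the zip file.

```python
import zipfile
from pathlib import Path
from typing import List


def list_submission_files(zip_path: str) -> List[str]:
    """Check the archive's integrity and return the sorted names of its files."""
    path = Path(zip_path)
    if not path.is_file():
        raise FileNotFoundError(f"No archive at {path}")
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"Corrupt member in archive: {bad_member}")
        # Skip directory entries, which end with a slash.
        return sorted(n for n in archive.namelist() if not n.endswith("/"))
```

For example, `list_submission_files("submission.zip")` (a placeholder path) returns the file names stored in the archive, which can then be compared against the tasks you trained on.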

Custom Training

It's also possible to train each model separately and customize the training parameters using the klejbenchmark_baselines/main.py script.
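
When training several configurations, a small wrapper that assembles the command line from a dictionary can help keep runs reproducible. The sketch below is illustrative only: `build_command` is a hypothetical helper, and the flag names in the example are placeholders, not the actual options of `main.py`. Consult the script itself for the parameters it accepts.

```python
import shlex
import sys
from typing import Dict, List


def build_command(script: str, params: Dict[str, object]) -> List[str]:
    """Turn a parameter dict into an argument list suitable for subprocess.run."""
    command = [sys.executable, script]
    for flag, value in params.items():
        command += [f"--{flag}", str(value)]
    return command


if __name__ == "__main__":
    # HYPOTHETICAL flag names and values, shown only to illustrate the pattern.
    params = {"task-name": "example-task", "learning-rate": 2e-5}
    cmd = build_command("klejbenchmark_baselines/main.py", params)
    # Print the command for review; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when a value contains spaces.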

License

Apache 2 License

Citation

If you use this code, please cite the following paper:

@inproceedings{rybak-etal-2020-klej,
    title = "{KLEJ}: Comprehensive Benchmark for Polish Language Understanding",
    author = "Rybak, Piotr and Mroczkowski, Robert and Tracz, Janusz and Gawlik, Ireneusz",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.111",
    pages = "1191--1201",
}

Authors

This code was created by the Allegro Machine Learning Research team.

You can contact us at: [email protected]

Owner

Allegro Tech
