APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Last update: Dec 06, 2022

Overview

APEACH - Korean Hate Speech Evaluation Datasets

APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. The sentences in the dataset were written by anonymous participants on the online crowdsourcing platform DeepNatural AI.

Sample Code :

Download

You can download the APEACH benchmark set from APEACH/test.csv in this repository.
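
As a minimal sketch of how the downloaded file might be inspected, the snippet below reads a CSV with Python's standard `csv` module and counts the label values. The column name `class` is an assumption that this README does not confirm, and the helper names `load_rows` and `label_counts` are hypothetical. Check the header row of test.csv before relying on them.

```python
import csv
from collections import Counter
from pathlib import Path


def load_rows(source):
    """Read CSV rows from an open text file object into a list of dicts."""
    return list(csv.DictReader(source))


def label_counts(rows, label_column="class"):
    """Count label values; the default column name is an assumption."""
    return Counter(row[label_column] for row in rows)


# Path as given in this README; adjust it to where the repository is cloned.
csv_path = Path("APEACH/test.csv")
if csv_path.exists():
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = load_rows(f)
    print(list(rows[0].keys()))  # inspect the real column names first
    print(label_counts(rows))    # raises KeyError if "class" is not a column
```

Printing the column names first makes it easy to replace the assumed `class` column with whatever the file actually uses.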

Dataset Description

APEACH: A hate speech evaluation dataset generated in 2021 using the generation method described in the APEACH paper.

Guidelines

APEACH-GUIDELINE

Topics

Lengths

Paper

https://arxiv.org/pdf/2202.12459.pdf

Experiment Code

Experiment Results

Name	Beep! Dev Dataset	APEACH (Ours)
SoongsilBERT-Base	0.8261	0.8424
SoongsilBERT-Small	0.8149	0.8228
KcBERT-base	0.8088	0.8086
KcBERT-large	0.8295	0.8116
DistillKoBERT	0.7570	0.7715
KoELECTRA-V3	0.7920	0.8101
KoBERT	0.8030	0.7885
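
To make the comparison between the two columns easier to read, the following sketch computes the per-model difference between the APEACH score and the Beep! dev score. It only does arithmetic on the numbers in the table. The README does not name the metric, so the snippet makes no assumption about it, and the helper name `score_gaps` is hypothetical.

```python
# Scores copied from the results table: (Beep! dev, APEACH).
SCORES = {
    "SoongsilBERT-Base": (0.8261, 0.8424),
    "SoongsilBERT-Small": (0.8149, 0.8228),
    "KcBERT-base": (0.8088, 0.8086),
    "KcBERT-large": (0.8295, 0.8116),
    "DistillKoBERT": (0.7570, 0.7715),
    "KoELECTRA-V3": (0.7920, 0.8101),
    "KoBERT": (0.8030, 0.7885),
}


def score_gaps(scores):
    """Return APEACH score minus Beep! dev score per model, rounded to 4 places."""
    return {name: round(apeach - beep, 4) for name, (beep, apeach) in scores.items()}


gaps = score_gaps(SCORES)
# List models from the largest gain on APEACH to the largest drop.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {gap:+.4f}")
```

In this table, four models score higher on APEACH than on the Beep! dev set and three score lower, with KcBERT-large showing the largest drop.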

We also share the best-performing model from this experiment as a checkpoint, a demo website, and an API.

Citation

@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}

Contributors

The main contributors of the work (*: equal contribution):

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
