APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Last update: Dec 06, 2022

Related tags

Overview

APEACH - Korean Hate Speech Evaluation Datasets

APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. Sentences of the dataset are created by anonymous participants using an online crowdsourcing platform DeepNatural AI.

Sample Code :

Download

You can download benchmark set APEACH. APEACH/test.csv in this repository.

Dataset Description

APEACH : A hate-speech evaluation dataset generated in 2021, using generation method followd by APEACH paper.

Guidelines

APEACH-GUIDELINE

Topics

Lengths

Paper

https://arxiv.org/pdf/2202.12459.pdf

Experiment Code

Experiment Results

Name	Beep! Dev Dataset	Apeach (Ours)
SoongsilBERT-Base	0.8261	0.8424
SoongsilBERT-Small	0.8149	0.8228
KcBERT-base	0.8088	0.8086
KcBERT-large	0.8295	0.8116
DistillKoBERT	0.7570	0.7715
KoELECTRA-V3	0.7920	0.8101
KoBERT	0.8030	0.7885

We also share BEST model of our dataset which we trained in this experiment as checkpoint, demo webite and api.

Citation

@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}

Contributors

The main contributors of the work ( * : equal contribution) :

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Related tags

Overview

APEACH - Korean Hate Speech Evaluation Datasets

Download

Dataset Description

Guidelines

Topics

Lengths

Paper

Experiment Code

Experiment Results

Citation

Contributors

License

Owner

Kevin-Yang

Trained T5 and T5-large model for creating keywords from text

ChatterBot is a machine learning, conversational dialog engine for creating chat bots

Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

Utilize Korean BERT model in sentence-transformers library

The proliferation of disinformation across social media has led the application of deep learning techniques to detect fake news.

Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision

Code to reproduce the results of the paper 'Towards Realistic Few-Shot Relation Extraction' (EMNLP 2021)

PyTorch Implementation of "Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging" (Findings of ACL 2022)

👄 The most accurate natural language detection library for Python, suitable for long and short text alike

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

MRC approach for Aspect-based Sentiment Analysis (ABSA)

Implementation of some unbalanced loss like focal_loss, dice_loss, DSC Loss, GHM Loss et.al

Final Project for the Intel AI Readiness Boot Camp NLP (Jan)

An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.

Python generation script for BitBirds

Scene Text Retrieval via Joint Text Detection and Similarity Learning

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

Vad-sli-asr - A Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)