Predict the spans of toxic posts that were responsible for the toxic label of the posts

Last update: Jul 24, 2022

Overview

toxic-spans-detection

An attempt at the SemEval 2021 Task 5: Toxic Spans Detection.

The Toxic Spans Detection task of SemEval 2021 asked participants to predict the spans of toxic posts that were responsible for the posts' toxic label. See the task overview paper for more information about the task, data, evaluation, and performance of the participating systems.
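To make the notion of a "span" concrete, here is a minimal sketch that assumes spans are represented as lists of character offsets into the post, as described in the task overview paper. The helper names `offsets_to_spans` and `extract_spans` and the example post are hypothetical and are not taken from this repository.

```python
from typing import Iterable, List, Tuple


def offsets_to_spans(offsets: Iterable[int]) -> List[Tuple[int, int]]:
    """Group character offsets into inclusive (start, end) runs of consecutive indices."""
    spans: List[Tuple[int, int]] = []
    for i in sorted(set(offsets)):
        if spans and i == spans[-1][1] + 1:
            # Offset continues the current run: extend its end.
            spans[-1] = (spans[-1][0], i)
        else:
            # Gap found (or first offset): start a new run.
            spans.append((i, i))
    return spans


def extract_spans(text: str, offsets: Iterable[int]) -> List[str]:
    """Return the substrings of `text` covered by each contiguous run of offsets."""
    return [text[start:end + 1] for start, end in offsets_to_spans(offsets)]


# Made-up example post; the offsets are computed rather than hand-counted.
post = "What a stupid, idiotic idea"
toxic_offsets: List[int] = []
for word in ("stupid", "idiotic"):
    start = post.find(word)
    toxic_offsets.extend(range(start, start + len(word)))

print(extract_spans(post, toxic_offsets))
```

A model for this task would output something like `toxic_offsets` for each post; grouping the offsets into runs is only needed for display or inspection.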

Installation

python3.8 -m pip install poetry

and then:

python3.8 -m poetry install

To serve the Jupyter notebooks, available under notebooks/, run:

python3.8 -m poetry run jupyter notebook

Evaluation

The team that ranked first in this competition (HITSZ-HLT) achieved an F1 score of 70.83%, with the median score across all 36 participating teams being 67.58%.

Our approach, as presented here, yields 64.46%.
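For readers unfamiliar with how such scores are computed, the sketch below assumes the character-offset F1 described in the task overview paper: per-post F1 over sets of predicted and gold character offsets, averaged across posts, with an empty prediction for an empty gold set counted as a perfect score. The function names `span_f1` and `mean_span_f1` are hypothetical, and this is not the repository's own evaluation code.

```python
from typing import Iterable, List


def span_f1(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """F1 between predicted and gold character offsets for a single post."""
    pred, true = set(predicted), set(gold)
    if not true:
        # Nothing to find: only an empty prediction is correct (assumed convention).
        return 1.0 if not pred else 0.0
    if not pred:
        return 0.0
    overlap = len(pred & true)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(true)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(all_predicted: List[List[int]], all_gold: List[List[int]]) -> float:
    """Average the per-post F1 over a collection of posts."""
    scores = [span_f1(p, g) for p, g in zip(all_predicted, all_gold)]
    return sum(scores) / len(scores)


# Two toy posts: a partial overlap, and a post with no toxic span at all.
print(mean_span_f1([[1, 2, 3], []], [[2, 3, 4], []]))
```

Because the score is averaged per post, posts without any toxic span still contribute, which rewards a system for predicting nothing when nothing is toxic.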

Owner

Ilias Antonopoulos

Machine Learning Engineer | MSc student
