Code for the paper "Combining Textual Features for the Detection of Hateful and Offensive Language"

Overview

The repository provides the source code for the paper "Combining Textual Features for the Detection of Hateful and Offensive Language" submitted to HASOC 2021 English Subtask 1A.

Publication

Arxiv

Installation (requires >=Python 3.6 )

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
download the 'resources.zip' file here: https://drive.google.com/file/d/1X88cMrLVpAcJd5Z4Gg6MfTLclIuGF-d6/view?usp=sharing
extract the content of 'resources.zip'

Training and Evaluation on HASOC datasets (2019, 2020, 2021)

Execute the following command to train and evaluate the model. The evaluation results are saved under the folder 'results'.

python main.py -c config.json

Optimizing Hyperparameters

The "config.json" file contains hyperparameters that can be changed to train different variants of the model.

{
  "base_dir": "",
  "batch_size": 64,
  "epochs": 20,
  "epoch_patience": 5,
  "bert_model_dir": "resources/hatebert",
  "monitor": "loss",
  "tweet_text_seq_len": 80,
  "tweet_text_char_len": 128,
  "char_size": 29,
  "max_learning_rate": 0.001,
  "end_learning_rate": 0.0000001,
  "rnn_type": "lstm",
  "rnn_layer_size": 200,
  "text_models": ["char_emb", "bert", "hate_words"],
  "normalize_text": true,
  "dataset_year": "2021",
  "optimizer": "adam",
  "text_use_attention": false,
  "oversample": true,
  "feature_normalization_layer_size": 512,
  "min_feature_normalization_layer_size": 64
}

bert_model_dir

"bert_model_dir": "resources/hatebert"
     OR
"bert_model_dir": "resources/bert-base"

dataset_year

"dataset_year": "2019"
	OR
"dataset_year": "2020"
	OR
"dataset_year": "2021"

text_models

"text_models": ["hate_words"]
	OR
"text_models": ["bert"]
	OR
"text_models": ["char_emb"]
	OR
"text_models": ["char_emb", "bert", "hate_words"]

rnn_type

"rnn_type": "lstm"
	OR
"rnn_type": "gru"
	OR
"rnn_type": "bi-gru"
Owner
Sherzod Hakimov
Sherzod Hakimov
gACSON software for visualization, processing and analysis of three-dimensional electron microscopy images

gACSON gACSON software is to visualize, segment, and analyze the morphology of neurons in three-dimensional electron microscopy images. If you use any

Andrea Behanova 2 May 31, 2022
(JMLR' 19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)

Python Outlier Detection (PyOD) Deployment & Documentation & Stats & License PyOD is a comprehensive and scalable Python toolkit for detecting outlyin

Yue Zhao 6.6k Jan 05, 2023
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis. You write a high level configuration file specifying your in

Blue Collar Bioinformatics 917 Jan 03, 2023
Loopy belief propagation for factor graphs on discrete variables, in JAX!

PGMax implements general factor graphs for discrete probabilistic graphical models (PGMs), and hardware-accelerated differentiable loopy belief propagation (LBP) in JAX.

Vicarious 62 Dec 23, 2022
网络协议2天集训

网络协议2天集训 抓包工具安装 Wireshark wireshark下载地址 Tcpdump CentOS yum install tcpdump -y Ubuntu apt-get install tcpdump -y k8s抓包测试环境 查看虚拟网卡veth pair 查看

120 Dec 12, 2022
Alphabetical Letter Recognition

DecisionTrees-Image-Classification Alphabetical Letter Recognition In these demo we are using "Decision Trees" Our database is composed by Learning Im

Mohammed Firass 4 Nov 30, 2021
PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

samplernn-pytorch A PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. It's based on the reference implem

DeepSound 261 Dec 14, 2022
A font family with a great monospaced variant for programmers.

Fantasque Sans Mono A programming font, designed with functionality in mind, and with some wibbly-wobbly handwriting-like fuzziness that makes it unas

Jany Belluz 6.3k Jan 08, 2023
Implements an infinite sum of poisson-weighted convolutions

An infinite sum of Poisson-weighted convolutions Kyle Cranmer, Aug 2018 If viewing on GitHub, this looks better with nbviewer: click here Consider a v

Kyle Cranmer 26 Dec 07, 2022
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Antoine Caillon 589 Jan 02, 2023
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Introduction This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolut

Microsoft 408 Dec 30, 2022
Official repository for "Deep Recurrent Neural Network with Multi-scale Bi-directional Propagation for Video Deblurring".

RNN-MBP Deep Recurrent Neural Network with Multi-scale Bi-directional Propagation for Video Deblurring (AAAI-2022) by Chao Zhu, Hang Dong, Jinshan Pan

SIV-LAB 22 Aug 31, 2022
State of the art Semantic Sentence Embeddings

Contrastive Tension State of the art Semantic Sentence Embeddings Published Paper · Huggingface Models · Report Bug Overview This is the official code

Fredrik Carlsson 88 Dec 30, 2022
Решения, подсказки, тесты и утилиты для тренировки по алгоритмам от Яндекса.

Решения и подсказки к тренировке по алгоритмам от Яндекса Что есть внутри Решения с подсказками и комментариями; рекомендую сначала смотреть md файл п

Yankovsky Andrey 50 Dec 26, 2022
HiPAL: A Deep Framework for Physician Burnout Prediction Using Activity Logs in Electronic Health Records

HiPAL Code for KDD'22 Applied Data Science Track submission -- HiPAL: A Deep Framework for Physician Burnout Prediction Using Activity Logs in Electro

Hanyang Liu 4 Aug 08, 2022
Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision

Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision Project | PDF | Poster Fangyu Li, N. Dinesh Reddy, X

25 Dec 21, 2022
Generalized hybrid model for mode-locked laser diodes with an extended passive cavity

GenHybridMLLmodel Generalized hybrid model for mode-locked laser diodes with an extended passive cavity This hybrid simulation strategy combines a tra

Stijn Cuyvers 3 Sep 21, 2022
Pytorch Implementation for Dilated Continuous Random Field

DilatedCRF Pytorch implementation for fully-learnable DilatedCRF. If you find my work helpful, please consider our paper: @article{Mo2022dilatedcrf,

DunnoCoding_Plus 3 Nov 13, 2022
DeepRec is a recommendation engine based on TensorFlow.

DeepRec Introduction DeepRec is a recommendation engine based on TensorFlow 1.15, Intel-TensorFlow and NVIDIA-TensorFlow. Background Sparse model is a

Alibaba 676 Jan 03, 2023
Source code of the paper PatchGraph: In-hand tactile tracking with learned surface normals.

PatchGraph This repository contains the source code of the paper PatchGraph: In-hand tactile tracking with learned surface normals. Installation Creat

Paloma Sodhi 11 Dec 15, 2022