This repository provides the source code for the paper "Combining Textual Features for the Detection of Hateful and Offensive Language", submitted to HASOC 2021 English Subtask 1A.

arXiv: https://arxiv.org/pdf/2112.04803.pdf
## Setup

Create a virtual environment and install the dependencies:

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Download the `resources.zip` file from https://drive.google.com/file/d/1X88cMrLVpAcJd5Z4Gg6MfTLclIuGF-d6/view?usp=sharing and extract its contents.
## Training and Evaluation

Execute the following command to train and evaluate the model. The evaluation results are saved in the `results` folder:

```shell
python main.py -c config.json
```
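The exact argument handling inside `main.py` is not reproduced here, but a `-c` flag of this kind is typically wired up with `argparse` and `json`. A minimal sketch (the `load_config` function is illustrative, not the repository's API):

```python
import argparse
import json


def load_config(argv=None):
    """Parse the -c/--config flag and load the JSON hyperparameter file."""
    parser = argparse.ArgumentParser(description="Train and evaluate the model")
    parser.add_argument("-c", "--config", required=True,
                        help="path to the JSON configuration file")
    args = parser.parse_args(argv)
    with open(args.config) as f:
        return json.load(f)

# Usage: config = load_config(["-c", "config.json"])
```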
## Configuration

The `config.json` file contains the hyperparameters that can be changed to train different variants of the model:

```json
{
  "base_dir": "",
  "batch_size": 64,
  "epochs": 20,
  "epoch_patience": 5,
  "bert_model_dir": "resources/hatebert",
  "monitor": "loss",
  "tweet_text_seq_len": 80,
  "tweet_text_char_len": 128,
  "char_size": 29,
  "max_learning_rate": 0.001,
  "end_learning_rate": 0.0000001,
  "rnn_type": "lstm",
  "rnn_layer_size": 200,
  "text_models": ["char_emb", "bert", "hate_words"],
  "normalize_text": true,
  "dataset_year": "2021",
  "optimizer": "adam",
  "text_use_attention": false,
  "oversample": true,
  "feature_normalization_layer_size": 512,
  "min_feature_normalization_layer_size": 64
}
```
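A variant can be trained by editing `config.json` by hand or by deriving a new file in a short script. A sketch, assuming the base configuration keys shown above (the output file name `config_gru_bert.json` is illustrative):

```python
import json

# A subset of the hyperparameters from config.json, used as the base.
base_config = {
    "rnn_type": "lstm",
    "bert_model_dir": "resources/hatebert",
    "dataset_year": "2021",
    "text_models": ["char_emb", "bert", "hate_words"],
}

# Derive a variant: GRU encoder with the plain BERT checkpoint.
variant = dict(base_config)
variant["rnn_type"] = "gru"
variant["bert_model_dir"] = "resources/bert-base"

# Save the variant so it can be passed to main.py via -c.
with open("config_gru_bert.json", "w") as f:
    json.dump(variant, f, indent=2)
```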
The following options support multiple values:

- `bert_model_dir`: `"resources/hatebert"` or `"resources/bert-base"`
- `dataset_year`: `"2019"`, `"2020"`, or `"2021"`
- `text_models`: `["hate_words"]`, `["bert"]`, `["char_emb"]`, or `["char_emb", "bert", "hate_words"]`
- `rnn_type`: `"lstm"`, `"gru"`, or `"bi-gru"`
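For ablation-style experiments, the option values above can be combined programmatically into one configuration file per variant. A sketch, assuming a base configuration dictionary (the derived file names are illustrative, not something `main.py` expects):

```python
import itertools
import json

# Option values taken from the lists above.
RNN_TYPES = ["lstm", "gru", "bi-gru"]
TEXT_MODELS = [
    ["hate_words"],
    ["bert"],
    ["char_emb"],
    ["char_emb", "bert", "hate_words"],
]


def make_variants(base_config):
    """Yield (file name, config) pairs covering every rnn_type/text_models combination."""
    for rnn_type, text_models in itertools.product(RNN_TYPES, TEXT_MODELS):
        config = dict(base_config)
        config["rnn_type"] = rnn_type
        config["text_models"] = text_models
        name = f"config_{rnn_type}_{'-'.join(text_models)}.json"
        yield name, config


# Example: write one config file per combination (3 x 4 = 12 files),
# each of which can then be passed to main.py via -c.
base = {"dataset_year": "2021", "epochs": 20}
for name, config in make_variants(base):
    with open(name, "w") as f:
        json.dump(config, f, indent=2)
```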