
DistilBERT-Text-mining-authorship-attribution

Dataset used: https://www.kaggle.com/azimulh/tweets-data-for-authorship-attribution-modelling/version/2

DistilBERT: https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation

dataset - Helper functions for loading and handling the datasets.

feature_extraction_selection - Plots results for all models using the best dataset and parameters; used to compare feature-extraction methods. (Code inspired by: https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)

dataset_selection - Prints out the average accuracy for every dataset.

ds_exploration - Prints the shape of each dataset and plots the class distributions.
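
The exploration step can be sketched as below. This is a minimal illustration, not code from the repo: the column names (`author`, `tweet`) and the sample rows are invented for the example.

```python
import pandas as pd

# Hypothetical stand-in for one of the tweet datasets; the real data
# comes from the Kaggle dump linked above.
df = pd.DataFrame({
    "author": ["a", "a", "b", "c", "c", "c"],
    "tweet": ["t1", "t2", "t3", "t4", "t5", "t6"],
})

print(df.shape)  # dataset shape, e.g. (6, 2)

# Class distribution: number of tweets per author.
class_counts = df["author"].value_counts()
print(class_counts)
# class_counts.plot(kind="bar") would render the distribution plot
```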

baseline_process - Runs grid-search cross-validation on all ML models (excluding the BERT variants).
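
A sketch of how such a grid search over baseline models can look with scikit-learn. The corpus, the two example estimators, and the parameter grids are assumptions for illustration, not the repo's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny invented corpus: two "authors", four tweets each.
texts = ["love a sunny morning run", "morning run then coffee",
         "sunny days are for running", "coffee after my run",
         "debugging the build again", "the build broke again",
         "shipping a fix for the build", "another late night debugging"]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

# One (estimator, parameter grid) pair per baseline model.
models = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "svm": (LinearSVC(), {"clf__C": [0.1, 1]}),
}

for name, (estimator, grid) in models.items():
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, cv=2)
    search.fit(texts, labels)
    print(name, search.best_params_, round(search.best_score_, 3))
```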

bert - Runs grid-search cross-validation on the BERT-variant models. (Code inspired by: https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)

ml - Helper functions for the machine-learning models.

Project done for the TDDE16 Text Mining course at Linköping University.

About

Text mining project: using DistilBERT to predict authors in the authorship-attribution classification task.
