Open solution to the Toxic Comment Classification Challenge

Last update: Jun 22, 2022

Overview

Starter code: Kaggle Toxic Comment Classification Challenge

More competitions 🎇

Check out our collection of public projects 🎁, where you can find multiple Kaggle competitions with code, experiments, and outputs.

Here at Neptune, we enjoy participating in Kaggle competitions. The Toxic Comment Classification Challenge is especially interesting because it touches on the important issue of online harassment.

Ensemble our predictions in the cloud!

You need to be registered at neptune.ml to use our predictions in your ensemble models.

Click start notebook.
Choose the browse button.
Select the neptune_ensembling.ipynb file from this repository.
Choose the worker type: gcp-large is the recommended one.
Run the first few cells to load our predictions on the held-out validation set along with the labels.
Grid search over the possible parameter options. The more runs you choose, the longer the search takes.
Train your second-level ensemble model (it should take less than an hour once you have the parameters).
Load our predictions on the test set.
Feed our test set predictions to your ensemble model to get the final predictions.
Save your submission file.
Click browse files and find your submission file to download it.
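The steps above follow a common stacking pattern: tune a second-level model on held-out validation predictions, then apply it to the test predictions. The snippet below is a minimal sketch of that pattern, not the contents of neptune_ensembling.ipynb. It assumes ROC AUC as the validation score and uses a simple weighted average as a stand-in for the second-level model. The function names and the toy data are hypothetical.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC by pairwise comparison; a tie counts as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def blend(predictions, weights):
    """Weighted average of several models' scores for the same rows."""
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


def grid_search_weights(labels, predictions, grid=(0.0, 0.5, 1.0)):
    """Try every weight combination on the grid and keep the best AUC."""
    best_weights, best_auc = None, -1.0
    for weights in product(grid, repeat=len(predictions)):
        if sum(weights) == 0:
            continue  # an all-zero blend is undefined
        auc = roc_auc(labels, blend(predictions, weights))
        if auc > best_auc:
            best_weights, best_auc = weights, auc
    return best_weights, best_auc


# Toy stand-ins for first-level predictions on the validation set.
valid_labels = [1, 0, 1, 0, 1, 0]
valid_model_a = [0.9, 0.2, 0.4, 0.5, 0.8, 0.1]
valid_model_b = [0.6, 0.3, 0.7, 0.2, 0.3, 0.4]

best_weights, best_auc = grid_search_weights(
    valid_labels, [valid_model_a, valid_model_b]
)

# Apply the tuned weights to (toy) test set predictions.
test_model_a = [0.7, 0.1]
test_model_b = [0.5, 0.3]
final_predictions = blend([test_model_a, test_model_b], best_weights)
```

A finer grid means more combinations to score, which is why the search takes longer as the number of runs grows.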

Running the notebook as is scored 0.986+ on the leaderboard (LB).

Disclaimer

In this open source solution you will find references to neptune.ml. It is a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You can run it as a plain Python script 😉.

The idea

We are contributing starter code that is easy to use and extend. We did this before with Cdiscount’s Image Classification Challenge, and we believe it is the right way to open data science to the wider community and encourage more people to participate in challenges. This starter is a ready-to-use, end-to-end solution. Since all computations are organized in separate steps, it is also easy to extend. Check devbook.ipynb for more information about the different pipelines.

Now we want to go one step further and invite you to participate in the development of this analysis pipeline. At a later stage of the competition (early February), we will invite top contributors to join our team on Kaggle.
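To illustrate why organizing computations in separate steps makes a pipeline easy to extend, here is a minimal sketch. It is not this repository's actual step API: the Step class, run_pipeline, and the three example steps are hypothetical names chosen for illustration only.

```python
from typing import Callable, List


class Step:
    """One named stage that reads a dict of data and returns new entries.

    Hypothetical helper, not the class used in this repository.
    """

    def __init__(self, name: str, transform: Callable[[dict], dict]):
        self.name = name
        self.transform = transform


def run_pipeline(steps: List[Step], data: dict) -> dict:
    """Run steps in order; each step's output is merged into the data."""
    for step in steps:
        data = {**data, **step.transform(data)}
    return data


lowercase = Step("lowercase", lambda d: {"text": [t.lower() for t in d["text"]]})
tokenize = Step("tokenize", lambda d: {"tokens": [t.split() for t in d["text"]]})
count = Step("count", lambda d: {"n_tokens": [len(t) for t in d["tokens"]]})

result = run_pipeline(
    [lowercase, tokenize, count],
    {"text": ["You ARE great", "Hello"]},
)
```

With this layout, adding a new preprocessing or modeling stage means appending one more step to the list, without touching the existing ones.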

Contributing

You are welcome to extend this pipeline and contribute your own models or procedures. Please refer to CONTRIBUTING for more details.

Installation

option 1: Neptune cloud

on the neptune site

log in: neptune account login
create a new project named toxic: follow the Projects link (top bar, left side), then click the New project button. This generates the project key TOX, which is already listed in neptune.yaml.

run setup commands

$ git clone https://github.com/neptune-ml/kaggle-toxic-starter.git
$ pip3 install neptune-cli
$ neptune login

start experiment

$ neptune send --environment keras-2.0-gpu-py3 --worker gcp-gpu-medium --config best_configs/fasttext_gru.yaml -- train_evaluate_predict_cv_pipeline --pipeline_name fasttext_gru --model_level first

This should get you to 0.9852. Happy Training :)

Refer to the Neptune documentation and Getting started: Neptune Cloud for more.

option 2: local install

Please refer to Getting started: local instance for the installation procedure.

Solution visualization

The end-to-end pipeline is visualized below. You can run exactly this one!

We have also prepared something simpler to just get you started:

User support

There are several ways to seek help:

Read the project's Wiki, where we publish descriptions of the code, pipelines, and Neptune.
Kaggle discussion is our primary channel of communication.
You can submit an issue directly in this repo.


Owner

minerva.ml
