Long text token classification using LongFormer

Last update: Aug 07, 2022

Overview

Long text token classification using LongFormer

The data comes from: https://www.kaggle.com/c/feedback-prize-2021/

To train the model for 5 folds, you can run:

python train.py --fold 0 --model allenai/longformer-large-4096 --lr 1e-5 --epochs 10 --max_len 1536 --batch_size 4 --valid_batch_size 4
python train.py --fold 1 --model allenai/longformer-large-4096 --lr 1e-5 --epochs 10 --max_len 1536 --batch_size 4 --valid_batch_size 4
python train.py --fold 2 --model allenai/longformer-large-4096 --lr 1e-5 --epochs 10 --max_len 1536 --batch_size 4 --valid_batch_size 4
python train.py --fold 3 --model allenai/longformer-large-4096 --lr 1e-5 --epochs 10 --max_len 1536 --batch_size 4 --valid_batch_size 4
python train.py --fold 4 --model allenai/longformer-large-4096 --lr 1e-5 --epochs 10 --max_len 1536 --batch_size 4 --valid_batch_size 4

Note that you need have kfold column in training data.

Owner

abhishek thakur

Kaggle: www.kaggle.com/abhishek

GitHub Repository

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

15.3k Dec 30, 2022

The Sudachi synonym dictionary in Solar format.

solr-sudachi-synonyms The Sudachi synonym dictionary in Solar format. Summary Run a script that checks for updates to the Sudachi dictionary every hou

3 Aug 19, 2022

This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Proteno This is the data release associated with the corresponding NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deploymen

37 Dec 04, 2022

HiFi DeepVariant + WhatsHap workflowHiFi DeepVariant + WhatsHap workflow

HiFi DeepVariant + WhatsHap workflow Workflow steps align HiFi reads to reference with pbmm2 call small variants with DeepVariant, using two-pass meth

2 May 14, 2022

Awesome-NLP-Research (ANLP)

72 Dec 19, 2022

A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

17 Dec 19, 2022

Code repository of the paper Neural circuit policies enabling auditable autonomy published in Nature Machine Intelligence

9 Jan 08, 2023

Stanford CoreNLP provides a set of natural language analysis tools written in Java

Stanford CoreNLP Stanford CoreNLP provides a set of natural language analysis tools written in Java. It can take raw human language text input and giv

8.8k Jan 07, 2023

A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approaches for achieving this in this repo.

multitask-learning-transformers A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You

48 Jan 02, 2023

Document processing using transformers

Doc Transformers Document processing using transformers. This is still in developmental phase, currently supports only extraction of form data i.e (ke

13 Dec 21, 2022

GVT is a generic translation tool for parts of text on the PC screen with Text to Speak functionality.

GVT is a generic translation tool for parts of text on the PC screen with Text to Speech functionality. I wanted to create it because the existing tools that I experimented with did not satisfy me in

1 Aug 21, 2022

Client library to download and publish models and other files on the huggingface.co hub

huggingface_hub Client library to download and publish models and other files on the huggingface.co hub Do you have an open source ML library? We're l

644 Jan 01, 2023

Weird Sort-and-Compress Thing

Weird Sort-and-Compress Thing A weird integer sorting + compression algorithm inspired by a conversation with Luthingx (it probably already exists by

1 Jan 03, 2022

Learning Spatio-Temporal Transformer for Visual Tracking

STARK The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking Highlights The strongest performances Tracker

485 Jan 04, 2023

Ukrainian TTS (text-to-speech) using Coqui TTS

title emoji colorFrom colorTo sdk app_file pinned Ukrainian TTS 🐸 green green gradio app.py false Ukrainian TTS 📢 🤖 Ukrainian TTS (text-to-speech)

85 Dec 26, 2022

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Chimera: Learning Shared Semantic Space for Speech-to-Text Translation This is a Pytorch implementation for the "Chimera" paper Learning Shared Semant

43 Dec 28, 2022

Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

gpt-2-simple A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifical

3.1k Jan 07, 2023

Korean stereoypte detector with TUNiB-Electra and K-StereoSet

Korean Stereotype Detector Korean stereotype sentence classifier using K-StereoSet with TUNiB-Electra Web demo you can test this model easily in demo

11 Feb 18, 2022

Huggingface Transformers + Adapters = ❤️

adapter-transformers A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models adapter-transformers is an extension of

1.2k Jan 09, 2023

Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vecto

3.2k Dec 30, 2022

Long text token classification using LongFormer

Related tags

Overview

Long text token classification using LongFormer

Owner

abhishek thakur

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

The Sudachi synonym dictionary in Solar format.

This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

HiFi DeepVariant + WhatsHap workflowHiFi DeepVariant + WhatsHap workflow

Awesome-NLP-Research (ANLP)

A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Code repository of the paper Neural circuit policies enabling auditable autonomy published in Nature Machine Intelligence

Stanford CoreNLP provides a set of natural language analysis tools written in Java

A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approaches for achieving this in this repo.

Document processing using transformers

GVT is a generic translation tool for parts of text on the PC screen with Text to Speak functionality.

Client library to download and publish models and other files on the huggingface.co hub

Weird Sort-and-Compress Thing

Learning Spatio-Temporal Transformer for Visual Tracking

Ukrainian TTS (text-to-speech) using Coqui TTS

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

Korean stereoypte detector with TUNiB-Electra and K-StereoSet

Huggingface Transformers + Adapters = ❤️

Data loaders and abstractions for text and NLP