Use fastai-v2 with HuggingFace's pretrained transformers

Last update: Nov 16, 2022

Related tags

Text Data & NLP fasthugs

Overview

FastHugs

Use fastai v2 with HuggingFace's pretrained transformers. See the notebooks below, depending on your task:

Text classification: fasthugs_seq_classification.ipynb
Language model pre-training or fine-tuning (RoBERTa only for now): fasthugs_language_model.ipynb

What's New

April 24, 2020

Added fasthugs_language_model.ipynb, which shows how to pre-train a Masked Language Model (MLM) from scratch or fine-tune a pretrained one, using RoBERTa in this case
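For readers new to masked language modelling, here is a minimal pure-Python sketch of the token-masking step. It is not taken from the notebook: the function name `mask_tokens`, its parameters, and the token ids are hypothetical, and the 15% selection rate with an 80/10/10 split (mask token / random token / unchanged) is the commonly used BERT-style recipe, assumed here for illustration.

```python
import random
from typing import Iterable, List, Sequence, Tuple

# Label value for positions that should not contribute to the loss.
# -100 is the default ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


def mask_tokens(
    ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Iterable[int] = (),
    mlm_prob: float = 0.15,
    seed: int = None,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for one sequence of token ids.

    Hypothetical helper: each non-special token is selected with
    probability `mlm_prob`. A selected token becomes the label at its
    position and is then replaced by `mask_id` (80%), replaced by a
    random id (10%), or left unchanged (10%).
    """
    rng = random.Random(seed)
    special = set(special_ids)
    inputs = list(ids)
    labels = [IGNORE_INDEX] * len(inputs)

    for i, tok in enumerate(ids):
        # Never mask special tokens (e.g. start/end/padding markers).
        if tok in special or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise keep the original token in the input
    return inputs, labels
```

Calling `mask_tokens(ids, mask_id, vocab_size, special_ids={start_id, end_id})` returns the corrupted inputs together with labels that are `-100` everywhere except at the selected positions, so the loss is computed only on the tokens the model has to recover.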

April 17, 2020

Added new get_vocab functionality from HuggingFace, a unified API to extract a tokenizer's vocab
Added new AutoModelForSequenceClassification and AutoConfig HuggingFace functionality to make things tidier
Tidied up and refactored FastHugsTokenizer and FastHugsModel
The OLD demo and vocab files will be deleted soon

Things You Might Like ( ❤️ ?)

FastHugsTokenizer: A tokenizer wrapper that can be used with fastai-v2’s tokenizer.

FastHugsModel: A model wrapper over the HF models, more or less the same as the wrappers from the HF fastai-v1 articles mentioned below

Padding: Settings for the padding token index and for whether the transformer prefers left or right padding

Model Splitters: Functions to split the classification head from the model backbone in line with fastai-v2’s new definition of Learner (splitters)
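To make the padding settings above concrete, here is a minimal pure-Python sketch of what they control. It is not the FastHugs implementation: `pad_batch` and `attention_mask` are hypothetical helpers, and the padding index of 1 used in the usage note is an arbitrary example value. In practice the index and the preferred side should come from the tokenizer of the model you load.

```python
from typing import List, Sequence


def pad_batch(
    seqs: Sequence[Sequence[int]],
    pad_idx: int,
    pad_first: bool = False,
) -> List[List[int]]:
    """Pad every sequence to the length of the longest one in the batch.

    Hypothetical helper: pad_first=True pads on the left,
    pad_first=False (the default) pads on the right.
    """
    max_len = max((len(s) for s in seqs), default=0)
    padded = []
    for s in seqs:
        filler = [pad_idx] * (max_len - len(s))
        padded.append(filler + list(s) if pad_first else list(s) + filler)
    return padded


def attention_mask(padded: Sequence[Sequence[int]], pad_idx: int) -> List[List[int]]:
    """Mark real tokens with 1 and padding positions with 0."""
    return [[0 if tok == pad_idx else 1 for tok in row] for row in padded]
```

For example, `pad_batch([[5, 6, 7], [8]], pad_idx=1)` pads the second sequence on the right, while passing `pad_first=True` pads it on the left. The mask lets the transformer ignore the padded positions in either case.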

Read these first 👇

This notebook borrows heavily from this notebook, which in turn is based on this tutorial and its accompanying article. Huge thanks to Melissa Rajaram and Maximilien Roberti for these great resources. If you're not familiar with the HuggingFace library, please give them a read first, as they are quite comprehensive.

fastai-v2 ✌️ 2️⃣

This paper introduces the v2 version of the fastai library, and you can follow and contribute to v2's progress on the forums. This notebook uses the small IMDB dataset and is based on the fastai-v2 ULMFiT tutorial. Huge thanks to Jeremy, Sylvain, Rachel and the fastai community for making this library what it is. I'm super excited about the additional flexibility v2 brings. 🎉

Owner

Morgan McGuire
