Enterprise Scale NLP with Hugging Face & SageMaker Workshop series

Overview

Workshop: Enterprise-Scale NLP with Hugging Face & Amazon SageMaker

Earlier this year we announced a strategic collaboration with Amazon to make it easier for companies to use Hugging Face Transformers in Amazon SageMaker, and ship cutting-edge Machine Learning features faster. We introduced new Hugging Face Deep Learning Containers (DLCs) to train and deploy Hugging Face Transformers in Amazon SageMaker.

In addition to the Hugging Face Inference DLCs, we created a Hugging Face Inference Toolkit for SageMaker. This Inference Toolkit leverages the pipelines from the transformers library to allow zero-code deployments of models, without requiring any code for pre-or post-processing.

In October and November, we held a workshop series on “Enterprise-Scale NLP with Hugging Face & Amazon SageMaker”. This workshop series consisted out of 3 parts and covers:

  • Getting Started with Amazon SageMaker: Training your first NLP Transformer model with Hugging Face and deploying it
  • Going Production: Deploying, Scaling & Monitoring Hugging Face Transformer models with Amazon SageMaker
  • MLOps: End-to-End Hugging Face Transformers with the Hub & SageMaker Pipelines

We recorded all of them so you are now able to do the whole workshop series on your own to enhance your Hugging Face Transformers skills with Amazon SageMaker or vice-versa.

Below you can find all the details of each workshop and how to get started.

🧑🏻‍💻 Github Repository: https://github.com/philschmid/huggingface-sagemaker-workshop-series

📺   Youtube Playlist: https://www.youtube.com/playlist?list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ

Note: The Repository contains instructions on how to access a temporary AWS, which was available during the workshops. To be able to do the workshop now you need to use your own or your company AWS Account.

In Addition to the workshop we created a fully dedicated Documentation for Hugging Face and Amazon SageMaker, which includes all the necessary information. If the workshop is not enough for you we also have 15 additional getting samples Notebook Github repository, which cover topics like distributed training or leveraging Spot Instances.

Workshop 1: Getting Started with Amazon SageMaker: Training your first NLP Transformer model with Hugging Face and deploying it

In Workshop 1 you will learn how to use Amazon SageMaker to train a Hugging Face Transformer model and deploy it afterwards.

  • Prepare and upload a test dataset to S3
  • Prepare a fine-tuning script to be used with Amazon SageMaker Training jobs
  • Launch a training job and store the trained model into S3
  • Deploy the model after successful training

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_1_getting_started_with_amazon_sagemaker

📺  Youtube: https://www.youtube.com/watch?v=pYqjCzoyWyo&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=6&t=5s&ab_channel=HuggingFace

Workshop 2: Going Production: Deploying, Scaling & Monitoring Hugging Face Transformer models with Amazon SageMaker

In Workshop 2 learn how to use Amazon SageMaker to deploy, scale & monitor your Hugging Face Transformer models for production workloads.

  • Run Batch Prediction on JSON files using a Batch Transform
  • Deploy a model from hf.co/models to Amazon SageMaker and run predictions
  • Configure autoscaling for the deployed model
  • Monitor the model to see avg. request time and set up alarms

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_2_going_production

📺  Youtube: https://www.youtube.com/watch?v=whwlIEITXoY&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=6&t=61s

Workshop 3: MLOps: End-to-End Hugging Face Transformers with the Hub & SageMaker Pipelines

In Workshop 3 learn how to build an End-to-End MLOps Pipeline for Hugging Face Transformers from training to production using Amazon SageMaker.

We are going to create an automated SageMaker Pipeline which:

  • processes a dataset and uploads it to s3
  • fine-tunes a Hugging Face Transformer model with the processed dataset
  • evaluates the model against an evaluation set
  • deploys the model if it performed better than a certain threshold

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_3_mlops

📺  Youtube: https://www.youtube.com/watch?v=XGyt8gGwbY0&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=7

Access Workshop AWS Account

For this workshop you’ll get access to a temporary AWS Account already pre-configured with Amazon SageMaker Notebook Instances. Follow the steps in this section to login to your AWS Account and download the workshop material.

1. To get started navigate to - https://dashboard.eventengine.run/login

setup1

Click on Accept Terms & Login

2. Click on Email One-Time OTP (Allow for up to 2 mins to receive the passcode)

setup2

3. Provide your email address

setup3

4. Enter your OTP code

setup4

5. Click on AWS Console

setup5

6. Click on Open AWS Console

setup6

7. In the AWS Console click on Amazon SageMaker

setup7

8. Click on Notebook and then on Notebook instances

setup8

9. Create a new Notebook instance

setup9

10. Configure Notebook instances

  • Make sure to increase the Volume Size of the Notebook if you want to work with big models and datasets
  • Add your IAM_Role with permissions to run your SageMaker Training And Inference Jobs
  • Add the Workshop Github Repository to the Notebook to preload the notebooks: https://github.com/philschmid/huggingface-sagemaker-workshop-series.git

setup10

11. Open the Lab and select the right kernel you want to do and have fun!

Open the workshop you want to do (workshop_1_getting_started_with_amazon_sagemaker/) and select the pytorch kernel

setup11

Owner
Philipp Schmid
Machine Learning Engineer & Tech Lead at Hugging Face👨🏻‍💻 🤗 Cloud enthusiast ☁️ AWS ML HERO 🦸🏻‍♂️ Nuremberg 🇩🇪
Philipp Schmid
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

44 Jan 06, 2023
Spooky Skelly For Python

_____ _ _____ _ _ _ | __| ___ ___ ___ | |_ _ _ | __|| |_ ___ | || | _ _ |__ || . || . || . || '

Kur0R1uka 1 Dec 23, 2021
Codes for processing meeting summarization datasets AMI and ICSI.

Meeting Summarization Dataset Meeting plays an essential part in our daily life, which allows us to share information and collaborate with others. Wit

xcfeng 39 Dec 14, 2022
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 4.6k Jan 01, 2023
Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products.

Leah Pathan Khan 2 Jan 12, 2022
Visual Automata is a Python 3 library built as a wrapper for Caleb Evans' Automata library to add more visualization features.

Visual Automata Copyright 2021 Lewi Lie Uberg Released under the MIT license Visual Automata is a Python 3 library built as a wrapper for Caleb Evans'

Lewi Uberg 55 Nov 17, 2022
Pretrained Japanese BERT models

Pretrained Japanese BERT models This is a repository of pretrained Japanese BERT models. The models are available in Transformers by Hugging Face. Mod

Inui Laboratory 387 Dec 30, 2022
HF's ML for Audio study group

Hugging Face Machine Learning for Audio Study Group Welcome to the ML for Audio Study Group. Through a series of presentations, paper reading and disc

Vaibhav Srivastav 110 Jan 01, 2023
Natural Language Processing at EDHEC, 2022

Natural Language Processing Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022 What

1 Feb 04, 2022
HuggingTweets - Train a model to generate tweets

HuggingTweets - Train a model to generate tweets Create in 5 minutes a tweet generator based on your favorite Tweeter Make my own model with the demo

Boris Dayma 318 Jan 04, 2023
Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Semantic Segmentation".

Dual Path Learning for Domain Adaptation of Semantic Segmentation Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Sema

27 Dec 22, 2022
nlpcommon is a python Open Source Toolkit for text classification.

nlpcommon nlpcommon, Python Text Tool. Guide Feature Install Usage Dataset Contact Cite Reference Feature nlpcommon is a python Open Source

xuming 3 May 29, 2022
A demo of chinese asr

chinese_asr_demo 一个端到端的中文语音识别模型训练、测试框架 具备数据预处理、模型训练、解码、计算wer等等功能 训练数据 训练数据采用thchs_30,

4 Dec 09, 2021
CoSENT、STS、SentenceBERT

CoSENT_Pytorch 比Sentence-BERT更有效的句向量方案

102 Dec 07, 2022
TextFlint is a multilingual robustness evaluation platform for natural language processing tasks,

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks, which unifies general text transformation, task-specific transformation, adversarial attack, sub-popu

TextFlint 587 Dec 20, 2022
用Resnet101+GPT搭建一个玩王者荣耀的AI

基于pytorch框架用resnet101加GPT搭建AI玩王者荣耀 本源码模型主要用了SamLynnEvans Transformer 的源码的解码部分。以及pytorch自带的预训练模型"resnet101-5d3b4d8f.pth"

冯泉荔 2.2k Jan 03, 2023
Creating an LSTM model to generate music

Music-Generation Creating an LSTM model to generate music music-generator Used to create basic sin wave sounds music-ai Contains the functions to conv

Jerin Joseph 2 Dec 02, 2021
NLP-SentimentAnalysis - Coursera Course ( Duration : 5 weeks ) offered by DeepLearning.AI

Coursera Natural Language Processing Specialization This repository contains material related to Coursera Natural Language Processing Specialization.

Nishant Sharma 1 Jun 05, 2022
Python bot created with Selenium that can guess the daily Wordle word correct 96.8% of the time.

Wordle_Bot Python bot created with Selenium that can guess the daily Wordle word correct 96.8% of the time. It will log onto the wordle website and en

Lucas Polidori 15 Dec 11, 2022
Line as a Visual Sentence: Context-aware Line Descriptor for Visual Localization

Line as a Visual Sentence with LineTR This repository contains the inference code, pretrained model, and demo scripts of the following paper. It suppo

SungHo Yoon 158 Dec 27, 2022