This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

Overview

slue-toolkit

made-with-python License: MIT

We introduce Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks. Refer https://arxiv.org/abs/2111.10367 for more details.

News

  • Nov. 22: We release the SLUE paper on arXiv along with the slue-toolkit repository. The repository contains data processing and evaluation scripts. We will publish the scripts for trainig the baseline models soon.

Installation

  1. git clone this repository and install slue-toolkit (development mode)
git clone https://github.com/asappresearch/slue-toolkit.git
pip install -e .

or install directly from Github

pip install git+https://github.com/asappresearch/slue-toolkit.git
  1. Install additional dependency based on your choice (e.g. you need fairseq and transformers for baselines)

SLUE Tasks

Automatic Speech Recognition (ASR)

Although this is not a SLU task, ASR can help analyze performance of downstream SLU tasks on the same domain. Additionally, pipeline approaches depend on ASR outputs, making ASR relevant to SLU. ASR is evaluated using word error rate (WER).

Named Entity Recognition (NER)

Named entity recognition involves detecting the named entities and their tags (types) in a given sentence. We evaluate performance using micro-averaged F1 and label-F1 scores. The F1 score evaluates an unordered list of named entity phrase and tag pairs predicted for each sentence. Only the tag predictions are considered for label-F1.

Sentiment Analysis (SA)

Sentiment analysis refers to classifying a given speech segment as having negative, neutral, or positive sentiment. We evaluate SA using macro-averaged (unweighted) recall and F1 scores.

Datasets

Corpus Size - utts (hours) Tasks License
Fine-tune Dev Test
SLUE-VoxPopuli 5,000 (14.5) 1,753 (5.0) 1,842 (4.9) ASR, NER CC0 (check complete license here)
SLUE-VoxCeleb 5,777 (12.8) 955 (2.1) 4,052 (9.0) ASR, SA CC-BY 4.0 (check complete license here)

For SLUE, you need VoxCeleb and VoxPopuli dataset. We carefully curated subset of those dataset for fine-tuning and evaluation for SLUE tasks, and we re-distribute the the subsets. Thus, you don't need to download a whole gigantic datasets. In the dataset, we also includes the human annotation and transcription for SLUE tasks. All you need to do is just running the script below and it will download and pre-process the dataset.

Download and pre-process dataset

bash scripts/download_datasets.sh

SLUE score evaluation

The test set data and annotation will be used for the official SLUE score evaluation, however we will not release the test set annotation. Thus, the SLUE score can be evaluated by submitting your prediction result in tsv format. We will prepare the website to accept your submission. Please stay tuned for this.

Model development rule

To train model, You can use fine-tuning and dev sets (audio, transcription and annotation) except the test set of SLUE task. Additionally you can use any kind of external dataset whether it is labeled or unlabeled for any purpose of training (e.g. pre-training and fine-tuning).

For vadidation of your model, you can use official dev set we provide, or you can make your own splits or cross-validation splits by mixing fine-tuning and dev set all together.

Baselines

ASR

Fine-tuning

Assuming that the preprocessed manifest files are in manifest/slue-voxceleb and manifest/slue-voxpopuli for SLUE-VoxCeleb and SLUE-VoxPopuli. This command fine-tune a wav2vec 2.0 base model on these two datasets using one GPU.

bash baselines/asr/ft-w2v2-base.sh manifest/slue-voxceleb save/asr/w2v2-base-vc
bash baselines/asr/ft-w2v2-base.sh manifest/slue-voxpopuli save/asr/w2v2-base-vp

Evaluation

To evaluate the fine-tuned wav2vec 2.0 ASR models on the dev set, please run the following commands.

python slue_toolkit/eval/eval_w2v.py eval_asr save/asr/w2v2-base-vc --data manifest/slue-voxceleb --subset dev
python slue_toolkit/eval/eval_w2v.py eval_asr save/asr/w2v2-base-vp --data manifest/slue-voxpopuli --subset dev

The WER will be printed directly. The predictions are saved in save/asr/w2v2-base-vc/pred-dev.wrd and save/asr/w2v2-base-vp/pred-dev.wrd and can be used for pipeline models.

More detail baseline experiment described here

NER

Fine-tuning End-to-end model

Assuming that the preprocessed manifest files are in manifest/slue-voxpopuli for SLUE-VoxPopuli. This command fine-tune a wav2vec 2.0 base model using one GPU.

bash baselines/ner/e2e_scripts/ft-w2v2-base.sh manifest/slue-voxpopuli/e2e_ner save/e2e_ner/w2v2-base

Evaluating End-to-End model

To evaluate the fine-tuned wav2vec 2.0 E2E NER model on the dev set, please run the following command. (decoding without language model)

bash baselines/ner/e2e_scripts/eval-ner.sh w2v2-base dev combined nolm

More detail baseline experiment described here

Sentiment Analysis

Fine-tuning

This command fine-tune a wav2vec 2.0 base model on the voxceleb dataset

bash baselines/sentiment/ft-w2v2-base-senti.sh manifest/slue-voxceleb save/sentiment/w2v2-base

Evaluation

To evaluate the fine-tuned wav2vec 2.0 sentiment model, run following commands or run baselines/sentiment/e2e_scripts/eval.sh

python3 slue_toolkit/eval/eval_w2v_sentiment.py --save-dir save/sentiment/w2v2-base --data manifest/slue-voxceleb --subset dev

More detail baseline experiment described here

Owner
ASAPP Research
AI for Enterprise
ASAPP Research
This repository contains the code for our fast polygonal building extraction from overhead images pipeline.

Polygonal Building Segmentation by Frame Field Learning We add a frame field output to an image segmentation neural network to improve segmentation qu

Nicolas Girard 186 Jan 04, 2023
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

MMdnn MMdnn is a comprehensive and cross-framework tool to convert, visualize and diagnose deep learning (DL) models. The "MM" stands for model manage

Microsoft 5.7k Jan 09, 2023
BuildingNet: Learning to Label 3D Buildings

BuildingNet This is the implementation of the BuildingNet architecture described in this paper: Paper: BuildingNet: Learning to Label 3D Buildings Arx

16 Nov 07, 2022
Totally Versatile Miscellanea for Pytorch

Totally Versatile Miscellania for PyTorch Thomas Viehmann [email protected] Thi

Thomas Viehmann 428 Dec 28, 2022
Intro-to-dl - Resources for "Introduction to Deep Learning" course.

Introduction to Deep Learning course resources https://www.coursera.org/learn/intro-to-deep-learning Running on Google Colab (tested for all weeks) Go

Advanced Machine Learning specialisation by HSE 761 Dec 24, 2022
Training a deep learning model on the noisy CIFAR dataset

Training-a-deep-learning-model-on-the-noisy-CIFAR-dataset This repository contai

1 Jun 14, 2022
3D Generative Adversarial Network

Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling This repository contains pre-trained models and sampling

Chengkai Zhang 791 Dec 20, 2022
A framework for multi-step probabilistic time-series/demand forecasting models

JointDemandForecasting.py A framework for multi-step probabilistic time-series/demand forecasting models File stucture JointDemandForecasting contains

Stanford Intelligent Systems Laboratory 3 Sep 28, 2022
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 2022)

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR2022)[paper] Authors: Chenhang He, Ruihuang Li, Shuai Li, L

Billy HE 141 Dec 30, 2022
A C implementation for creating 2D voronoi diagrams

Branch OSX/Linux Windows master dev jc_voronoi A fast C/C++ header only implementation for creating 2D Voronoi diagrams from a point set Uses Fortune'

Mathias Westerdahl 481 Dec 29, 2022
A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking

PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking PoseRBPF Paper Self-supervision Paper Pose Estimation Video Robot Manipulati

NVIDIA Research Projects 107 Dec 25, 2022
Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network

Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network The performances of tree ensemb

Mustapha Unubi Momoh 2 Sep 13, 2022
Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Graph-to-Graph Transformers Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NL

Idiap Research Institute 40 Aug 14, 2022
It's final year project of Diploma Engineering. This project is based on Computer Vision.

Face-Recognition-Based-Attendance-System It's final year project of Diploma Engineering. This project is based on Computer Vision. Brief idea about ou

Neel 10 Nov 02, 2022
Unifying Global-Local Representations in Salient Object Detection with Transformer

GLSTR (Global-Local Saliency Transformer) This is the official implementation of paper "Unifying Global-Local Representations in Salient Object Detect

11 Aug 24, 2022
Self-Regulated Learning for Egocentric Video Activity Anticipation

Self-Regulated Learning for Egocentric Video Activity Anticipation Introduction This is a Pytorch implementation of the model described in our paper:

qzhb 13 Sep 23, 2022
The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting

About The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting The demo program was only tested under Conda in a standard

Anh-Dzung Doan 5 Nov 28, 2022
Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight)

[NeurIPS 2021 Spotlight] HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning [Paper] This is Official PyTorch implementatio

42 Nov 01, 2022
Split Variational AutoEncoder

Split-VAE Split Variational AutoEncoder Introduction This repository contains and implemementation of a Split Variational AutoEncoder (SVAE). In a SVA

Andrea Asperti 2 Sep 02, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Jan 04, 2023