Vad-sli-asr: Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

Last update: Dec 09, 2022

Overview

VAD-SLI-ASR

Python scripts for a speech processing pipeline with Voice Activity Detection (VAD), Spoken Language Identification (SLI), and Automatic Speech Recognition (ASR). In our use case, VAD detects the time regions in a language documentation recording where someone is speaking, SLI classifies each region as either English (eng) or Muruwari (zmu), and an English ASR model transcribes the regions classified as English. The pipeline outputs an ELAN .eaf file with three tiers: _vad, _sli, and _asr.
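
The overall flow can be sketched as follows. This is a minimal illustration, not the repository's code, and it rests on several assumptions:

- `detect_speech`, `identify_language` and `transcribe` are hypothetical stand-ins for the three scripts.
- Regions are assumed to be in milliseconds.
- Annotations on the _vad tier are assumed to carry empty values.
- Writing the actual .eaf file is left out.

```python
from typing import Callable, Dict, List, Tuple

# (start_ms, end_ms) of one speech region (milliseconds are an assumption)
Region = Tuple[int, int]
# tier name -> list of (start_ms, end_ms, value) annotations
Tiers = Dict[str, List[Tuple[int, int, str]]]


def run_pipeline(
    detect_speech: Callable[[str], List[Region]],         # hypothetical VAD step
    identify_language: Callable[[str, Region], str],      # hypothetical SLI step
    transcribe: Callable[[str, Region], str],             # hypothetical ASR step
    wav_path: str,
) -> Tiers:
    """Chain VAD -> SLI -> ASR and collect one annotation list per tier."""
    tiers: Tiers = {"_vad": [], "_sli": [], "_asr": []}
    for start, end in detect_speech(wav_path):
        # Every detected speech region goes on the _vad tier (empty value assumed).
        tiers["_vad"].append((start, end, ""))
        lang = identify_language(wav_path, (start, end))
        tiers["_sli"].append((start, end, lang))
        # Only regions classified as English are sent to the English ASR model.
        if lang == "eng":
            tiers["_asr"].append((start, end, transcribe(wav_path, (start, end))))
    return tiers


# Usage with stub steps in place of the real models:
tiers = run_pipeline(
    detect_speech=lambda path: [(0, 1200), (1500, 3000)],
    identify_language=lambda path, r: "eng" if r[0] == 0 else "zmu",
    transcribe=lambda path, r: "hello",
    wav_path="myrecording.wav",
)
```

Muruwari regions appear on the _sli tier but never reach the _asr tier, because only an English ASR model is involved.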

Set up

pip install -r requirements.txt

Data

├── data
│   ├── sli-train      <- Training data for SLI (one folder per language)
│   │   ├── eng/       <- .wav files (English utterances)
│   │   ├── zmu/       <- .wav files (Muruwari utterances)
│   ├── asr-train      <- Training data for ASR fine-tuning
│   │   ├── eng.tsv    <- transcriptions
│   │   ├── eng/       <- .wav files (English utterances)
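
In the sli-train layout, each folder name doubles as the language label. The sketch below collects (clip, label) pairs from that layout. `list_sli_clips` is a hypothetical helper, and the training script's own loader may work differently.

```python
from pathlib import Path
from typing import List, Tuple


def list_sli_clips(sli_dir: str) -> List[Tuple[Path, str]]:
    """Return (wav_path, label) pairs, using each subfolder name (eng, zmu, ...) as the label."""
    clips: List[Tuple[Path, str]] = []
    for lang_dir in sorted(Path(sli_dir).iterdir()):
        if not lang_dir.is_dir():
            continue  # ignore stray files next to the language folders
        for wav in sorted(lang_dir.glob("*.wav")):
            clips.append((wav, lang_dir.name))
    return clips
```

For example, `list_sli_clips("data/sli-train")` would return pairs such as `(Path("data/sli-train/eng/clip1.wav"), "eng")`, where the clip name is illustrative.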

Usage

VAD

# Detect the regions of myrecording.wav where someone is speaking
python scripts/run_vad-by-silero.py myrecording.wav
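
Judging by its name, run_vad-by-silero.py relies on Silero VAD, which is a neural model. To show what a VAD step produces, the sketch below swaps in a much simpler technique, a frame-energy threshold. This is not how Silero works. The sketch takes mono samples as floats and returns (start_ms, end_ms) speech regions. The frame size and threshold are illustrative values, not the script's settings.

```python
from typing import List, Optional, Sequence, Tuple


def energy_vad(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold: float = 0.01,
) -> List[Tuple[int, int]]:
    """Toy VAD: return (start_ms, end_ms) spans of frames whose mean energy >= threshold."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= threshold and start is None:
            start = i * frame_ms                    # speech begins
        elif energy < threshold and start is not None:
            regions.append((start, i * frame_ms))   # speech ends
            start = None
    if start is not None:                           # speech runs to the end
        regions.append((start, n_frames * frame_ms))
    return regions
```

A real VAD such as Silero is far more robust to background noise than a fixed energy threshold. The output shape, a list of time regions, is the part that matters for the later SLI and ASR steps.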

SLI

# To train a classifier using your own clips and then save it:
python scripts/train_sli-by-sblr.py data/sli-train models/zmu-eng_sli_k10.pkl
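
The script name `sli-by-sblr` suggests SpeechBrain embeddings combined with logistic regression. Treat that as our reading of the name, not documented behaviour.

The sketch below covers only the classifier half. It fits a plain-Python logistic regression over fixed-length embedding vectors, which are assumed to be computed elsewhere, one per clip. It then pickles the result to a .pkl file, like the one the training command writes.

The function names and the `{"w", "b"}` model format are hypothetical. The real script's pickle contents are not described here.

```python
import math
import pickle
from typing import Dict, List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(
    X: List[Sequence[float]], y: List[int], lr: float = 0.1, epochs: int = 500
) -> Dict:
    """Fit binary logistic regression (1 = eng, 0 = zmu) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # (predicted probability - label) is the log-loss gradient w.r.t. the logit.
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take one step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return {"w": w, "b": b}


def predict_proba_eng(model: Dict, x: Sequence[float]) -> float:
    """Probability that an embedding comes from an English clip."""
    return _sigmoid(sum(wj * xj for wj, xj in zip(model["w"], x)) + model["b"])


def save_model(model: Dict, path: str) -> None:
    # Pickle the fitted parameters to a .pkl file (format is this sketch's own).
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str) -> Dict:
    with open(path, "rb") as f:
        return pickle.load(f)
```

With only a handful of clips per language, a linear classifier on top of pre-trained embeddings is a reasonable choice. It has few parameters to fit.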

# Use trained model to classify VAD-detected regions as eng or zmu
python scripts/run_sli-by-sblr.py models/zmu-eng_sli_k10.pkl myrecording.wav
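
To classify VAD regions, each region must first be cut out of the recording. The sketch below rests on these assumptions:

- The recording is given as mono samples.
- Regions are in milliseconds.
- `score_eng` is a hypothetical callable that returns P(eng) for a clip, for example by embedding it and applying the classifier stored in models/zmu-eng_sli_k10.pkl.
- The 0.5 decision threshold is illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def classify_regions(
    samples: Sequence[float],
    regions: List[Tuple[int, int]],
    score_eng: Callable[[Sequence[float]], float],  # hypothetical: returns P(eng)
    sample_rate: int = 16000,
    threshold: float = 0.5,
) -> List[Tuple[int, int, str]]:
    """Cut each (start_ms, end_ms) region out of the recording and label it eng or zmu."""
    labels = []
    for start_ms, end_ms in regions:
        # Convert milliseconds to sample indices for slicing.
        a = start_ms * sample_rate // 1000
        b = end_ms * sample_rate // 1000
        p = score_eng(samples[a:b])
        labels.append((start_ms, end_ms, "eng" if p >= threshold else "zmu"))
    return labels
```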

ASR

# To fine-tune a wav2vec 2.0 model and save the checkpoint:
python scripts/train_asr-by-w2v2.py data/asr-train data/checkpoints/no-lm_b10
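
Release 1.1.0 switched to the pre-trained model's existing vocabulary, so the transcripts in eng.tsv should only use characters that vocabulary covers. The sketch below assumes a character vocabulary like that of facebook/wav2vec2-base-960h: uppercase letters A to Z, the apostrophe, and | as the word delimiter.

Which checkpoint train_asr-by-w2v2.py actually starts from, and how it normalizes text, are not stated here. `to_vocab` is a hypothetical helper.

```python
# Assumed character set of an English wav2vec 2.0 CTC checkpoint:
# uppercase letters, the apostrophe, and "|" as the word delimiter.
VOCAB = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ'|")


def to_vocab(text: str, vocab=VOCAB, delimiter: str = "|") -> str:
    """Map a transcript onto the pre-trained model's character vocabulary."""
    # Uppercase, then drop characters the model has no token for (punctuation, digits, ...).
    kept = "".join(ch for ch in text.upper() if ch in vocab or ch.isspace())
    # split() collapses runs of whitespace; words are joined with the delimiter token.
    return delimiter.join(kept.split())
```

For example, `to_vocab("Hello, world!")` gives `"HELLO|WORLD"`. Dropping characters also merges hyphenated words ("well-known" becomes "WELLKNOWN"), which may or may not be what you want.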

# Transcribe using the fine-tuned model
python scripts/run_asr-by-w2v2.py data/checkpoints/no-lm_b10 myrecording.wav
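
The checkpoint name no-lm_b10 suggests that decoding uses no language model. The simplest decoder in that setting is greedy (best-path) CTC decoding, sketched below:

1. Take the most likely token per frame.
2. Merge consecutive repeats.
3. Drop the blank token.
4. Turn the word delimiter back into spaces.

Whether run_asr-by-w2v2.py decodes exactly this way is an assumption. The blank id of 0 (the `<pad>` token in Hugging Face wav2vec 2.0 English vocabularies) and the `id_to_char` list are also assumptions.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],  # one score vector per audio frame
    id_to_char: List[str],                    # hypothetical id -> token table
    blank_id: int = 0,
    delimiter: str = "|",
) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    collapsed = [k for k, _ in groupby(best)]  # merge consecutive duplicates
    chars = [id_to_char[i] for i in collapsed if i != blank_id]
    return "".join(chars).replace(delimiter, " ").strip()
```

Repeats are merged before blanks are removed, so a blank between two identical letters keeps them separate. For example, frames H, blank, H decode to "HH".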

Releases (1.1.0)

1.1.0 (Apr 23, 2022)
Switched to using the pre-existing vocabulary from the pre-trained model (see Appendix A in the paper).

1.0.0 (Apr 18, 2022)

0.9.0 (Apr 14, 2022)

Pre-release to check Zenodo sync

Owner

Dynamics of Language
