Vad-sli-asr - Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

Last update: Dec 09, 2022

VAD-SLI-ASR

Python scripts for a speech processing pipeline with Voice Activity Detection (VAD), Spoken Language Identification (SLI), and Automatic Speech Recognition (ASR). In our use case, VAD detects the time regions in a language documentation recording where someone is speaking, SLI classifies each region as either English (eng) or Muruwari (zmu), and an English ASR model transcribes the regions detected as English. The pipeline outputs an ELAN .eaf file with three tiers: _vad, _sli, and _asr.
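The data flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the repository's implementation: the names Annotation and build_tiers, the classify and transcribe callables, and the empty value stored on the _vad tier are all hypothetical. Only the tier names and the eng/zmu labels come from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Region = Tuple[float, float]  # (start, end) of a speech region, in seconds


@dataclass
class Annotation:
    start: float
    end: float
    value: str


def build_tiers(
    regions: List[Region],
    classify: Callable[[Region], str],
    transcribe: Callable[[Region], str],
) -> Dict[str, List[Annotation]]:
    """Turn VAD regions into _vad, _sli and _asr tiers (hypothetical helper)."""
    tiers: Dict[str, List[Annotation]] = {"_vad": [], "_sli": [], "_asr": []}
    for start, end in regions:
        # Every detected speech region goes on the VAD tier (value left empty here).
        tiers["_vad"].append(Annotation(start, end, ""))
        # SLI labels the region as "eng" or "zmu".
        label = classify((start, end))
        tiers["_sli"].append(Annotation(start, end, label))
        # Only regions labelled English are passed to the English ASR model.
        if label == "eng":
            tiers["_asr"].append(Annotation(start, end, transcribe((start, end))))
    return tiers


# Stub classifier and transcriber stand in for the real models.
example_regions = [(0.5, 2.0), (3.1, 4.8)]
example_labels = {(0.5, 2.0): "eng", (3.1, 4.8): "zmu"}
example_tiers = build_tiers(
    example_regions,
    classify=lambda region: example_labels[region],
    transcribe=lambda region: "placeholder transcription",
)
```

In this sketch the _vad and _sli tiers always have one annotation per region, while the _asr tier only has annotations for regions labelled eng.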

Set up

pip install -r requirements.txt

Data

├── data
│   ├── sli-train      <- Training data for SLI (one folder per language)
│   │   ├── eng/       <- .wav files (English utterances)
│   │   ├── zmu/       <- .wav files (Muruwari utterances)
│   ├── asr-train      <- Training data for ASR (transcriptions plus audio)
│   │   ├── eng.tsv    <- transcriptions
│   │   ├── eng/       <- .wav files (English utterances)
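Before training the SLI classifier, it can help to confirm that the folder layout matches the tree above. The following is a minimal sketch that assumes one sub-folder per language containing .wav files; the function name list_sli_clips is hypothetical and is not part of the repository's scripts.

```python
from pathlib import Path
from typing import Dict, List


def list_sli_clips(sli_train_dir: str) -> Dict[str, List[Path]]:
    """Map each language folder name (e.g. 'eng', 'zmu') to its sorted .wav files."""
    root = Path(sli_train_dir)
    clips: Dict[str, List[Path]] = {}
    # Each sub-folder is treated as one language; stray files are ignored.
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        clips[lang_dir.name] = sorted(lang_dir.glob("*.wav"))
    return clips
```

Calling list_sli_clips("data/sli-train") from the repository root and printing the number of clips per key gives a quick check that both the eng and zmu folders were found and are not empty.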

Usage

VAD

# Detect regions of a recording where someone is speaking
python scripts/run_vad-by-silero.py myrecording.wav

SLI

# To train a classifier using your own clips and then save it:
python scripts/train_sli-by-sblr.py data/sli-train models/zmu-eng_sli_k10.pkl

# Use trained model to classify VAD-detected regions as eng or zmu
python scripts/run_sli-by-sblr.py models/zmu-eng_sli_k10.pkl myrecording.wav

ASR

# To fine-tune a wav2vec 2.0 model and save the checkpoint:
python scripts/train_asr-by-w2v2.py data/asr-train data/checkpoints/no-lm_b10

# Transcribe using the trained model
python scripts/run_asr-by-w2v2.py data/checkpoints/no-lm_b10 myrecording.wav
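The three run_ commands above can be chained from a small Python wrapper. This is a sketch under assumptions: the script paths and argument order are copied from the commands above, but the helper names build_commands and run_all are hypothetical, and the README does not state how each stage picks up the previous stage's output, so the wrapper simply runs the stages in order from the repository root.

```python
import subprocess
import sys
from typing import List


def build_commands(recording: str, sli_model: str, asr_checkpoint: str) -> List[List[str]]:
    """Return the VAD, SLI and ASR commands in the order they should run."""
    py = sys.executable  # use the current interpreter instead of relying on `python` in PATH
    return [
        [py, "scripts/run_vad-by-silero.py", recording],
        [py, "scripts/run_sli-by-sblr.py", sli_model, recording],
        [py, "scripts/run_asr-by-w2v2.py", asr_checkpoint, recording],
    ]


def run_all(recording: str, sli_model: str, asr_checkpoint: str) -> None:
    """Run each stage in turn, stopping at the first stage that fails."""
    for cmd in build_commands(recording, sli_model, asr_checkpoint):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, run_all("myrecording.wav", "models/zmu-eng_sli_k10.pkl", "data/checkpoints/no-lm_b10") would issue the same three commands as shown in the Usage section.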

Releases

1.1.0 (Apr 23, 2022)
Switched to using pre-existing vocabulary from pre-trained model (see Appendix A in paper).

1.0.0 (Apr 18, 2022)

0.9.0 (Apr 14, 2022)
Pre-release to check Zenodo sync

Owner

Dynamics of Language
