Python interface to the WebRTC Voice Activity Detector

Last update: Dec 22, 2022

Overview

py-webrtcvad

This is a Python interface to the WebRTC Voice Activity Detector (VAD). It is compatible with Python 2 and Python 3.

A VAD classifies a piece of audio data as being voiced or unvoiced. It can be useful for telephony and speech recognition.

The VAD that Google developed for the WebRTC project is reportedly one of the best available, being fast, modern and free.

How to use it

Install the webrtcvad module:
```
pip install webrtcvad
```
Create a Vad object:
```
import webrtcvad
vad = webrtcvad.Vad()
```
Optionally, set its aggressiveness mode, which is an integer between 0 and 3, where 0 is the least aggressive about filtering out non-speech and 3 is the most aggressive (you can also set the mode when you create the VAD, e.g. vad = webrtcvad.Vad(3)):
```
vad.set_mode(1)
```

Give it a short segment ("frame") of audio. The WebRTC VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, 32000 or 48000 Hz. A frame must be either 10, 20, or 30 ms in duration:

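```
# Run the VAD on 10 ms of silence. The result should be False.
sample_rate = 16000
frame_duration = 10  # ms
# 16-bit mono PCM: two bytes per sample.
frame = b'\x00\x00' * int(sample_rate * frame_duration / 1000)
# print() with a single argument works on both Python 2 and Python 3.
print('Contains speech: %s' % (vad.is_speech(frame, sample_rate)))
```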
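
The frame-size constraints above can be checked before audio reaches the VAD. The following is a minimal sketch, not part of the webrtcvad API: `frame_size_bytes` and `split_frames` are hypothetical helper names, and the code assumes the input is already raw 16-bit mono PCM.

```python
# Hypothetical helpers (not provided by webrtcvad) for cutting raw PCM
# into frames of a length the VAD accepts.

VALID_SAMPLE_RATES = (8000, 16000, 32000, 48000)  # Hz
VALID_FRAME_DURATIONS = (10, 20, 30)              # ms
BYTES_PER_SAMPLE = 2                              # 16-bit mono PCM


def frame_size_bytes(sample_rate, frame_duration_ms):
    """Return the byte length of one frame, validating both parameters."""
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError("unsupported sample rate: %r" % (sample_rate,))
    if frame_duration_ms not in VALID_FRAME_DURATIONS:
        raise ValueError("unsupported frame duration: %r" % (frame_duration_ms,))
    samples = sample_rate * frame_duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def split_frames(pcm, sample_rate, frame_duration_ms):
    """Yield consecutive full-length frames; a trailing partial frame is dropped."""
    size = frame_size_bytes(sample_rate, frame_duration_ms)
    for offset in range(0, len(pcm) - size + 1, size):
        yield pcm[offset:offset + size]
```

Each yielded frame could then be passed to `vad.is_speech(frame, sample_rate)`. Dropping the trailing partial frame is a design choice made for this sketch; padding it with silence would be an alternative.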

See example.py for a more detailed example that will process a .wav file, find the voiced segments, and write each one as a separate .wav.
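
As a rough illustration of that workflow, the sketch below reads a .wav file with the standard-library `wave` module and joins runs of consecutive voiced frames into segments. It is a simplified stand-in and not the code in example.py: `read_wave` and `voiced_segments` are hypothetical names, the classifier is passed in as a plain callable, and no smoothing or padding is applied around segment boundaries.

```python
import contextlib
import wave


def read_wave(path):
    """Read a 16-bit mono WAV file and return (pcm_bytes, sample_rate)."""
    with contextlib.closing(wave.open(path, "rb")) as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wf.getframerate()
        if sample_rate not in (8000, 16000, 32000, 48000):
            raise ValueError("unsupported sample rate: %r" % (sample_rate,))
        return wf.readframes(wf.getnframes()), sample_rate


def voiced_segments(frames, is_speech):
    """Join each run of consecutive voiced frames into one byte string.

    `is_speech` is any callable taking a frame and returning True or False.
    """
    segments = []
    current = []
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
        elif current:
            # An unvoiced frame closes the segment in progress.
            segments.append(b"".join(current))
            current = []
    if current:
        segments.append(b"".join(current))
    return segments
```

With a real detector, the callable might be `lambda frame: vad.is_speech(frame, sample_rate)`, and each returned segment could be written out with `wave.open(path, "wb")`.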

How to run unit tests

To run unit tests:

```
pip install -e ".[dev]"
python setup.py test
```

History

2.0.10

Fixed memory leak. Thank you, bond005!

2.0.9

Improved example code. Added WebRTC license.

2.0.8

Fixed Windows compilation errors. Thank you, xiongyihui!

Owner

John Wiseman
