Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Last update: Dec 31, 2022

Overview

⚠️ Check out the develop branch to see what is coming in pyannote.audio 2.0:

- a much smaller and cleaner codebase
- Python-first API (the good old pyannote-audio CLI will still be available, though)
- multi-GPU and TPU training thanks to pytorch-lightning
- data augmentation with torch-audiomentations
- huggingface model hosting
- prodigy recipes for audio annotations
- online demo based on streamlit

Neural speaker diarization with `pyannote-audio`

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.

pyannote.audio also comes with pretrained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding.

Installation

pyannote.audio only supports Python 3.7 (or later) on Linux and macOS. It might work on Windows, but there is no guarantee that it does, nor any plan to add official support for Windows.

The instructions below assume that PyTorch has been installed following the instructions at https://pytorch.org.

$ pip install pyannote.audio==1.1.1

Documentation and tutorials

Use pretrained models and pipelines
- Apply pretrained pipelines on your own data
- Apply pretrained models on your own data
Prepare your own data
- Annotate your own data semi-automatically with Prodigy
- Prepare your own dataset for training
Train models on your own data
Tune pipelines on your own data
- Speech activity detection pipeline
- Speaker diarization pipeline

Until proper documentation is released, note that part of the API is described in this tutorial.

Citation

If you use pyannote.audio, please use the following citation:

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}

Comments

[WIP] Multilabel Detection
This is a new PR for the VTC feature, this time based on a cleaner implementation. I'm opening a new PR so as to keep the former branch "clean" (and prevent any mishaps).

What is done:

renamed the SpeakerTracking task to MultilabelDetection

added MultilabelPipeline

updated MultilabelFscore.report()

tested the new preprocessor

What's to be done:

[ ] re-test the new implementation on our clinical data (as well as the child data from @MarvinLvn)

[ ] maybe a couple of unit tests (especially for the preprocessor)

[ ] maybe make the aggregated "multilabel" fscore duration-based instead of file-based
opened by hadware 29
Trying the diarization pipeline on random .wav files
Hey, I went through the detailed tutorials and trained all the models required by the pipeline. The pipeline works on the AMI dataset, but when I try to reproduce the results on other .wav files (16 kHz, mono, 256 kbps), it is not able to diarize the audio. Here is a brief summary of what I did:

Took a random meeting audio file, sampled at 16 kHz, mono, 256 kbps

Renamed it to ES2003a and used it to replace the actual ES2003a file (I thought of it as a workaround to avoid creating another database)

Ran all the pipelines (sad, scd, emb, diarization)

Output:

Speech activity detection works perfectly and correctly classifies speech regions.

Speaker diarization doesn't work: everything is assigned to speaker 0.

Can you please tell me whether replacing the actual file is why the pipeline gives wrong diarization outputs, and what a better way to test the pipeline on arbitrary audio files would be?
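For uncompressed PCM audio, the bit rate is sample rate × bits per sample × channels, so a 16 kHz, mono, 16-bit WAV file runs at 16,000 × 16 × 1 = 256,000 bit/s (256 kbps). Before swapping a new file into a protocol, it can help to confirm that it really has this format. The sketch below assumes an uncompressed WAV file and uses only Python's standard `wave` module; `describe_wav` is a hypothetical helper, not part of pyannote.audio.

```python
import wave


def describe_wav(f):
    """Return sample rate, channel count, sample width and bit rate of a WAV file.

    Hypothetical helper: `f` can be a path or a binary file-like object.
    Only uncompressed PCM WAV files are supported by the `wave` module.
    """
    with wave.open(f, "rb") as w:
        sample_rate = w.getframerate()
        channels = w.getnchannels()
        sample_width = w.getsampwidth()  # bytes per sample
    # PCM bit rate = samples per second x bits per sample x channels
    bit_rate = sample_rate * 8 * sample_width * channels
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "sample_width": sample_width,
        "bit_rate": bit_rate,
    }
```

A 16 kHz, mono, 16-bit file yields `bit_rate == 256000`; any other value means the file differs from what the models were trained on.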
opened by saisumit 26
build error

Hi, when I run pip install "pyannote.audio==0.3", I get the following error message:

In file included from _pysndfile.cpp:471:0:
pysndfile.hh:55:21: fatal error: sndfile.h: No such file or directory
 #include <sndfile.h>
                     ^
compilation terminated.
error: command 'gcc' failed with exit status 1

Failed building wheel for pysndfile
Running setup.py clean for pysndfile
Failed to build pysndfile
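The compiler error means that `sndfile.h`, the header of the libsndfile C library, is not installed, so the `pysndfile` extension cannot be built (on Debian/Ubuntu the header ships in the `libsndfile1-dev` package; whether that applies to the reporter's system is an assumption). As a quick first diagnostic, the sketch below checks whether the shared libsndfile library can be found at all; `libsndfile_available` is a hypothetical helper, and a positive result does not guarantee that the development header is present.

```python
import ctypes.util


def libsndfile_available() -> bool:
    """Return True if the shared libsndfile library can be located.

    This only checks for the runtime library; compiling pysndfile from
    source additionally needs the development header sndfile.h.
    """
    return ctypes.util.find_library("sndfile") is not None


if __name__ == "__main__":
    print("libsndfile found:", libsndfile_available())
```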
cannot_reproduce

opened by ChristopherLu 24

Add support for file handle to pyannote.audio.core.io.Audio

This is not currently supported:

from pyannote.audio.core.io import Audio
from pyannote.core import Segment
audio = Audio()
with open('file.wav', 'rb') as f:
    waveform, sample_rate = audio(f)
with open('file.wav', 'rb') as f:
    waveform, sample_rate = audio.crop(f, Segment(10, 20))

One has to do this instead:

from pyannote.audio.core.io import Audio
from pyannote.core import Segment
audio = Audio()
waveform, sample_rate = audio('file.wav')
waveform, sample_rate = audio.crop('file.wav', Segment(10, 20))

This limitation can be problematic, e.g. with streamlit.file_uploader, which returns a file-like object.
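Until file handles are supported, one workaround is to decode the file-like object yourself and pass the in-memory `{"waveform": ..., "sample_rate": ...}` mapping that pyannote.audio 2.x pipelines accept (the same form as `dict(waveform=waveform, sample_rate=samplerate)` used further down this page). The sketch below assumes a 16-bit PCM WAV input and relies only on Python's `wave` module and NumPy; `load_wav_from_handle` is a hypothetical helper, and its NumPy waveform still has to be converted with `torch.from_numpy(...)` before being handed to pyannote.audio.

```python
import wave

import numpy as np


def load_wav_from_handle(f):
    """Decode a 16-bit PCM WAV file-like object into a (channel, time) array.

    Hypothetical helper: returns {"waveform": float32 array in [-1, 1),
    "sample_rate": int}. The caller keeps ownership of `f`.
    """
    with wave.open(f, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        num_channels = w.getnchannels()
        sample_rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    # interleaved little-endian int16 samples -> (time, channel) -> (channel, time)
    samples = np.frombuffer(frames, dtype="<i2").reshape(-1, num_channels).T
    waveform = samples.astype(np.float32) / 32768.0
    return {"waveform": waveform, "sample_rate": sample_rate}
```

For example, `with open('file.wav', 'rb') as f: data = load_wav_from_handle(f)` works the same way with the object returned by `streamlit.file_uploader`.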

opened by hbredin 20

ValueError: inconsistent "classes" (is ['non_change', 'change'], should be: ['non_speech', 'speech'])
Describe the bug
I'm trying to follow the diarization pipeline tutorial on my own data.

I am trying to run "apply" on my own data and model for speaker change detection. I get an error that suggests it is trying to apply speech activity detection:

ValueError: inconsistent "classes" (is ['non_change', 'change'], should be: ['non_speech', 'speech'])

To Reproduce
Steps to reproduce the behavior:

$ export EXP_DIR=tutorials/pipelines/speaker_diarization
$ pyannote-audio scd apply --step=0.1 --pretrained="<path to>/tutorials/models/speaker_change_detection/train/myData.SpeakerDiarization.general.train/validate_segmentation_fscore/myData.SpeakerDiarization.general.train" --subset=train ${EXP_DIR} myData.SpeakerDiarization.general

pyannote environment

$ pip freeze | grep pyannote
pyannote.core==4.1
pyannote.database==4.0.1
pyannote.metrics==3.0.1
pyannote.pipeline==1.5.2

Additional context
I have only prepared a development set called "train" so far, so I'm running on that. I successfully ran the SAD apply step before moving on to SCD.
wontfix
opened by danFromTelAviv 20

An error was encountered while loading "pyannote/speaker-diarization"

Hello, when I run the code:

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="my_token")

I get an error:

Traceback (most recent call last):
  File "/home/dg/anaconda3/envs/pyannote/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 213, in hf_raise_for_status
    response.raise_for_status()
  File "/home/dg/anaconda3/envs/pyannote/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/pyannote/segmentation/resolve/2022.07/pytorch_model.bin

I get this error whether I use a token with the read role or the write role. Does anyone know how to fix it? Thanks.
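The URL in the traceback points to `pyannote/segmentation`, not to `pyannote/speaker-diarization`: the pipeline also downloads the weights of the segmentation model it is built on. A plausible cause, not confirmed in this thread, is that the account behind the token has not been granted access to that second repository (gated Hugging Face repositories answer 403 until their conditions are accepted on the model page). The hypothetical helper below splits such a `resolve` URL into repository, revision and file name, which tells you which model page to check.

```python
from urllib.parse import urlparse


def parse_hf_resolve_url(url):
    """Split a Hugging Face Hub "resolve" URL into (repo_id, revision, filename).

    Expected path layout: /{repo_id}/resolve/{revision}/{filename}
    """
    parts = urlparse(url).path.strip("/").split("/")
    i = parts.index("resolve")  # raises ValueError if this is not a resolve URL
    return "/".join(parts[:i]), parts[i + 1], "/".join(parts[i + 2:])


url = "https://huggingface.co/pyannote/segmentation/resolve/2022.07/pytorch_model.bin"
print(parse_hf_resolve_url(url))
# ('pyannote/segmentation', '2022.07', 'pytorch_model.bin')
```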

opened by Zpadger 18

[WIP] Feat/vtc

This is a work-in-progress PR for the future VTC implementation, inspired by @MarvinLvn's work, to be merged into the next release of pyannote-audio.

Note: nothing has been done yet, this is just to get things started.
wontfix

opened by hadware 17

Trying to finetune model for new speaker

I am trying to fine-tune models to support one more speaker, but it looks like I am doing something wrong.

I want to use the "dia_dihard" pipeline, so I need to fine-tune the models {sad_dihard, scd_dihard, emb_voxceleb}.

For my speaker, I have one WAV file lasting more than 1 hour.

So I created a database.yml file:

Databases:
  IK: /content/fine/kirilov/{uri}.wav

Protocols:
  IK:
    SpeakerDiarization:
      kirilov:
        train:
          uri: train.lst
          annotation: train.rttm
          annotated: train.uem

and put the additional files next to database.yml:

kirilov
├── database.yml
├── kirilov.wav
├── train.lst
├── train.rttm
└── train.uem

train.lst: kirilov

train.rttm: SPEAKER kirilov 1 0.0 3600.0 <NA> <NA> Kirilov <NA> <NA>

train.uem: kirilov NA 0.0 3600.0

I assume this tells the trainer to use kirilov.wav and to take 3600 seconds of audio from it for training.

Now I fine-tune the models. The current folder is /content/fine/kirilov, so database.yml is picked up from the current directory:

!pyannote-audio sad train --pretrained=sad_dihard --subset=train --to=1 --parallel=4 "/content/fine/sad" IK.SpeakerDiarization.kirilov
!pyannote-audio scd train --pretrained=scd_dihard --subset=train --to=1 --parallel=4 "/content/fine/scd" IK.SpeakerDiarization.kirilov
!pyannote-audio emb train --pretrained=emb_voxceleb --subset=train --to=1 --parallel=4 "/content/fine/emb" IK.SpeakerDiarization.kirilov

Output looks like:

Using cache found in /root/.cache/torch/hub/pyannote_pyannote-audio_develop
Loading labels: 0file [00:00, ?file/s]/usr/local/lib/python3.6/dist-packages/pyannote/database/protocol/protocol.py:128: UserWarning:

Existing key "annotation" may have been modified.

Loading labels: 1file [00:00, 20.49file/s]
/usr/local/lib/python3.6/dist-packages/pyannote/audio/train/trainer.py:128: UserWarning:

Did not load optimizer state (most likely because current training session uses a different loss than the one used for pre-training).

2020-06-19 15:35:26.763592: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Training:   0%|                                        | 0/1 [00:00<?, ?epoch/s]
Epoch #1:   0%|                                       | 0/29 [00:00<?, ?batch/s]
Epoch #1:   0%|                           | 0/29 [00:00<?, ?batch/s, loss=0.676]
Epoch #1:   3%|▋                  | 1/29 [00:00<00:26,  1.04batch/s, loss=0.676]

Etc.

Then I try to run the pipeline with the new .pt files:

import os
import torch
from pyannote.audio.pipeline import SpeakerDiarization
pipeline = SpeakerDiarization(embedding = "/content/fine/emb/train/IK.SpeakerDiarization.kirilov.train/weights/0001.pt", 
                              sad_scores = "/content/fine/sad/train/IK.SpeakerDiarization.kirilov.train/weights/0001.pt",
                              scd_scores = "/content/fine/scd/train/IK.SpeakerDiarization.kirilov.train/weights/0001.pt",
                              method= "affinity_propagation")

#params from dia_dihard\train\X.SpeakerDiarization.DIHARD_Official.development\params.yml
pipeline.load_params("/content/drive/My Drive/pyannote/params.yml")
FILE = {'audio': "/content/groundtruth/new.wav"}
diarization = pipeline(FILE)
diarization

The result is that the whole of my new.wav is recognized as one speaker talking without pauses, so I assume the models are broken. It does not matter whether I train for 1 epoch or for 100.

If I instead use either:

0000.pt (I assume these are the original models)

pipeline = SpeakerDiarization(embedding = "/content/fine/emb/train/IK.SpeakerDiarization.kirilov.train/weights/0000.pt", 
                              sad_scores = "/content/fine/sad/train/IK.SpeakerDiarization.kirilov.train/weights/0000.pt",
                              scd_scores = "/content/fine/scd/train/IK.SpeakerDiarization.kirilov.train/weights/0000.pt",
                              method= "affinity_propagation")

or the weights from the original models:

pipeline = SpeakerDiarization(embedding = "/content/drive/My Drive/pyannote/emb_voxceleb/train/X.SpeakerDiarization.VoxCeleb.train/weights/0326.pt", 
                             sad_scores = "/content/drive/My Drive/pyannote/sad_dihard/sad_dihard/train/X.SpeakerDiarization.DIHARD_Official.train/weights/0231.pt",
                             scd_scores = "/content/drive/My Drive/pyannote/scd_dihard/train/X.SpeakerDiarization.DIHARD_Official.train/weights/0421.pt",
                             method= "affinity_propagation")

everything works, and the result is similar to that of:

pipeline = torch.hub.load('pyannote/pyannote-audio', 'dia_dihard')
FILE = {'audio': "/content/groundtruth/new.wav"}
diarization = pipeline(FILE)
diarization

Could you please advise what could be wrong with my training/fine-tuning process?
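One possible reason, offered as a hypothesis rather than a confirmed diagnosis: the train.rttm above labels the whole 3600 s of kirilov.wav as a single speaker turn, so the training annotation contains neither non-speech nor any speaker change. Fine-tuning the SAD model on it only shows "speech" examples and the SCD model only "no change" examples, which would be consistent with new.wav coming out as one uninterrupted turn regardless of the number of epochs. The hypothetical helper `summarize_rttm` below counts speakers and the fraction of the annotated duration covered by speech in RTTM lines, as a quick sanity check on training labels.

```python
from collections import defaultdict


def summarize_rttm(lines, annotated_duration):
    """Count speakers and the speech ratio in RTTM lines for one file.

    RTTM fields: type, file, channel, onset, duration, <NA>, <NA>, speaker, ...
    Overlapping turns are summed, so the ratio can exceed 1.0 with overlap.
    """
    speech_per_speaker = defaultdict(float)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        speech_per_speaker[fields[7]] += float(fields[4])
    total_speech = sum(speech_per_speaker.values())
    return {
        "num_speakers": len(speech_per_speaker),
        "speech_ratio": total_speech / annotated_duration,
    }


summary = summarize_rttm(
    ["SPEAKER kirilov 1 0.0 3600.0 <NA> <NA> Kirilov <NA> <NA>"],
    annotated_duration=3600.0,  # from train.uem: 0.0 to 3600.0
)
print(summary)  # {'num_speakers': 1, 'speech_ratio': 1.0}
```

A speech ratio of 1.0 with a single speaker means the models never see a negative example during fine-tuning.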

opened by marlon-br 17

`b c t` vs. `b t c`?

Issue by hbredin, Friday Oct 30, 2020 at 16:46 GMT. Originally opened as https://github.com/hbredin/pyannote-audio-v2/issues/54

Which convention should we use?
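For context: `b c t` means (batch, channel, time), the layout `torch.nn.Conv1d` expects, while `b t c` means (batch, time, channel), the layout `torch.nn.LSTM` expects with `batch_first=True`. Switching between them is a single axis swap (`x.transpose(1, 2)` in PyTorch). The NumPy sketch below, with illustrative sizes only, shows that the swap is a view on the same data rather than a copy; the resulting array or tensor is generally non-contiguous, which is why PyTorch code often calls `.contiguous()` after such a transpose before using `.view()`.

```python
import numpy as np

batch, channels, frames = 2, 3, 5

# "b c t": (batch, channel, time)
x_bct = np.arange(batch * channels * frames, dtype=np.float32).reshape(batch, channels, frames)

# "b t c": (batch, time, channel), obtained by swapping the last two axes
x_btc = x_bct.transpose(0, 2, 1)

print(x_bct.shape, x_btc.shape)  # (2, 3, 5) (2, 5, 3)

# the transpose is a view: same underlying buffer, different strides
assert np.shares_memory(x_bct, x_btc)
assert x_btc[1, 4, 2] == x_bct[1, 2, 4]
```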
v2

opened by mogwai 16
Segmentation Fault when conducting change detection tutorial

Hi,

Everything works for the feature extraction tutorial, but when I train the model following the change detection tutorial exactly, I get a segmentation fault. What might be the reason? Thank you for your help.

opened by Charliechen1 16
Cannot find my Pretrained model

I successfully trained a SAD model. I want to compute SAD scores as part of the speaker diarization pipeline. I thought I was passing the weights correctly to the pyannote-audio script, but the model is never found and the script aborts. Here is the output of my bash script with tracing on.

This is the error message I get.

RuntimeError: Cannot find callable /misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2/models/speech_activity_detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0101.pt in hubconf

The commands executed by the script are the lines prefixed with "++".

++ export EXP_DIR=models/speaker_diarization
++ EXP_DIR=models/speaker_diarization
++ cd /misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2
++ export PYANNOTE_DATABASE_CONFIG=/misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2/database.yml
++ PYANNOTE_DATABASE_CONFIG=/misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2/database.yml
++ sad_model=/misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2/models/speech_activity_detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0101.pt
++ ls /misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2/models/speech_activity_detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0101.pt
/misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2/models/speech_activity_detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0101.pt
++ pyannote-audio sad apply --step=0.1 --pretrained=/misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2/models/speech_activity_detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0101.pt --subset=dev models/speaker_diarization AMI.SpeakerDiarization.MixHeadset
Using cache found in /home/map22/.cache/torch/hub/pyannote_pyannote-audio_develop
Traceback (most recent call last):
  File "/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/pyannote-feb32020/bin/pyannote-audio", line 8, in <module>
    sys.exit(main())
  File "/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/pyannote-feb32020/lib/python3.7/site-packages/pyannote/audio/applications/pyannote_audio.py", line 406, in main
    apply_pretrained(validate_dir, protocol, **params)
  File "/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/pyannote-feb32020/lib/python3.7/site-packages/pyannote/audio/applications/base.py", line 514, in apply_pretrained
    step=step)
  File "/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/pyannote-feb32020/lib/python3.7/site-packages/torch/hub.py", line 364, in load
    entry = _load_entry_from_hubconf(hub_module, model)
  File "/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/pyannote-feb32020/lib/python3.7/site-packages/torch/hub.py", line 237, in _load_entry_from_hubconf
    raise RuntimeError('Cannot find callable {} in hubconf'.format(model))
RuntimeError: Cannot find callable /misc/vlgscratch4/PichenyGroup/picheny/headcam/headcam-code-try2/models/speech_activity_detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0101.pt in hubconf

opened by picheny-nyu 15
Clustering pipeline not working
When I try

from pyannote.audio.pipelines.clustering import AgglomerativeClustering
spkr_cluster = AgglomerativeClustering().cluster(spkr_embeds, min_clusters=1, max_clusters=np.inf, num_clusters=2)

The cluster() method crashes wherever it uses a hyperparameter that is still a Parameter object, e.g.:

for self.min_cluster_size it is Error:'<' not supported between instances of 'int' and 'Integer'

for self.method it is Error:Invalid method: <pyannote.pipeline.parameter.Categorical object at 0x000001B68E34A250>

for self.threshold it is Error:float() argument must be a string or a number, not 'Uniform'

What's happening here? How to fix?
opened by saveli 0

references #1, added disk store abstraction and used it in segmentation and diarization

When I run the segmentation or speaker diarization pipelines on very long audio files (>= 2 hours), the existing code takes up too much RAM and my process gets killed by the OOM killer.

In this PR, I introduce a DiskStore abstraction, which I use in many places in the segmentation and diarization pipelines to back some NumPy arrays and/or PyTorch tensors with disk storage (using memory mapping).

After the modifications, I am able to successfully run the pipeline on my long audio files (with significantly reduced memory consumption and minimal impact on the run time).

Using Disk Store:

import os

from pyannote.audio.utils.diskstore import DiskStore

dstore_path = "/home/prashanth/tmp/dstore"
os.makedirs(dstore_path, exist_ok=True)  # create the store directory if needed
dstore = DiskStore(dstore_path)

...

diarization = pipeline(audio, hook=hook, disk_store=dstore)
dstore.cleanup()  # deletes on-disk files
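The PR's DiskStore implementation is not shown on this page. As an illustration of the underlying idea only, the hypothetical `MemmapStore` below hands out NumPy arrays backed by memory-mapped files in a private temporary directory and deletes them all in `cleanup()`; its names and behavior are assumptions, not the PR's API.

```python
import os
import shutil
import tempfile
import uuid

import numpy as np


class MemmapStore:
    """Hypothetical minimal disk store: NumPy arrays backed by files on disk."""

    def __init__(self, root=None):
        # each store owns a private directory so cleanup() can remove everything
        self.root = tempfile.mkdtemp(dir=root)

    def empty(self, shape, dtype=np.float32):
        path = os.path.join(self.root, uuid.uuid4().hex + ".dat")
        # mode="w+" creates a zero-filled file of the right size and maps it
        return np.memmap(path, mode="w+", shape=shape, dtype=dtype)

    def cleanup(self):
        # remove every backing file created by this store
        shutil.rmtree(self.root, ignore_errors=True)
```

Because the operating system pages memory-mapped data in and out on demand, an array larger than RAM can be filled chunk by chunk; the trade-off is extra disk I/O when pages are evicted.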

I noticed that loading the audio input itself was a substantial source of memory consumption, so I wrote the code below to reduce the memory load. It wasn't obvious where and how to integrate this functionality into pyannote, so I am pasting the code here. Besides conserving memory, it decodes the input audio on multiple cores to speed things up for large files. Note that if the sample rate of the input audio file is not 16 kHz, as expected by the pipelines, I first convert it with ffmpeg (to prevent the existing code from resampling and creating an in-memory copy of the audio data).


import concurrent.futures
import os
from concurrent.futures import ProcessPoolExecutor

import ffmpeg  # ffmpeg-python
import numpy as np
import torch
from deeputil import generate_random_string
from soundfile import SoundFile

NCORES = 4

fpath = "..." # path to very long audio file

# `model` is the already-loaded pyannote.audio model (not shown here).
# Workers rely on the "fork" start method (Linux default): with "spawn"
# (macOS/Windows), this module-level code would be re-run in every worker.
# The input file is assumed to be mono.


def get_ranges(n, n_parts=0, n_per_part=0, offset=0):
    """Split frames [offset, offset + n) into [start, length] chunks.

    Give exactly one of n_parts / n_per_part; leftover frames go to the last chunk.
    """
    assert n_parts or n_per_part
    assert not (n_parts and n_per_part)

    if n_parts:
        n_per_part = n // n_parts
    else:
        n_parts = n // n_per_part

    if n_parts == 0 or n_per_part == 0:
        # fewer frames than one chunk: a single chunk covers everything
        return [[offset, n]]

    parts = [[offset + i * n_per_part, n_per_part] for i in range(n_parts)]
    # n_parts * n_per_part can fall short of n (e.g. n=10, n_parts=4)
    parts[-1][-1] += n - n_parts * n_per_part

    return parts


def get_waveform(path, nframes, mode):
    # (1, nframes) float32 array backed by a memory-mapped file
    return np.memmap(path, mode=mode, shape=(1, nframes), dtype=np.float32)


def decode_audio(inp_fpath, out_fpath, total_frames, start, nframes,
                 nframes_per_part=48000 * 5):
    # decode frames [start, start + nframes) into the shared memory-mapped file
    out = get_waveform(out_fpath, total_frames, "r+")
    with SoundFile(inp_fpath) as inp:
        ranges = get_ranges(nframes, n_per_part=nframes_per_part, offset=start)
        for part_start, part_frames in ranges:
            inp.seek(part_start)
            inp.read(part_frames, out=out[0][part_start : part_start + part_frames])
    out.flush()  # push decoded samples to the backing file


with SoundFile(fpath) as inp:
    samplerate = inp.samplerate

if samplerate != model.hparams.sample_rate:
    # resample once with ffmpeg so that pyannote does not resample
    # (and copy) the whole waveform in memory
    tmp_fpath = os.path.join(
        "/home/prashanth/tmp/dstore/audio_%s.flac" % generate_random_string(length=10)
    )
    istream = ffmpeg.input(fpath)
    ostream = istream.output(
        tmp_fpath, acodec="flac", ar=model.hparams.sample_rate, format="flac"
    )
    ffmpeg.run(ostream, quiet=True)
    fpath = tmp_fpath

with SoundFile(fpath) as inp:
    nframes = inp.frames
    samplerate = inp.samplerate

parts = get_ranges(nframes, n_parts=NCORES)

waveform_fpath = os.path.join(
    "/home/prashanth/tmp/dstore/audio_%s" % generate_random_string(length=10)
)

# create the on-disk buffer that the workers fill in
waveform = get_waveform(waveform_fpath, nframes, "w+")

with ProcessPoolExecutor(max_workers=NCORES) as executor:
    futures = [
        executor.submit(decode_audio, fpath, waveform_fpath, nframes, pstart, pframes)
        for pstart, pframes in parts
    ]
    done, _ = concurrent.futures.wait(futures)
    for fut in done:
        fut.result()  # re-raise any exception raised in a worker

waveform = torch.from_numpy(waveform)
audio = dict(waveform=waveform, sample_rate=samplerate)

I noticed these issues in this repository, which might be related:

https://github.com/pyannote/pyannote-audio/issues/1201
https://github.com/pyannote/pyannote-audio/issues/1165
https://github.com/pyannote/pyannote-audio/issues/1156
https://github.com/pyannote/pyannote-audio/issues/1090

I am still working on figuring out a way to get past memory consumption issues with Agglomerative Clustering on a very large number of embeddings. If I am able to make progress there, I will issue a separate PR.

opened by prashanthellina 1

TypeError: __init__() missing 1 required positional argument: 'signature' while reproducing change-detection tutorial

While following this tutorial: https://github.com/pyannote/pyannote-audio/tree/89da05ea9d6de97da9bd21949a26ceb0042ef361/tutorials/change-detection

and executing:

pyannote-change-detection train \
    ${EXPERIMENT_DIR} \
    AMI.SpeakerDiarization.MixHeadset

I get the following error:

  File "/home/jashwanth/miniconda3/envs/py36-pyannote-audio/bin/pyannote-change-detection", line 8, in <module>
    sys.exit(main())
  File "/home/jashwanth/miniconda3/envs/py36-pyannote-audio/lib/python3.6/site-packages/pyannote/audio/applications/change_detection.py", line 380, in main
    train(protocol, experiment_dir, train_dir, subset=subset)
  File "/home/jashwanth/miniconda3/envs/py36-pyannote-audio/lib/python3.6/site-packages/pyannote/audio/applications/change_detection.py", line 202, in train
    generator = ChangeDetectionBatchGenerator(feature_extraction)
  File "/home/jashwanth/miniconda3/envs/py36-pyannote-audio/lib/python3.6/site-packages/pyannote/audio/generators/change.py", line 93, in __init__
    segment_generator)
TypeError: __init__() missing 1 required positional argument: 'signature'

Can anyone please help me with this?

Thank you in advance

opened by Jashwantherao 0
Fixes for PyTorch Lightning >= 1.8

Adjust to PyTorch Lightning API changes in version 1.8.0. I did some testing to make sure nothing broke, including model training/fine-tuning and loading from pretrained checkpoints and the HF Hub; however, my tests likely didn't cover everything.

opened by entn-at 1
Will it work on real-time streaming data?

I am currently running it on an Apple M2 chip, and it takes much longer than on Colab. Could the pipeline be adapted to streaming data and combined with a transcription service?
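pyannote.audio applies its models on sliding windows over the input (hence the `--step` option of `pyannote-audio ... apply` seen elsewhere on this page), so one building block for near-real-time use is buffering incoming audio into fixed-length, overlapping chunks. The sketch below covers only that buffering step: `sliding_chunks` is a hypothetical helper, the 5 s window and 0.5 s step are illustrative values, and stitching per-chunk outputs into a consistent diarization (speaker labels are not guaranteed to match from one chunk to the next) is not addressed.

```python
import numpy as np


def sliding_chunks(blocks, sample_rate, duration=5.0, step=0.5):
    """Yield fixed-length, overlapping chunks from an iterable of audio blocks.

    Hypothetical helper: `duration` and `step` are in seconds; a chunk is
    emitted as soon as enough samples have been buffered.
    """
    window = int(duration * sample_rate)
    hop = int(step * sample_rate)
    buffer = np.zeros(0, dtype=np.float32)
    for block in blocks:
        # append the newly received samples (e.g. from a microphone callback)
        buffer = np.concatenate([buffer, np.asarray(block, dtype=np.float32)])
        while len(buffer) >= window:
            yield buffer[:window]
            buffer = buffer[hop:]  # slide the window forward by one step


sample_rate = 16000
stream = (np.full(sample_rate, i, dtype=np.float32) for i in range(10))  # ten 1 s blocks
chunks = list(sliding_chunks(stream, sample_rate))
print(len(chunks))  # 11
```

Each chunk could then be passed to a model as it arrives; latency is bounded below by the window duration.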

opened by ankurdhuriya 0
About using the pipeline on a GPU, and some other questions

I really appreciate your help! Here are some of my questions:
1. When I use a token, is my data uploaded to a cloud server, analyzed there, and sent back to me?
2. This is too slow. Can I use a local GPU? How do I do that?

This is my code:

Thanks for your time!

opened by BeyondLightYear 0

Releases(1.1.1)

1.1.1(Nov 25, 2020)

Source code(tar.gz)
Source code(zip)

Owner

pyannote

GitHub Repository http://pyannote.github.io
