Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

Overview

simple_diarizer

Open In Colab

Simplified diarization pipeline using some pretrained models.

Made to be a simple as possible to go from an input audio file to diarized segments.

import soundfile as sf
import matplotlib.pyplot as plt

from simple_diarizer.diarizer import Diarizer
from simple_diarizer.utils import combined_waveplot

diar = Diarizer(
                  embed_model='xvec', # 'xvec' and 'ecapa' supported
                  cluster_method='sc' # 'ahc' and 'sc' supported
               )

segments = diar.diarize(WAV_FILE, num_speakers=NUM_SPEAKERS)

signal, fs = sf.read(WAV_FILE)
combined_waveplot(signal, fs, segments)
plt.show()

Source Video

"Some Quick Advice from Barack Obama!"

YouTube Thumbnail

Pre-trained Models

The following pretrained models are used:

Demo

Open In Colab

It can be checked out in the above link, where it will try and diarize any input YouTube URL. It will also use YouTube's autogenerated transcriptions to produce a speaker labelled transcription.

Hopefully this can be of use as a free basic tool to produce a diarized transcript of a video/audio of interest.

Other References

Planned Features

Comments
  • WIP - Make an installable package

    WIP - Make an installable package

    Description:

    • Include requirements.txt.
    • Add setup*. files to build a package.
    • Create a folder simple_diarizer to store source code.
    • Create Github Workflow to publish the package.

    How to test:

    • Run command pip install .
    • Outside project folder type python and from simple_diarizer import diarizer

    Notes:

    • Cannot use python 3.10.x yet

    Source code to test:

    from simple_diarizer.utils import (convert_wavfile, download_youtube_wav)
    
    from simple_diarizer.diarizer import Diarizer
    import tempfile
    
    YOUTUBE_ID = "HyKmkLEtQbs"
    
    with tempfile.TemporaryDirectory() as outdir:
        yt_file = download_youtube_wav(YOUTUBE_ID, outdir)
    
        wav_file = convert_wavfile(yt_file, f"{outdir}/{YOUTUBE_ID}_converted.wav")
    
        print(f"wav file: {wav_file}")
    
        diar = Diarizer(
            embed_model='ecapa', # supported types: ['xvec', 'ecapa']
            cluster_method='sc', # supported types: ['ahc', 'sc']
            window=1.5, # size of window to extract embeddings (in seconds)
            period=0.75 # hop of window (in seconds)
        )
    
        NUM_SPEAKERS = 2
    
        segments = diar.diarize(wav_file, 
                                num_speakers=NUM_SPEAKERS,
                                outfile=f"{outdir}/{YOUTUBE_ID}.rttm")
    
        print(segments)     
    
    opened by johnidm 16
  • "[Errno 30] Read-only file system: 'pretrained_models'"

    I am using macOS and I am getting error "[Errno 30] Read-only file system: 'pretrained_models'" From what I can tell, the pretrained models are being fetched if you do not have them.

    However, the save location is the root directory which is read-only. This is where I believe is the target directory "./pretrained_model_checkpoints"

    Is there another location that can be used that can be used?

    PythonKit/Python.swift:706: Fatal error: 'try!' expression unexpectedly raised an error: Python exception: [Errno 30] Read-only file system: 'pretrained_models' Traceback: File "/Users/wedwards/Documents/Development/A_PythonKit_Test/A_PythonKit_Test/Simple Diarizer.py", line 42, in diar = Diarizer( File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/simple_diarizer/diarizer.py", line 48, in init self.embed_model = EncoderClassifier.from_hparams( File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/speechbrain/pretrained/interfaces.py", line 342, in from_hparams hparams_local_path = fetch( File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/speechbrain/pretrained/fetching.py", line 86, in fetch savedir.mkdir(parents=True, exist_ok=True) File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pathlib.py", line 1179, in mkdir self.parent.mkdir(parents=True, exist_ok=True) File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pathlib.py", line 1175, in mkdir self._accessor.mkdir(self, mode)

    2022-11-11 13:14:00.531470-0500 A_PythonKit_Test[69382:7584330] PythonKit/Python.swift:706: Fatal error: 'try!' expression unexpectedly raised an error: Python exception: [Errno 30] Read-only file system: 'pretrained_models' Traceback: File "/Users/wedwards/Documents/Development/A_PythonKit_Test/A_PythonKit_Test/Simple Diarizer.py", line 42, in diar = Diarizer( File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/simple_diarizer/diarizer.py", line 48, in init self.embed_model = EncoderClassifier.from_hparams( File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/speechbrain/pretrained/interfaces.py", line 342, in from_hparams hparams_local_path = fetch( File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/speechbrain/pretrained/fetching.py", line 86, in fetch savedir.mkdir(parents=True, exist_ok=True) File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pathlib.py", line 1179, in mkdir self.parent.mkdir(parents=True, exist_ok=True) File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pathlib.py", line 1175, in mkdir self._accessor.mkdir(self, mode)

    opened by MrEdwards007 5
  • Latest Python and packages

    Latest Python and packages

    The current release prevents use of Python 3.10 and requires specific versions of Beautiful Soup and PyTube.

    I've forked the repo to overcome these version limitations and it's working for me. I haven't made a pull request, however, as your repo doesn't have tests and I don't know whether there is a use case which would be broken by my changes.

    Can you please remove these version limitations if they're not needed?

    Thanks for the repo - it's effective and much easier to use than SpeechBrain.

    opened by andrewmackie 3
  • takes 1 positional argument but 2 were given

    takes 1 positional argument but 2 were given

    running a demo on google co-lab i am getting the following error, any idea how to resolve this,

    File "/root/anaconda3/envs/simple/lib/python3.8/site-packages/speechbrain/pretrained/fetching.py", line 116, in fetch fetched_file = huggingface_hub.cached_download(url, use_auth_token) TypeError: cached_download() takes 1 positional argument but 2 were given

    opened by SanaullahOfficial 2
  • AttributeError when running Diarizer in simple_diarizer.diarizer

    AttributeError when running Diarizer in simple_diarizer.diarizer

    Hi there!

    When running the following code in Python 3.7 on a fresh conda environment in Ubuntu 22.04

    from simple_diarizer.diarizer import Diarizer
    
    diar = Diarizer(
                        embed_model='xvec', # 'xvec' and 'ecapa' suported
                        cluster_method='sc' # 'ahc' and 'sc' supported
                    )
    

    I get the following error:

    <ipython-input-3-286690ce0195> in <module>
          1 diar = Diarizer(
          2                     embed_model='xvec', # 'xvec' and 'ecapa' suported
    ----> 3                     cluster_method='sc' # 'ahc' and 'sc' supported
          4                 )
    
    ~/anaconda3/envs/test/lib/python3.7/site-packages/simple_diarizer/diarizer.py in __init__(self, embed_model, cluster_method, window, period)
         44             self.embed_model = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb",
         45                                                               savedir="pretrained_models/spkrec-xvect-voxceleb",
    ---> 46                                                               run_opts=self.run_opts)
         47         if embed_model == 'ecapa':
         48             self.embed_model = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb",
    
    ~/anaconda3/envs/test/lib/python3.7/site-packages/speechbrain/pretrained/interfaces.py in from_hparams(cls, source, hparams_file, pymodule_file, overrides, savedir, use_auth_token, **kwargs)
        349         # Load the modules:
        350         with open(hparams_local_path) as fin:
    --> 351             hparams = load_hyperpyyaml(fin, overrides)
        352 
        353         # Pretraining:
    
    ~/anaconda3/envs/test/lib/python3.7/site-packages/hyperpyyaml/core.py in load_hyperpyyaml(yaml_stream, overrides, overrides_must_match)
        187 
        188     # Remove items that start with "__"
    --> 189     removal_keys = [k for k in hparams.keys() if k.startswith("__")]
        190     for key in removal_keys:
        191         del hparams[key]
    
    AttributeError: 'str' object has no attribute 'keys'
    opened by masonhargrave 2
  • Make project installable

    Make project installable

    Hi @cvqluu, this project is amazing, thanks for sharing.

    I have some experience in packaging projects in Python.

    What do you think I make these items on your to-do list?

    • Add to PyPi (make pip installable)
    • requirements.txt

    If you authorize me, I will start doing this now and submit pull requests for your review and approval.

    opened by johnidm 1
  • Added ipython depedency

    Added ipython depedency

    Tested on local machine using:

    pip install --user git+https://github.com/cvqluu/[email protected]
    

    Fix for https://github.com/cvqluu/simple_diarizer/issues/12

    opened by cvqluu 0
  • Bump ipython from 7.30.1 to 7.31.1

    Bump ipython from 7.30.1 to 7.31.1

    Bumps ipython from 7.30.1 to 7.31.1.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Undeclared IPython dependency

    Undeclared IPython dependency

    The current package (0.0.12 on PyPI) cannot run without IPython, but this is missing from requirements.txt

    Steps to reproduce (outside of a Jupyter notebook):

    pip install simple-diarizer
    
    # index.py
    from simple_diarizer.diarizer import Diarizer
    

    Output:

    File "[redacted]\index.py", line 1, in <module>
        from simple_diarizer.diarizer import Diarizer
    File "[redacted]\lib\site-packages\simple_diarizer\diarizer.py", line 13, in <module>
        from .utils import check_wav_16khz_mono, convert_wavfile
    File "[redacted]\lib\site-packages\simple_diarizer\utils.py", line 8, in <module>
        from IPython.display import Audio, display
    ModuleNotFoundError: No module named 'IPython'
    
    opened by DavidRalph 1
  • waveplot_perspeaker causes argument out of range error

    waveplot_perspeaker causes argument out of range error

    While running through your code example, testing the workflow on a different audio file produced the following output:

    C:\Users\xxx\Miniconda3\envs\simple_diarizer_env\lib\site-packages\IPython\lib\display.py:187: RuntimeWarning: invalid value encountered in divide
      scaled = data / normalization_factor * 32767
    ---------------------------------------------------------------------------
    error                                     Traceback (most recent call last)
    Cell In [18], line 1
    ----> 1 waveplot_perspeaker(signal, fs, segments)
    
    File ~\Miniconda3\envs\simple_diarizer_env\lib\site-packages\simple_diarizer\utils.py:166, in waveplot_perspeaker(signal, fs, segments)
        164 if "words" in seg:
        165     pprint(seg["words"])
    --> 166 display(Audio(speech, rate=fs))
        167 print("=" * 40 + "\n")
    
    File ~\Miniconda3\envs\simple_diarizer_env\lib\site-packages\IPython\lib\display.py:130, in Audio.__init__(self, data, filename, url, embed, rate, autoplay, normalize, element_id)
        128 if rate is None:
        129     raise ValueError("rate must be specified when data is a numpy array or list of audio samples.")
    --> 130 self.data = Audio._make_wav(data, rate, normalize)
    
    File ~\Miniconda3\envs\simple_diarizer_env\lib\site-packages\IPython\lib\display.py:162, in Audio._make_wav(data, rate, normalize)
        160 waveobj.setsampwidth(2)
        161 waveobj.setcomptype('NONE','NONE')
    --> 162 waveobj.writeframes(scaled)
        163 val = fp.getvalue()
        164 waveobj.close()
    
    File ~\Miniconda3\envs\simple_diarizer_env\lib\wave.py:437, in Wave_write.writeframes(self, data)
        436 def writeframes(self, data):
    --> 437     self.writeframesraw(data)
        438     if self._datalength != self._datawritten:
        439         self._patchheader()
    
    File ~\Miniconda3\envs\simple_diarizer_env\lib\wave.py:426, in Wave_write.writeframesraw(self, data)
        424 if not isinstance(data, (bytes, bytearray)):
        425     data = memoryview(data).cast('B')
    --> 426 self._ensure_header_written(len(data))
        427 nframes = len(data) // (self._sampwidth * self._nchannels)
        428 if self._convert:
    
    File ~\Miniconda3\envs\simple_diarizer_env\lib\wave.py:467, in Wave_write._ensure_header_written(self, datasize)
        465 if not self._framerate:
        466     raise Error('sampling rate not specified')
    --> 467 self._write_header(datasize)
    
    File ~\Miniconda3\envs\simple_diarizer_env\lib\wave.py:479, in Wave_write._write_header(self, initlength)
        477 except (AttributeError, OSError):
        478     self._form_length_pos = None
    --> 479 self._file.write(struct.pack('<L4s4sLHHLLHH4s',
        480     36 + self._datalength, b'WAVE', b'fmt ', 16,
        481     WAVE_FORMAT_PCM, self._nchannels, self._framerate,
        482     self._nchannels * self._framerate * self._sampwidth,
        483     self._nchannels * self._sampwidth,
        484     self._sampwidth * 8, b'data'))
        485 if self._form_length_pos is not None:
        486     self._data_length_pos = self._file.tell()
    
    error: argument out of range
    

    Any ideas what the issue could be? It works fine on other audio files, and everything up to this point seems to run without error.

    opened by dcruiz01 1
Releases(v0.0.13)
Owner
Chau
PhD student at the University of Edinburgh, CSTR
Chau
A library for end-to-end learning of embedding index and retrieval model

Poeem Poeem is a library for efficient approximate nearest neighbor (ANN) search, which has been widely adopted in industrial recommendation, advertis

54 Dec 21, 2022
Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET

Training COMET using seq2seq setting Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET. The codes are modified from run_summarizati

tqfang 9 Dec 17, 2022
Learn meanings behind words is a key element in NLP. This project concentrates on the disambiguation of preposition senses. Therefore, we train a bert-transformer model and surpass the state-of-the-art.

New State-of-the-Art in Preposition Sense Disambiguation Supervisor: Prof. Dr. Alexander Mehler Alexander Henlein Institutions: Goethe University TTLa

Dirk Neuhäuser 4 Apr 06, 2022
This code extends the neural style transfer image processing technique to video by generating smooth transitions between several reference style images

Neural Style Transfer Transition Video Processing By Brycen Westgarth and Tristan Jogminas Description This code extends the neural style transfer ima

Brycen Westgarth 110 Jan 07, 2023
NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings

MCSE: Multimodal Contrastive Learning of Sentence Embeddings This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multi

Saarland University Spoken Language Systems Group 39 Nov 15, 2022
Open solution to the Toxic Comment Classification Challenge

Starter code: Kaggle Toxic Comment Classification Challenge More competitions 🎇 Check collection of public projects 🎁 , where you can find multiple

minerva.ml 153 Jun 22, 2022
Toward a Visual Concept Vocabulary for GAN Latent Space, ICCV 2021

Toward a Visual Concept Vocabulary for GAN Latent Space Code and data from the ICCV 2021 paper Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Kl

Sarah Schwettmann 13 Dec 23, 2022
A natural language processing model for sequential sentence classification in medical abstracts.

NLP PubMed Medical Research Paper Abstract (Randomized Controlled Trial) A natural language processing model for sequential sentence classification in

Hemanth Chandran 1 Jan 17, 2022
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)

Structured Super Lottery Tickets in BERT This repo contains our codes for the paper "Super Tickets in Pre-Trained Language Models: From Model Compress

Chen Liang 16 Dec 11, 2022
NLP Core Library and Model Zoo based on PaddlePaddle 2.0

PaddleNLP 2.0拥有丰富的模型库、简洁易用的API与高性能的分布式训练的能力,旨在为飞桨开发者提升文本建模效率,并提供基于PaddlePaddle 2.0的NLP领域最佳实践。

6.9k Jan 01, 2023
Flexible interface for high-performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra. What is Lightning Tran

Pytorch Lightning 581 Dec 21, 2022
NLPretext packages in a unique library all the text preprocessing functions you need to ease your NLP project.

NLPretext packages in a unique library all the text preprocessing functions you need to ease your NLP project.

Artefact 114 Dec 15, 2022
NLPShala , the best IDE for all Natural language processing tasks.

The revolutionary IDE for all NLP (Natural language processing) stuffs on the internet.

Abhi 3 Aug 08, 2021
Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

Fork from https://github.com/huggingface/transformers/tree/86d5fb0b360e68de46d40265e7c707fe68c8015b/examples/pytorch/language-modeling at 2021.05.17.

Junbum Lee 12 Oct 26, 2022
Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classifi

186 Dec 24, 2022
Weaviate demo with the text2vec-openai module

Weaviate demo with the text2vec-openai module This repository contains an example of how to use the Weaviate text2vec-openai module. When using this d

SeMI Technologies 11 Nov 11, 2022
Predict an emoji that is associated with a text

Sentiment Analysis Sentiment analysis in computational linguistics is a general term for techniques that quantify sentiment or mood in a text. Can you

Tetsumichi(Telly) Umada 30 Sep 07, 2022
open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

中文开放信息抽取系统, open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

7 Nov 02, 2022
An extension for asreview implements a version of the tf-idf feature extractor that saves the matrix and the vocabulary.

Extension - matrix and vocabulary extractor for TF-IDF and Doc2Vec An extension for ASReview that adds a tf-idf extractor that saves the matrix and th

ASReview 4 Jun 17, 2022
SimpleChinese2 集成了许多基本的中文NLP功能,使基于 Python 的中文文字处理和信息提取变得简单方便。

SimpleChinese2 SimpleChinese2 集成了许多基本的中文NLP功能,使基于 Python 的中文文字处理和信息提取变得简单方便。 声明 本项目是为方便个人工作所创建的,仅有部分代码原创。

Ming 30 Dec 02, 2022