This repository contains code to run (i.e., train, perform inference with, and evaluate) a diarization method called EEND-vector-clustering.

Last update: Dec 26, 2022

Overview

EEND-vector clustering

EEND-vector clustering (End-to-End-Neural-Diarization-vector clustering) is a speaker diarization framework that integrates two complementary diarization approaches, the traditional clustering-based approach and the emerging end-to-end neural network-based approach, to get the best of both worlds. [1] shows that EEND-vector clustering outperforms EEND when the recording is long (e.g., more than 5 min), and [2] shows on CALLHOME data that it outperforms x-vector clustering and EEND-EDA, especially when the number of speakers in a recording is large.

This repository contains an example PyTorch implementation of EEND-vector clustering that reproduces the results in [2], i.e., the CALLHOME experiments. The trainer is Padertorch. The implementation is based on EEND and relies on some useful functions provided therein.

References

[1] Keisuke Kinoshita, Marc Delcroix, and Naohiro Tawara, "Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds," Proc. ICASSP, pp. 7198–7202, 2021

[2] Keisuke Kinoshita, Marc Delcroix, and Naohiro Tawara, "Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech," Proc. Interspeech, 2021 (to appear)

Citation

@inproceedings{eend-vector-clustering,
 author = {Keisuke Kinoshita and Marc Delcroix and Naohiro Tawara},
 title = {Integrating End-to-End Neural and Clustering-Based Diarization: Getting the Best of Both Worlds},
 booktitle = {{ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}},
 pages = {7198-7202},
 year = {2021}
}

Install tools

Requirements

NVIDIA CUDA GPU
CUDA Toolkit (version == 9.2, 10.1 or 10.2)
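
Before building, it can help to confirm that the installed CUDA Toolkit is one of the versions listed above. The following is a minimal sketch and not part of this repository: `is_supported_cuda` and `parse_cuda_release` are hypothetical helpers, and the check assumes `nvcc` is on your PATH and prints its usual "release X.Y" line.

```shell
# Hypothetical helper (not part of this repository): succeeds only for the
# CUDA Toolkit versions listed in the requirements.
is_supported_cuda() {
  case "$1" in
    9.2|10.1|10.2) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: extract "X.Y" from a line such as
# "Cuda compilation tools, release 10.1, V10.1.243" read from stdin.
parse_cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  version="$(nvcc --version | parse_cuda_release)"
  if is_supported_cuda "$version"; then
    echo "CUDA $version is supported"
  else
    echo "CUDA $version is not in the supported list (9.2, 10.1, 10.2)"
  fi
else
  echo "nvcc not found on PATH"
fi
```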

Install kaldi and python environment

```
cd tools
make
```

This command does the following:
- Builds Kaldi at tools/kaldi.
  - To use a pre-built Kaldi instead, run:
```
cd tools
make KALDI=<existing_kaldi_root>
```
    This option makes a symlink at tools/kaldi.
- Extracts miniconda3 at tools/miniconda3 and creates a conda environment named 'eend'.
- Installs PyTorch and Padertorch into the 'eend' environment.
  - By default, CUDA in /usr/local/cuda/ is used.
  - To specify your CUDA path, run:
```
cd tools
make CUDA_PATH=/your/path/to/cuda-10.1
```
    The PyTorch install command that is executed depends on your CUDA version. See https://pytorch.org/get-started/previous-versions/
- Clones EEND, which is referenced by the symbolic links stored under eend/, egs/ and utils/.

Test recipe (mini_librispeech)

Configuration

Modify egs/mini_librispeech/v1/cmd.sh according to your job scheduler. If you use your local machine, use "run.pl" (default). If you use Grid Engine, use "queue.pl". If you use SLURM, use "slurm.pl". For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.
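
The mapping from scheduler to wrapper script can be sketched as follows. This is an illustration only: `select_cmd` is a hypothetical helper, and the variable name `train_cmd` follows the common Kaldi cmd.sh naming and is an assumption here, so use whichever variable names the recipe's cmd.sh already defines.

```shell
# Hypothetical helper (not part of this repository): map a scheduler name to
# the Kaldi wrapper script named in this README.
select_cmd() {
  case "$1" in
    local) echo "run.pl" ;;
    gridengine) echo "queue.pl" ;;
    slurm) echo "slurm.pl" ;;
    *) echo "unknown scheduler: $1" >&2; return 1 ;;
  esac
}

# Assumed variable name; the recipe's cmd.sh may use different ones.
export train_cmd="$(select_cmd local)"
echo "train_cmd=$train_cmd"
```

In practice you would edit the assignments in cmd.sh directly rather than call a helper; the function only makes the three choices explicit.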

Run data preparation, training, inference, and scoring

```
cd egs/mini_librispeech/v1
CUDA_VISIBLE_DEVICES=0 ./run.sh
```

See RESULT.md and compare with your result.

CALLHOME experiment

Configuration

Modify egs/callhome/v1/cmd.sh according to your job scheduler. If you use your local machine, use "run.pl" (default). If you use Grid Engine, use "queue.pl". If you use SLURM, use "slurm.pl". For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.

Run data preparation, training, inference, and scoring

```
cd egs/callhome/v1
CUDA_VISIBLE_DEVICES=0 ./run.sh --db_path <db_path>
# <db_path> is the absolute path of the directory where the necessary LDC corpora are stored.
```
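
Because `<db_path>` has to be an absolute path, a small pre-flight check can catch a relative or missing directory before the recipe starts. This is a minimal sketch: `check_db_path` is a hypothetical helper and is not part of run.sh.

```shell
# Hypothetical pre-flight check (not part of run.sh): succeeds only if the
# argument is an absolute path to an existing directory.
check_db_path() {
  case "$1" in
    /*) ;;
    *) echo "db_path must be an absolute path: $1" >&2; return 1 ;;
  esac
  [ -d "$1" ] || { echo "directory not found: $1" >&2; return 1; }
}

# Example with the root directory, which always exists.
check_db_path / && echo "ok: /"
```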

See RESULT.md and compare with your result.
To run multi-GPU training, set CUDA_VISIBLE_DEVICES to the GPUs you want to use. This environment variable may be set automatically by your job scheduler, such as SLURM.
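
As an illustration, CUDA_VISIBLE_DEVICES takes a comma-separated list of device IDs, so listing two IDs exposes two GPUs to the recipe. The sketch below only counts the listed devices; `count_visible_gpus` is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): count the device IDs in a
# comma-separated CUDA_VISIBLE_DEVICES value.
count_visible_gpus() {
  if [ -z "$1" ]; then
    echo 0
    return
  fi
  echo "$1" | tr ',' '\n' | grep -c .
}

# Expose GPUs 0 and 1. The actual training call is shown as a comment because
# it needs the LDC corpora:
#   CUDA_VISIBLE_DEVICES=0,1 ./run.sh --db_path <db_path>
export CUDA_VISIBLE_DEVICES=0,1
echo "visible GPUs: $(count_visible_gpus "$CUDA_VISIBLE_DEVICES")"
```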
