Discriminative Condition-Aware PLDA

Related tags

Deep LearningDCA-PLDA
Overview

DCA-PLDA

This repository implements the Discriminative Condition-Aware Backend described in the paper:

L. Ferrer, M. McLaren, and N. Brümmer, "A Speaker Verification Backend with Robust Performance across Conditions", in Computer Speech and Language, volume 71, 2021

This backend has the same functional form as the usual probabilistic discriminant analysis (PLDA) backend which is commonly used for speaker verification, including the preprocessing stages. It also integrates the calibration stage as part of the backend, where the calibration parameters depend on an estimated condition for the signal. The condition is internally represented by a very low dimensional vector. See the paper for more details on the mathematical formulation of the backend.

We have found this system to provide great out-of-the-box performance across a very wide range of conditions, when training the backend with a variety of data including Voxceleb, SRE (from the NIST speaker recognition evaluations), Switchboard, Mixer 6, RATS and FVC Australian datasets, as described in the above paper.

The code can also be used to train and evaluate a standard PLDA pipeline. Basically, the initial model before any training epochs is identical to a PLDA system, with an option for weighting the samples during training to compensate for imbalance across training domains.

Further, the current version of the code can also be used to do language detection. In this case, we have not yet explored the use of condition-awereness, but rather focused on a novel hierachical approach, which is described in the following paper:

L. Ferrer, D. Castan, M. McLaren, and A. Lawson, "A Hierarchical Model for Spoken Language Recognition", arXiv:2201.01364, 2021

Example scripts and configuration files to do both speaker verification and language detection are provided in the examples directory.

This code was written by Luciana Ferrer. We thank Niko Brummer for his help with the calibration code in the calibration.py file and for providing the code to do heavy-tail PLDA. The pre-computed embeddings provided to run the example were computed using SRI's software and infrastructure.

We will appreciate any feedback about the code or the approaches. Also, please let us know if you find bugs.

How to install

  1. Clone this repository:

    git clone https://github.com/luferrer/DCA-PLDA.git

  2. Install the requirements:

    pip install -r requirements.txt

  3. If you want to run the example code, download the pre-computed embeddings for the task you want to run from:

    https://sftp.speech.sri.com/forms/DCA-DPLDA

    Untar the file and move (or link) the resulting data/ dir inside the example dir for the task you want to run.

  4. You can then run the run_all script which runs several experiments using different configuration files and training sets. You can edit it to just try a single configuration, if you want. Please, see the top of that script for an explanation on what is run and where the output results end up. The run_all scripts will take a few hours to run (on a GPU) if all configurations are run. A RESULTS file is also provided for comparison. The run_all script should generate similar numbers to those in that file if all goes well.

About the examples

The example dir contains two example recipes, one for speaker verification and one for language detection.

Speaker Verification

The example provided with the repository includes the Voxceleb and FVC Australian subsets of the training data used in the paper, since the other datasets are not freely available. As such, the resulting system will only work well on conditions similar to those present in that data. For this reason, we test the resulting model on SITW and Voxceleb2 test dataset, which are very similar in nature to the Voxceleb data used for training. We also test on a set of FVC speakers which are held-out from training.

Language Detection

The example uses the Voxlingua107 dataset which contains a large number of languages.

How to change the examples to use your own data and embeddings

The example scripts run using embeddings for each task extracted at SRI International using standard x-vector architectures. See the papers cited above for a description of the characteristics of the corresponding embedding extractors. Unfortunately, we are unable to release the embedding extractors, but you should be able to replace these embeddings with any type of speaker or language embeddings (eg, those that can be extracted with Kaldi).

The audio files corresponding to the databases used in the speaker verification example above can be obtained for free:

For the language detection example, the Voxlingua107 audio samples can be obtained from http://bark.phon.ioc.ee/voxlingua107/.

Once you have extracted embeddings for all that data using your own procedure, you can set up all the lists and embeddings in the same way and with the same format (hdf5 or npz in the case of embeddings) as in the example data dir for your task of interest and use the run_all script.

Note on scoring multi-sample enrollment models

For now, for speaker verification, the DCA-PLDA model only knows how to calibrate trials that are given by a comparison of two individual speech waveforms since that is the way we create trials during training. The code in this repo can still score trials with multi-file enrollment models, but it does it in a hacky way. Basically, it scores each enrollment sample against the test sample for the trial and then averages the scores. This works reasonably well but it is not ideal. A generalization to scoring multi-sample enrollment trials within the model is left as future work.

Owner
Luciana Ferrer
Luciana Ferrer
Principled Detection of Out-of-Distribution Examples in Neural Networks

ODIN: Out-of-Distribution Detector for Neural Networks This is a PyTorch implementation for detecting out-of-distribution examples in neural networks.

189 Nov 29, 2022
Analyses of the individual electric field magnitudes with Roast.

Aloi Davide - PhD Student (UoB) Analysis of electric field magnitudes (wp2a dataset only at the moment) and correlation analysis with Dynamic Causal M

Davide Aloi 7 Dec 15, 2022
Development of IP code based on VIPs and AADM

Sparse Implicit Processes In this repository we include the two different versions of the SIP code developed for the article Sparse Implicit Processes

1 Aug 22, 2022
StellarGraph - Machine Learning on Graphs

StellarGraph Machine Learning Library StellarGraph is a Python library for machine learning on graphs and networks. Table of Contents Introduction Get

S T E L L A R 2.6k Jan 05, 2023
Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project

Semantic Code Search Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project. The model

Chen Wu 24 Nov 29, 2022
FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack

FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack Case study of the FCA. The code can be find in FCA. Cas

IDRL 21 Dec 15, 2022
A JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short.

BraVe This is a JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short. The model provided in this package wa

DeepMind 44 Nov 20, 2022
VLGrammar: Grounded Grammar Induction of Vision and Language

VLGrammar: Grounded Grammar Induction of Vision and Language

Yining Hong 27 Dec 23, 2022
Fully Automatic Page Turning on Real Scores

Fully Automatic Page Turning on Real Scores This repository contains the corresponding code for our extended abstract Henkel F., Schwaiger S. and Widm

Florian Henkel 7 Jan 02, 2022
Temporal-Relational CrossTransformers

Temporal-Relational Cross-Transformers (TRX) This repo contains code for the method introduced in the paper: Temporal-Relational CrossTransformers for

83 Dec 12, 2022
Implementation of Bottleneck Transformer in Pytorch

Bottleneck Transformer - Pytorch Implementation of Bottleneck Transformer, SotA visual recognition model with convolution + attention that outperforms

Phil Wang 621 Jan 06, 2023
MINOS: Multimodal Indoor Simulator

MINOS Simulator MINOS is a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environ

194 Dec 27, 2022
Learning Intents behind Interactions with Knowledge Graph for Recommendation, WWW2021

Learning Intents behind Interactions with Knowledge Graph for Recommendation This is our PyTorch implementation for the paper: Xiang Wang, Tinglin Hua

158 Dec 15, 2022
The authors' implementation of Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations

Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations This is the authors' implementation of Unsupervised Adversarial Learning of

Dwango Media Village 140 Dec 07, 2022
Tackling Obstacle Tower Challenge using PPO & A2C combined with ICM.

Obstacle Tower Challenge using Deep Reinforcement Learning Unity Obstacle Tower is a challenging realistic 3D, third person perspective and procedural

Zhuoyu Feng 5 Feb 10, 2022
PyGCL: Graph Contrastive Learning Library for PyTorch

PyGCL: Graph Contrastive Learning for PyTorch PyGCL is an open-source library for graph contrastive learning (GCL), which features modularized GCL com

GCL: Graph Contrastive Learning Library for PyTorch 594 Jan 08, 2023
DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Object Detection

DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Object Detection Code for our Paper DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Obje

Steven Lang 58 Dec 19, 2022
BuildingNet: Learning to Label 3D Buildings

BuildingNet This is the implementation of the BuildingNet architecture described in this paper: Paper: BuildingNet: Learning to Label 3D Buildings Arx

16 Nov 07, 2022
Adaptive Denoising Training (ADT) for Recommendation.

DenoisingRec Adaptive Denoising Training for Recommendation. This is the pytorch implementation of our paper at WSDM 2021: Denoising Implicit Feedback

Wenjie Wang 51 Dec 30, 2022