SpeechNAS: Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Last update: May 20, 2022


Overview


ASRU 2021 IEEE Automatic Speech Recognition and Understanding

If this repository is useful to you, please cite our work properly. Thank you!

SpeechNAS: Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification, ASRU 2021.

Environment

Set up the environment for the repository with:

PyTorch 1.7+
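One way to confirm the version requirement before running anything is a small Python check. This is a minimal sketch, not part of the repository: the helper names `parse_version` and `meets_minimum` are hypothetical, and only the leading major and minor numbers of the version string are compared.

```python
import re


def parse_version(version):
    """Return the leading (major, minor) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(1, 7)):
    """True if `version` is at least `minimum` (default: PyTorch 1.7)."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "OK" if meets_minimum(torch.__version__) else "too old"
        print(f"PyTorch {torch.__version__}: {status}")
```

Comparing integer tuples rather than raw strings avoids the pitfall where "1.10" sorts before "1.7" lexicographically.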

Check configuration

Review the configuration files in ./config/ before running.

Inference

bash metric/metric_eer/auto_run.sh
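The `metric_eer` directory name suggests that evaluation is reported as an equal error rate (EER), the operating point at which the false-acceptance rate and the false-rejection rate coincide. How the repository computes it is not shown here. The following is a minimal pure-Python sketch of one common way to approximate EER from trial scores; the function name `equal_error_rate` and the input format (a score per trial, plus a 0/1 label where 1 means same speaker) are assumptions made for illustration.

```python
from itertools import groupby


def equal_error_rate(scores, labels):
    """Approximate the EER from trial scores and 0/1 labels (1 = target)."""
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # A trial is accepted when its score is strictly above the threshold.
    # Start with a threshold below every score: everything is accepted.
    false_rejects, false_accepts = 0, n_nontarget
    far, frr = 1.0, 0.0
    best_gap, eer = abs(far - frr), (far + frr) / 2

    # Raise the threshold through the sorted scores, handling ties together.
    trials = sorted(zip(scores, labels))
    for _, group in groupby(trials, key=lambda trial: trial[0]):
        for _, label in group:
            if label == 1:
                false_rejects += 1   # a target trial is now rejected
            else:
                false_accepts -= 1   # a non-target trial is now rejected
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With finitely many trials the two error curves rarely cross exactly, so the sketch reports the average of the two rates at the threshold where they are closest. Evaluation toolkits often interpolate instead, which can give slightly different values.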

The x-vector has recently become a successful and popular approach to speaker verification. It employs a time delay neural network (TDNN) and statistics pooling to extract speaker-characterizing embeddings from variable-length utterances. Improving on the x-vector is an active research area, and numerous neural networks have been elaborately designed on top of it, e.g., extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected TDNN (D-TDNN). In this work, we use neural architecture search (NAS) to identify optimal architectures from a TDNN-based search space; we call the approach SpeechNAS. Leveraging recent advances in speaker recognition, such as high-order statistics pooling, the multi-branch mechanism, D-TDNN, and angular additive margin softmax (AAM) loss with minimum hyper-spherical energy (MHE), SpeechNAS automatically discovers five network architectures, SpeechNAS-1 to SpeechNAS-5, with various numbers of parameters and GFLOPs on the large-scale text-independent speaker recognition dataset VoxCeleb1. Our best derived network achieves an equal error rate (EER) of 1.02% on the standard test set of VoxCeleb1, which surpasses previous TDNN-based state-of-the-art approaches by a large margin.
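To illustrate how statistics pooling turns a variable-length utterance into a fixed-length vector, here is a minimal pure-Python sketch. It is not the repository's implementation: the function name `statistics_pooling` is hypothetical, the high-order variant is assumed here to append skewness and kurtosis to the usual mean and standard deviation, and the population (biased) variance is an arbitrary choice.

```python
import math


def statistics_pooling(frames, high_order=False, eps=1e-8):
    """Pool a T x D list of frame features into one fixed-length vector.

    Returns [means, stds] (length 2*D), or with high_order=True
    [means, stds, skewness, kurtosis] (length 4*D).
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    means, stds, skews, kurts = [], [], [], []
    for channel in zip(*frames):  # one feature dimension across all frames
        mean = sum(channel) / n_frames
        var = sum((x - mean) ** 2 for x in channel) / n_frames
        std = math.sqrt(var + eps)  # eps avoids division by zero below
        means.append(mean)
        stds.append(std)
        if high_order:
            z = [(x - mean) / std for x in channel]  # standardized values
            skews.append(sum(v ** 3 for v in z) / n_frames)
            kurts.append(sum(v ** 4 for v in z) / n_frames)
    return means + stds + skews + kurts


# Utterances of different lengths map to vectors of the same size.
short = [[1.0, 2.0], [3.0, 4.0]]
long = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0], [3.5, 4.0], [4.5, 5.0]]
assert len(statistics_pooling(short)) == len(statistics_pooling(long)) == 4
```

The output length depends only on the feature dimension D, not on the number of frames T, which is what lets the later layers operate on utterances of any duration.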


Owner

Wentao Zhu

Effect of Different Encodings and Distance Functions on Quantum Instance-based Classifiers

Repository for the atmaCup #11 Public 4th / Private 5th place solution.

This repository contains Prior-RObust Bayesian Optimization (PROBO) as introduced in our paper "Accounting for Gaussian Process Imprecision in Bayesian Optimization"

PyTorch original implementation of Cross-lingual Language Model Pretraining.

Automatic Library of Congress classification, using word embeddings from book titles and synopses.

Code & Experiments for "LILA: Language-Informed Latent Actions" to be presented at the Conference on Robot Learning (CoRL) 2021.

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)

A Python implementation of YOLOv5 to detect fire or smoke in the wild on Jetson Xavier NX and Jetson Nano

A PyTorch Implementation of FaceBoxes

[WWW 2021] Source code for "Graph Contrastive Learning with Adaptive Augmentation"

Adaptive Denoising Training (ADT) for Recommendation.

Unsupervised Image Generation with Infinite Generative Adversarial Networks

PyTorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"

LERP : Label-dependent and event-guided interpretable disease risk prediction using EHRs

A demo of pose estimation in Python using MoveNet

A repository for generating stylized talking 3D and 3D face

A self-supervised 3D representation learning framework named viewpoint bottleneck.

Code for Subgraph Federated Learning with Missing Neighbor Generation (NeurIPS 2021)