SpeechNAS Better Trade off between Latency and Accuracy for Large Scale Speaker Verification

Last update: May 20, 2022

Overview

speechnas

SpeechNAS-Better-Trade-off-between-Latency-and-Accuracy-for-Large-Scale-Speaker-Verification

ASRU 2021 IEEE Automatic Speech Recognition and Understanding

If this repository is useful to you, please cite our work properly. Thank you!

SpeechNAS-Better-Trade-off-between-Latency-and-Accuracy-for-Large-Scale-Speaker-Verification, ASRU 2021.

Environment

Set up the environment for the repository with:

PyTorch 1.7+

Check configuration

Check configuration in ./config/

Inference

bash metric/metric_eer/auto_run.sh
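
The script path suggests that inference reports the equal error rate (EER) over verification trials. As a rough illustration of that metric, and not the repository's own implementation, the sketch below computes EER from a list of similarity scores and same-speaker labels. The function name `compute_eer` and the toy trial list are hypothetical; the threshold sweep is a deliberately simple quadratic-time version.

```python
def compute_eer(scores, labels):
    """Return (eer, threshold) for similarity scores and 0/1 labels.

    labels[i] == 1 marks a same-speaker (target) trial, 0 a different-speaker
    (non-target) trial. A trial is accepted when its score >= threshold.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best = None
    # Try every observed score as a threshold and keep the one where the
    # false acceptance rate (FAR) and false rejection rate (FRR) are closest.
    for threshold in sorted(set(scores)):
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        frr = sum(s < threshold for s in targets) / len(targets)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy trials (made up for illustration only).
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
    eer, threshold = compute_eer(toy_scores, toy_labels)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

For large trial lists, a sorted single-pass sweep or an ROC-based routine would be a better fit than this quadratic loop.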

Recently, the x-vector has been a successful and popular approach for speaker verification. It employs a time delay neural network (TDNN) and statistics pooling to extract speaker-characterizing embeddings from variable-length utterances. Improving upon the x-vector has been an active research area, and many neural networks have been elaborately designed based on it, e.g., extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected TDNN (D-TDNN). In this work, we try to identify the optimal architectures from a TDNN-based search space using neural architecture search (NAS), an approach we name SpeechNAS. Leveraging recent advances in speaker recognition, such as high-order statistics pooling, the multi-branch mechanism, D-TDNN, and angular additive margin softmax (AAM) loss with a minimum hyper-spherical energy (MHE), SpeechNAS automatically discovers five network architectures, SpeechNAS-1 to SpeechNAS-5, with various numbers of parameters and GFLOPs on the large-scale text-independent speaker recognition dataset VoxCeleb1. Our best derived network achieves an equal error rate (EER) of 1.02% on the standard test set of VoxCeleb1, which surpasses previous TDNN-based state-of-the-art approaches by a large margin.
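
To make the role of statistics pooling concrete, here is a minimal sketch of the basic mean-plus-standard-deviation variant, which maps any number of frame-level feature vectors to one fixed-length vector. It is an illustration under simplifying assumptions: it is written in plain Python rather than PyTorch, the name `statistics_pooling` is hypothetical, and it does not include the high-order statistics mentioned above.

```python
import math


def statistics_pooling(frames):
    """Pool T frame-level vectors of dimension D into one vector of length 2*D.

    The output is the per-dimension mean followed by the per-dimension
    (population) standard deviation, both computed over the time axis.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[j] for f in frames) / num_frames for j in range(dim)]
    var = [sum((f[j] - mean[j]) ** 2 for f in frames) / num_frames
           for j in range(dim)]
    std = [math.sqrt(v) for v in var]
    return mean + std


if __name__ == "__main__":
    # Two utterances of different lengths yield embeddings of the same size.
    short_utt = [[1.0, 2.0], [3.0, 4.0]]
    long_utt = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
    print(len(statistics_pooling(short_utt)), len(statistics_pooling(long_utt)))
```

Because the output length depends only on the feature dimension, the layers after pooling can operate on utterances of any duration.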

Owner

Wentao Zhu
