MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

Last update: Nov 10, 2022

Overview

MicRank: Learning to Rank Microphones for Distant Speech Recognition

Application Scenario

Many current applications assume the presence of multiple heterogeneous recording devices (e.g. Microsoft Project Denmark, the CHiME-5, CHiME-6 and VOiCES from a Distance challenges, and the DIRHA project).

Audio signals captured by different microphones can be combined at the front-end level using beamforming techniques. However, this combination can be very challenging: in an ad-hoc microphone network, microphones can be very far from each other. Moreover, some may be close to noise sources or, for a particular utterance, too far from the speaker to be useful; to further complicate things, synchronization issues may arise between devices.

An intriguing approach is to select only the best microphone for each utterance, or only a promising subset of microphones for beamforming or ROVER combination, potentially saving resources and/or improving results by excluding "bad" channels. This can be done with suitable automatic channel selection or channel ranking algorithms.
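
Once every channel has a quality score for a given utterance, both strategies reduce to sorting. The sketch below assumes higher scores mean better channels; the function name `select_channels` is hypothetical and not part of the MicRank code. With k=1 it picks the single best microphone; with k>1 it returns a subset that could be fed to beamforming or ROVER.

```python
from typing import List, Sequence


def select_channels(scores: Sequence[float], k: int = 1) -> List[int]:
    """Hypothetical helper: indices of the k highest-scoring channels, best first.

    `scores` holds one quality score per microphone for a single utterance
    (higher = better, an assumption of this sketch).
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of channels")
    # sorted() is stable, so tied channels keep their original (lower index first) order
    ranking = sorted(range(len(scores)), key=lambda c: -scores[c])
    return ranking[:k]
```

For example, with scores `[0.2, 0.9, 0.5, 0.7]`, `select_channels(scores)` returns `[1]` and `select_channels(scores, k=3)` returns `[1, 3, 2]`.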

What is MicRank

MicRank is a Learning to Rank neural channel selection framework in which a DNN is trained to rank microphone channels according to ASR back-end performance or any other metric or back-end task (e.g. STOI, if one wishes to rank microphones by speech intelligibility).
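
To make the training signal concrete, the sketch below shows one way per-channel back-end results (here WER, lower is better) can supervise a ranking model. It uses a RankNet-style pairwise logistic loss, a common Learning to Rank objective chosen here purely for illustration; it is an assumption, not necessarily the loss MicRank uses, and `pairwise_rank_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_rank_loss(pred_scores: Sequence[float], wers: Sequence[float]) -> float:
    """Hypothetical RankNet-style loss for one utterance.

    pred_scores: model score per channel (higher = predicted better).
    wers: back-end WER per channel (lower = actually better). Another
          metric could be used instead, flipping the comparison if higher is better.
    """
    total, pairs = 0.0, 0
    n = len(pred_scores)
    for i in range(n):
        for j in range(n):
            # Only ordered pairs where channel i truly beats channel j contribute
            if wers[i] < wers[j]:
                diff = pred_scores[i] - pred_scores[j]
                # -log(sigmoid(diff)) == softplus(-diff)
                total += softplus(-diff)
                pairs += 1
    # Channels with identical WER give no ordering information
    return total / pairs if pairs else 0.0
```

A model that orders channels consistently with their WER gets a small loss; reversing the order makes it large.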

It is agnostic to the array geometry and to the type of recognition back-end, and it does not require sample-level synchronization between devices.
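
One way a ranker can be geometry-agnostic and tolerant of unsynchronized devices is to score each channel on its own signal with shared weights, so microphone positions, channel count and cross-channel alignment never enter the computation. The sketch below illustrates only that property: the log-energy features, the 400-sample frame (25 ms at an assumed 16 kHz) and the linear scorer are hypothetical stand-ins for MicRank's DNN, not its actual architecture.

```python
import math
from typing import List, Sequence


def channel_features(signal: Sequence[float], frame: int = 400) -> List[float]:
    """Hypothetical per-channel features: mean and std of frame log-energies."""
    if len(signal) < frame:
        raise ValueError("signal shorter than one frame")
    energies = []
    for start in range(0, len(signal) - frame + 1, frame):
        e = sum(x * x for x in signal[start:start + frame]) / frame
        energies.append(math.log(e + 1e-10))
    mean = sum(energies) / len(energies)
    std = math.sqrt(sum((e - mean) ** 2 for e in energies) / len(energies))
    return [mean, std]


def score_channels(signals: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Apply the same (hypothetical, linear) scorer to every channel independently.

    No channel's score depends on another channel's samples, so any number of
    channels is accepted and no sample-level alignment between them is needed.
    """
    return [sum(w * f for w, f in zip(weights, channel_features(s))) for s in signals]
```

Because the scorer is shared, reordering the input channels simply reorders the output scores.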

Remarkably, it considerably improves over previous selection techniques, reaching performance comparable to, and in some instances better than, oracle signal-based measures such as PESQ, STOI or SDR. This is achieved with a very small model of only 266k learnable parameters, making the method much more computationally efficient than decoder- or posterior-based channel selection methods.
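
For comparison, an oracle signal-based selector needs the clean reference signal. The sketch below picks the channel with the highest SDR, using the plain definition SDR = 10·log10(‖s‖² / ‖s − y‖²). This is a simplification assumed for illustration: it omits the BSS-Eval distortion filter and scale invariance, so each channel must already be time-aligned with the reference. `simple_sdr` and `oracle_select` are hypothetical names.

```python
import math
from typing import List, Sequence


def simple_sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Plain SDR in dB (simplified: no distortion filter, not scale-invariant)."""
    signal = sum(s * s for s in reference)
    error = sum((s - y) ** 2 for s, y in zip(reference, estimate))
    # Floor the error to avoid division by zero for a perfect match
    return 10.0 * math.log10(signal / max(error, 1e-12))


def oracle_select(reference: Sequence[float], channels: Sequence[Sequence[float]]) -> int:
    """Index of the channel with the highest SDR against the clean reference."""
    sdrs: List[float] = [simple_sdr(reference, ch) for ch in channels]
    return max(range(len(channels)), key=lambda c: sdrs[c])
```

At inference time the clean reference is not available, which is why such measures can serve only as oracles, whereas a learned ranker needs only the recorded channels.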

LibriAdHoc Synthetic Dataset Recipe

Coming Soon

Citing MicRank

If this code has been useful to you, please cite:

@misc{cornell2021learning,
      title={Learning to Rank Microphones for Distant Speech Recognition}, 
      author={Samuele Cornell and Alessio Brutti and Marco Matassoni and Stefano Squartini},
      year={2021},
      eprint={2104.02819},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Owner

Samuele Cornell