The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

Last update: Dec 02, 2022

Related tags

Deep Learning SF-Net

Overview

SF-Net for fullband SE

This is the repo of the manuscript "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement", which was submitted to Interspeech 2022. Some audio samples are provided here, and the code for GCRN-full, DS-Net-full, and CTS-Net-full, together with the network configuration of SF-Net, is released.

Abstract: Due to the high computational complexity to model more frequency bands, it is still intractable to conduct real-time full-band speech enhancement based on deep neural networks. Recent studies typically utilize the compressed perceptually motivated features with relatively low frequency resolution to filter the full-band spectrum by one-stage networks, leading to limited speech quality improvements. In this paper, we propose a coordinated sub-band fusion network for full-band speech enhancement, which aims to recover the low- (0-8 kHz), middle- (8-16 kHz), and high-band (16-24 kHz) in a step-wise manner. Specifically, a dual-stream network is first pretrained to recover the low-band complex spectrum, and another two sub-networks are designed as the middle- and high-band noise suppressors in the magnitude-only domain. To fully capitalize on the information intercommunication, we employ a sub-band interaction module to provide external knowledge guidance across different frequency bands. Extensive experiments show that the proposed method yields consistent performance advantages over state-of-the-art full-band baselines.
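To make the band layout in the abstract concrete, here is a minimal sketch of how a full-band spectrum could be split into the low (0-8 kHz), middle (8-16 kHz), and high (16-24 kHz) bands and stitched back together. It is not the released SF-Net code: the function names split_subbands and fuse_subbands, the 48 kHz sampling rate, and the 960-point FFT are all assumptions chosen for illustration, and the enhancement networks themselves are left out.

```python
import numpy as np


def split_subbands(spec, sample_rate=48000, n_fft=960, edges_hz=(8000, 16000)):
    """Split a (freq_bins, frames) one-sided spectrum into three sub-bands.

    sample_rate, n_fft, and edges_hz are illustrative assumptions,
    not values taken from the SF-Net configuration.
    """
    n_bins = n_fft // 2 + 1
    if spec.shape[0] != n_bins:
        raise ValueError(f"expected {n_bins} frequency bins, got {spec.shape[0]}")

    # Centre frequency (Hz) of every bin of the one-sided spectrum.
    freqs = np.arange(n_bins) * sample_rate / n_fft

    low_edge, high_edge = edges_hz
    low = spec[freqs < low_edge]                            # 0-8 kHz
    mid = spec[(freqs >= low_edge) & (freqs < high_edge)]   # 8-16 kHz
    high = spec[freqs >= high_edge]                         # 16-24 kHz
    return low, mid, high


def fuse_subbands(low, mid, high):
    """Concatenate the sub-bands along the frequency axis."""
    return np.concatenate([low, mid, high], axis=0)


if __name__ == "__main__":
    # Dummy complex spectrum: 481 bins (960-point FFT), 4 frames.
    rng = np.random.default_rng(0)
    spec = rng.standard_normal((481, 4)) + 1j * rng.standard_normal((481, 4))

    low, mid, high = split_subbands(spec)
    print(low.shape, mid.shape, high.shape)

    # Splitting followed by fusion reproduces the original spectrum.
    assert np.array_equal(fuse_subbands(low, mid, high), spec)
```

In a pipeline following the abstract, the low band would stay complex-valued while only the magnitudes of the middle and high bands (for example np.abs(mid)) would be passed to their suppressors; that step is omitted here because the repository's actual interfaces are not shown in this README.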

Demo page of audio samples

System flowchart of SF-Net

Results:

Ablation study

Comparison with SOTA

Visualization of spectrograms

VB dataset

DNS blind set

Owner

Guochen Yu
