Evaluation and Benchmarking of Speech Super-resolution Methods

Overview

Speech Super-resolution Evaluation and Benchmarking

What this repo do:

  • A toolbox for the evaluation of speech super-resolution algorithms.
  • Unify the evaluation pipline of speech super-resolution algorithms for a easier comparison between different systems.
  • Benchmarking speech super-resolution methods (pull request is welcome). Encouraging reproducible research.

I build this repo while I'm writing my paper for INTERSPEECH 2022: Neural Vocoder is All You Need for Speech Super-resolution. The model mentioned in this paper, NVSR, will also be open-sourced here.

Installation

Install via pip:

pip3 install ssr_eval

Please make sure you have already installed sox.

Quick Example

A basic example: Evaluate on a system that do nothing:

from ssr_eval import test 
test()
  • The evaluation result json file will be stored in the ./results directory: Example file
  • The code will automatically handle stuffs like downloading test sets.
  • You will find a field "averaged" at the bottom of the json file that looks like below. This field mark the performance of the system.
"averaged": {
        "proc_fft_24000_44100": {
            "lsd": 5.152331300436993,
            "log_sispec": 5.8051057146229095,
            "sispec": 30.23394207533686,
            "ssim": 0.8484425044157442
        }
    }

Here we report four metrics:

  1. Log spectral distance(LSD).
  2. Log scale invariant spectral distance [1] (log-sispec).
  3. Scale invariant spectral distance [1] (sispec).
  4. Structral similarity (SSIM).

⚠️ LSD is the most widely used metric for super-resolution. And I include another three metrics just in case you need them.


main_idea

Below is the code of test()

from ssr_eval import SSR_Eval_Helper, BasicTestee

# You need to implement a class for the model to be evaluated.
class MyTestee(BasicTestee):
    def __init__(self) -> None:
        super().__init__()

    # You need to implement this function
    def infer(self, x):
        """A testee that do nothing

        Args:
            x (np.array): [sample,], with model_input_sr sample rate
            target (np.array): [sample,], with model_output_sr sample rate

        Returns:
            np.array: [sample,]
        """
        return x

def test():
    testee = MyTestee()
    # Initialize a evaluation helper
    helper = SSR_Eval_Helper(
        testee,
        test_name="unprocessed",  # Test name for storing the result
        input_sr=44100,  # The sampling rate of the input x in the 'infer' function
        output_sr=44100,  # The sampling rate of the output x in the 'infer' function
        evaluation_sr=48000,  # The sampling rate to calculate evaluation metrics.
        setting_fft={
            "cutoff_freq": [
                12000
            ],  # The cutoff frequency of the input x in the 'infer' function
        },
        save_processed_result=True
    )
    # Perform evaluation
    ## Use all eight speakers in the test set for evaluation (limit_test_speaker=-1) 
    ## Evaluate on 10 utterance for each speaker (limit_test_nums=10)
    helper.evaluate(limit_test_nums=10, limit_test_speaker=-1)

The code will automatically handle stuffs like downloading test sets. The evaluation result will be saved in the ./results directory.

Baselines

We provide several pretrained baselines. For example, to run the NVSR baseline, you can click the link in the following table for more details.


Table.1 Log-spectral distance (LSD) on different input sampling-rate (Evaluated on 44.1kHz).

Method One for all Params 2kHz 4kHz 8kHz 12kHz 16kHz 24kHz 32kHz AVG
NVSR [Pretrained Model] Yes 99.0M 1.04 0.98 0.91 0.85 0.79 0.70 0.60 0.84
WSRGlow(24kHz→48kHz) No 229.9M - - - - - 0.79 - -
WSRGlow(12kHz→48kHz) No 229.9M - - - 0.87 - - - -
WSRGlow(8kHz→48kHz) No 229.9M - - 0.98 - - - - -
WSRGlow(4kHz→48kHz) No 229.9M - 1.12 - - - - - -
Nu-wave(24kHz→48kHz) No 3.0M - - - - - 1.22 - -
Nu-wave(12kHz→48kHz) No 3.0M - - - 1.40 - - - -
Nu-wave(8kHz→48kHz) No 3.0M - - 1.42 - - - - -
Nu-wave(4kHz→48kHz) No 3.0M - 1.42 - - - - - -
Unprocessed - - 5.69 5.50 5.15 4.85 4.54 3.84 2.95 4.65

Click the link of the model for more details.

Here "one for all" means model can process flexible input sampling rate.

Features

The following code demonstrate the full options in the SSR_Eval_Helper:

testee = MyTestee()
helper = SSR_Eval_Helper(testee, # Your testsee object with 'infer' function implemented
                        test_name="unprocess",  # The name of this test. Used for saving the log file in the ./results directory
                        test_data_root="./your_path/vctk_test", # The directory to store the test data, which will be automatically downloaded.
                        input_sr=44100, # The sampling rate of the input x in the 'infer' function
                        output_sr=44100, # The sampling rate of the output x in the 'infer' function
                        evaluation_sr=48000, # The sampling rate to calculate evaluation metrics. 
                        save_processed_result=False, # If True, save model output in the dataset directory.
                        # (Recommend/Default) Use fourier method to simulate low-resolution effect
                        setting_fft = {
                            "cutoff_freq": [1000, 2000, 4000, 6000, 8000, 12000, 16000], # The cutoff frequency of the input x in the 'infer' function
                        }, 
                        # Use lowpass filtering to simulate low-resolution effect. All possible combinations will be evaluated. 
                        setting_lowpass_filtering = {
                            "filter":["cheby","butter","bessel","ellip"], # The type of filter 
                            "cutoff_freq": [1000, 2000, 4000, 6000, 8000, 12000, 16000], 
                            "filter_order": [3,6,9] # Filter orders
                        }, 
                        # Use subsampling method to simulate low-resolution effect
                        setting_subsampling = {
                            "cutoff_freq": [1000, 2000, 4000, 6000, 8000, 12000, 16000],
                        }, 
                        # Use mp3 compression method to simulate low-resolution effect
                        setting_mp3_compression = {
                            "low_kbps": [32, 48, 64, 96, 128],
                        },
)

helper.evaluate(limit_test_nums=10, # For each speaker, only evaluate on 10 utterances.
                limit_test_speaker=-1 # Evaluate on all the speakers. 
                )

⚠️ I recommand all the users to use fourier method (setting_fft) to simulate low-resolution effect for the convinence of comparing between different system.

Dataset Details

We build the test sets using VCTK (version 0.92), a multi-speaker English corpus that contains 110 speakers with different accents.

  • Speakers used for the test set: p360, p361, p362, p363, p364, p374, p376, s5
  • For the remaining 100 speakers, p280 and p315 are omitted for the technical issues.
  • Other 98 speakers are used for training.

Citation

If you find this repo useful for your research, please consider citing:

@misc{liu2022neural,
      title={Neural Vocoder is All You Need for Speech Super-resolution}, 
      author={Haohe Liu and Woosung Choi and Xubo Liu and Qiuqiang Kong and Qiao Tian and DeLiang Wang},
      year={2022},
      eprint={2203.14941},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Reference

[1] Liu, Haohe, et al. "VoiceFixer: Toward General Speech Restoration with Neural Vocoder." arXiv preprint arXiv:2109.13731 (2021).

Owner
Haohe Liu (刘濠赫)
Haohe Liu (刘濠赫)
Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline Ankit Goyal, Hei Law, Bowei Liu, Alejandro Newell, Jia Deng Internati

Princeton Vision & Learning Lab 115 Jan 04, 2023
ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.

ESRGAN (Enhanced SRGAN) [ 🚀 BasicSR] [Real-ESRGAN] ✨ New Updates. We have extended ESRGAN to Real-ESRGAN, which is a more practical algorithm for rea

Xintao 4.7k Jan 02, 2023
PyTorch implementation of TSception V2 using DEAP dataset

TSception This is the PyTorch implementation of TSception V2 using DEAP dataset in our paper: Yi Ding, Neethu Robinson, Su Zhang, Qiuhao Zeng, Cuntai

Yi Ding 27 Dec 15, 2022
Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning plugins for distributed training using the Ray distributed compu

167 Jan 02, 2023
An implementation of the efficient attention module.

Efficient Attention An implementation of the efficient attention module. Description Efficient attention is an attention mechanism that substantially

Shen Zhuoran 194 Dec 15, 2022
NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset

NOD (Night Object Detection) Dataset NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset, BM

Igor Morawski 17 Nov 05, 2022
Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks This repository contains the official code for the

Linus Ericsson 11 Dec 16, 2022
The code of Zero-shot learning for low-light image enhancement based on dual iteration

Zero-shot-dual-iter-LLE The code of Zero-shot learning for low-light image enhancement based on dual iteration. You can get the real night image tests

1 Mar 18, 2022
Boosting Adversarial Attacks with Enhanced Momentum (BMVC 2021)

EMI-FGSM This repository contains code to reproduce results from the paper: Boosting Adversarial Attacks with Enhanced Momentum (BMVC 2021) Xiaosen Wa

John Hopcroft Lab at HUST 10 Sep 26, 2022
A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

Somnus `Chen 2 Jun 09, 2022
This code is an implementation for Singing TTS.

MLP Singer This code is an implementation for Singing TTS. The algorithm is based on the following papers: Tae, J., Kim, H., & Lee, Y. (2021). MLP Sin

Heejo You 22 Dec 23, 2022
Pytorch implementation for "Adversarial Robustness under Long-Tailed Distribution" (CVPR 2021 Oral)

Adversarial Long-Tail This repository contains the PyTorch implementation of the paper: Adversarial Robustness under Long-Tailed Distribution, CVPR 20

Tong WU 89 Dec 15, 2022
Curated list of awesome GAN applications and demo

gans-awesome-applications Curated list of awesome GAN applications and demonstrations. Note: General GAN papers targeting simple image generation such

Minchul Shin 4.5k Jan 07, 2023
Convert BART models to ONNX with quantization. 3X reduction in size, and upto 3X boost in inference speed

fast-Bart Reduction of BART model size by 3X, and boost in inference speed up to 3X BART implementation of the fastT5 library (https://github.com/Ki6a

Siddharth Sharma 19 Dec 09, 2022
For storing the complete exploration of Visual Question Answering for our B.Tech Project

Multi-Image vqa @authors: Akhilesh, Janhavi, Harsh Paper summary, Ideas tried and their corresponding results: on wiki Other discussions: on discussio

Harsh Raj 3 Jun 16, 2022
PyTorch-centric library for evaluating and enhancing the robustness of AI technologies

Responsible AI Toolbox A library that provides high-quality, PyTorch-centric tools for evaluating and enhancing both the robustness and the explainabi

24 Dec 22, 2022
Pytorch implementation of "ARM: Any-Time Super-Resolution Method"

ARM-Net Dependencies Python 3.6 Pytorch 1.7 Results Train Data preprocessing cd data_scripts python extract_subimages_test.py python data_augmentation

Bohong Chen 55 Nov 24, 2022
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
Pytorch implementation of paper: "NeurMiPs: Neural Mixture of Planar Experts for View Synthesis"

NeurMips: Neural Mixture of Planar Experts for View Synthesis This is the official repo for PyTorch implementation of paper "NeurMips: Neural Mixture

James Lin 101 Dec 13, 2022
pytorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination"

Unofficial implementation: MoCo: Momentum Contrast for Unsupervised Visual Representation Learning (Paper) InsDis: Unsupervised Feature Learning via N

Zhiqiang Shen 16 Nov 04, 2020