[email protected] Reverb Database. | PythonRepo" /> [email protected] Reverb Database. | PythonRepo">

The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT [email protected] Reverb Database.

Overview

Add_noise_and_rir_to_speech

The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT [email protected] Reverb Database.

Noise and RIR dataset description:

  • BUT [email protected] Reverb Database:

    The database is being built with respect to collect a large number of various Room Impulse Responses, Room environmental noises (or "silences"), Retransmitted speech (for ASR and SID testing), and meta-data (positions of microphones, speakers etc.).

    The goal is to provide speech community with a dataset for data enhancement and distant microphone or microphone array experiments in ASR and SID.

    In this codebase, we only use the RIR data, which is used to synthesize far-field speech, the composition of the RIR dataset and citation details are as follows.

    Room Name Room Type Size (length, depth, height) (m) (microphone_num x loudspeaker_num)
    Q301 Office 10.7x6.9x2.6 31 x 3
    L207 Office 4.6x6.9x3.1 31 x 6
    L212 Office 7.5x4.6x3.1 31 x 5
    L227 Stairs 6.2x2.6x14.2 31 x 5
    R112 Hotel room 4.4x2.8x2.6 31 x 5
    CR2 Conference room 28.2x11.1x3.3 31 x 4
    E112 Lecture room 11.5x20.1x4.8 31 x 2
    D105 Lecture room 17.2x22.8x6.9 31 x 6
    C236 Meeting room 7.0x4.1x3.6 31 x 10
    @ARTICLE{8717722,
             author={Szöke, Igor and Skácel, Miroslav and Mošner, Ladislav and Paliesek, Jakub and Černocký, Jan},
             journal={IEEE Journal of Selected Topics in Signal Processing}, 
             title={Building and evaluation of a real room impulse response dataset}, 
             year={2019},
             volume={13},
             number={4},
             pages={863-876},
             doi={10.1109/JSTSP.2019.2917582}
     }
    
  • MUSAN database:

    The database consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises and we only use the noise data in this database. Citation details are as follows.

    @misc{snyder2015musan,
          title={MUSAN: A Music, Speech, and Noise Corpus}, 
          author={David Snyder and Guoguo Chen and Daniel Povey},
          year={2015},
          eprint={1510.08484},
          archivePrefix={arXiv},
          primaryClass={cs.SD}
    }
    

Before using the data-processing code:

  • If you do not want the original dataset to be overwritten, please download the dataset again for use

  • You need to create three files: 'training_list.txt', 'validation_list.txt', 'testing_list.txt', based on your training, validation and test data file paths respectively, and ensure the audio in the file paths can be read and written.

  • The content of the aforementioned '*_list.txt' files are in the following form:

    *_list.txt
    	/../...../*.wav
    	/../...../*.wav
    	/../...../*.wav
    

Instruction for using the following data-processing code:

  1. mix_cleanaudio_with_rir_offline.py: Generate far-field speech offline

    • two parameters are needed:

      • --data_root: the data path which you want to download and store the RIR dataset in.
      • --clean_data_list_path: the path of the folder in which 'training_list.txt', 'validation_list.txt', 'testing_list.txt' are stored in
    • 2 folders will be created in data_root: 'ReverDB_data (Removable if needed)', 'ReverDB_mix'

  2. download_and_extract_noise_file.py: Generate musan noise file

    • one parameters are needed:
      • --data_root: the data path which you want to download and store the noise dataset in.
    • 2 folder will be created in data_root: 'musan (Removable if needed)', 'noise'
  3. vad_torch.py: Voice activity detection when adding noise to the speech

    The noise data is usually added online according to the SNR requirements, several pieces of code are provided below, please add them in the appropriate places according to your needs!

    import torchaudio
    import numpy as np
    import torch
    import random
    from vad_torch import VoiceActivityDetector
    
    
    def _add_noise(speech_sig, vad_duration, noise_sig, snr):
        """add noise to the audio.
        :param speech_sig: The input audio signal (Tensor).
        :param vad_duration: The length of the human voice (int).
        :param noise_sig: The input noise signal (Tensor).
        :param snr: the SNR you want to add (int).
        :returns: noisy speech sig with specific snr.
        """
        if vad_duration != 0:
            snr = 10**(snr/10.0)
            speech_power = torch.sum(speech_sig**2)/vad_duration
            noise_power = torch.sum(noise_sig**2)/noise_sig.shape[1]
            noise_update = noise_sig / torch.sqrt(snr * noise_power/speech_power)
    
            if speech_sig.shape[1] > noise_update.shape[1]:
                # padding
                temp_wav = torch.zeros(1, speech_sig.shape[1])
                temp_wav[0, 0:noise_update.shape[1]] = noise_update
                noise_update = temp_wav
            else:
                # cutting
                noise_update = noise_update[0, 0:speech_sig.shape[1]]
    
            return noise_update + speech_sig
        
        else:
            return speech_sig
        
    def main():
        # loading speech file
        speech_file = './speech.wav'
    	waveform, sr = torchaudio.load(speech_file)
    	waveform = waveform - waveform.mean()
    	
        # loading noise file and set snr
    	snr = 0       
    	noise_file = random.randint(1, 930)
    	
        # Voice activity detection
    	v = VoiceActivityDetector(waveform, sr)
    	raw_detection = v.detect_speech()
    	speech_labels = v.convert_windows_to_readible_labels(raw_detection)
    	vad_duration = 0
        if not len(speech_labels) == 0:
            for i in range(len(speech_labels)):
                start = speech_labels[i]['speech_begin']
                end = speech_labels[i]['speech_end']
                vad_duration = vad_duration + end-start
                
    	# adding noise
        noise, _ = torchaudio.load('/notebooks/noise/' + str(noise_file) + '.wav')
        waveform = _add_noise(waveform, vad_duration, noise, snr)
    
    if __name__ == '__main__':
        main()
Owner
Yunqi Chen
3rd-year undergraduate student; Passionate about all kinds of sports and everything interesting!
Yunqi Chen
HairCLIP: Design Your Hair by Text and Reference Image

Overview This repository hosts the official PyTorch implementation of the paper: "HairCLIP: Design Your Hair by Text and Reference Image". Our single

322 Dec 30, 2022
Taking the fight to the establishment.

Throwdown Taking the fight to the establishment. Wat? I wanted a simple markdown interpreter in python and/or javascript to output html for my website

Trevor van Hoof 1 Feb 01, 2022
Automated Birthday Wisher built using Python

Automated Birthday Wisher This Automation of wishing Birthday is achieved using Python. Never forget to wish birthday! Table of contents Overview Scre

yashviradia 1 Nov 29, 2021
Scripts to integrate DFIR-IRIS, MISP and TimeSketch

Scripts to integrate DFIR-IRIS, MISP and TimeSketch

Koen Van Impe 20 Dec 16, 2022
Submission to the HEAR2021 Challenge

Submission to the HEAR 2021 Challenge For model evaluation, python=3.8 and cuda10.2 with cudnn7.6.5 have been tested. The work uses a mixed supervised

Heinrich Dinkel 10 Dec 08, 2022
An addin for Autodesk Fusion 360 that lets you view your design in a Looking Glass Portrait 3D display

An addin for Autodesk Fusion 360 that lets you view your design in a Looking Glass Portrait 3D display

Brian Peiris 12 Nov 02, 2022
Got-book-6 - LSTM trained on the first five ASOIAF/GOT books

GOT Book 6 Generator Are you tired of waiting for the next GOT book to come out? I know that I am, which is why I decided to train a RNN on the first

Zack Thoutt 974 Oct 27, 2022
Kolibri: the offline app for universal education

Kolibri This repository is for software developers wishing to contribute to Kolibri. If you are looking for help installing, configuring and using Kol

Learning Equality 564 Jan 02, 2023
Find out where all films you want to watch are streaming

Just Watch Letterboxd Find out where all films you want to watch are streaming Ever wonder what films you want to watch are already on the streaming p

Jordan Oslislo 2 Feb 04, 2022
Open source stenotype engine

Plover Bringing stenography to everyone. Homepage Releases Wiki Blog Google Group Discord Chat About Installation Getting help Contributing Donations

Open Steno Project 2k Jan 09, 2023
Get a link to the web version of a git-tracked file or directory

githyperlink Get a link to the web version of a git-tracked file or directory. Applies to GitHub and GitLab remotes (and maybe others but those are no

Tomas Fiers 2 Nov 08, 2022
Just another sentiment wrapper.

sentimany Just a simple sentiment tool. It just grabs a set of pre-made sentiment models that you can quickly use to attach sentiment scores to text.

vincent d warmerdam 15 Dec 27, 2022
Library for Memory Trace Statistics in Python

Memory Search Library for Memory Trace Statistics in Python The library uses tracemalloc as a core module, which is why it is only available for Pytho

Memory Search 1 Dec 20, 2021
An awesome list of AI for art and design - resources, and popular datasets and how we may apply computer vision tasks to art and design.

Awesome AI for Art & Design An awesome list of AI for art and design - resources, and popular datasets and how we may apply computer vision tasks to a

Margaret Maynard-Reid 20 Dec 21, 2022
Extra scripts to improve user experience related to OpenTaiko

OpenTaiko-Utils Extra scripts to improve user experience related to OpenTaiko osu2tja /!\ IMPORTANT NOTE /!\ Converted charts that aren't yours are fo

2 Dec 25, 2022
A Python program that generates a maze that solves itself using DFS

Maze Generator And Solver Program Purpose: Generates a maze that then solves itself Language: Python and Pygame Algorithm: Randomized DFS / Floodfill

Joshua Liu 1 Jul 25, 2022
A tool converting rpk (记乎) to apkg (Anki Package)

RpkConverter This tool is used to convert rpk file to Anki apkg. 如果遇到任何问题,请发起issue,并描述情况。如果转换rpk出现问题,请将文件发到邮箱 ssqyang [AT] outlook.com,我会debug并修复问题。 下

9 Nov 01, 2021
Mail Me My Social Media stats (SoMeMailMe)

Mail Me My Social Media follower count (SoMeMailMe) TikTok only show data 60 days back in time. With this repo you can easily scrape your follower cou

Daniel Wigh 1 Jan 07, 2022
Adversarial Robustness with Non-uniform Perturbations

Adversarial Robustness with Non-uniform Perturbations This repository hosts the code to replicate experiments of the paper Adversarial Robustness with

5 May 20, 2022
CountBoard 是一个基于Tkinter简单的,开源的桌面日程倒计时应用。

CountBoard 是一个基于Tkinter简单的,开源的桌面日程倒计时应用。 基本功能 置顶功能 是否使窗体一直保持在最上面。 简洁模式 简洁模式使窗体更加简洁。 此模式下不可调整大小,请提前在普通模式下调整大小。 设置功能 修改主窗体背景颜色,修改计时模式。 透明设置 调整窗体的透明度。 修改

gaoyongxian 130 Dec 01, 2022