[email protected] Reverb Database. | PythonRepo" /> [email protected] Reverb Database. | PythonRepo">

The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT [email protected] Reverb Database.

Overview

Add_noise_and_rir_to_speech

The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT [email protected] Reverb Database.

Noise and RIR dataset description:

  • BUT [email protected] Reverb Database:

    The database is being built with respect to collect a large number of various Room Impulse Responses, Room environmental noises (or "silences"), Retransmitted speech (for ASR and SID testing), and meta-data (positions of microphones, speakers etc.).

    The goal is to provide speech community with a dataset for data enhancement and distant microphone or microphone array experiments in ASR and SID.

    In this codebase, we only use the RIR data, which is used to synthesize far-field speech, the composition of the RIR dataset and citation details are as follows.

    Room Name Room Type Size (length, depth, height) (m) (microphone_num x loudspeaker_num)
    Q301 Office 10.7x6.9x2.6 31 x 3
    L207 Office 4.6x6.9x3.1 31 x 6
    L212 Office 7.5x4.6x3.1 31 x 5
    L227 Stairs 6.2x2.6x14.2 31 x 5
    R112 Hotel room 4.4x2.8x2.6 31 x 5
    CR2 Conference room 28.2x11.1x3.3 31 x 4
    E112 Lecture room 11.5x20.1x4.8 31 x 2
    D105 Lecture room 17.2x22.8x6.9 31 x 6
    C236 Meeting room 7.0x4.1x3.6 31 x 10
    @ARTICLE{8717722,
             author={Szöke, Igor and Skácel, Miroslav and Mošner, Ladislav and Paliesek, Jakub and Černocký, Jan},
             journal={IEEE Journal of Selected Topics in Signal Processing}, 
             title={Building and evaluation of a real room impulse response dataset}, 
             year={2019},
             volume={13},
             number={4},
             pages={863-876},
             doi={10.1109/JSTSP.2019.2917582}
     }
    
  • MUSAN database:

    The database consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises and we only use the noise data in this database. Citation details are as follows.

    @misc{snyder2015musan,
          title={MUSAN: A Music, Speech, and Noise Corpus}, 
          author={David Snyder and Guoguo Chen and Daniel Povey},
          year={2015},
          eprint={1510.08484},
          archivePrefix={arXiv},
          primaryClass={cs.SD}
    }
    

Before using the data-processing code:

  • If you do not want the original dataset to be overwritten, please download the dataset again for use

  • You need to create three files: 'training_list.txt', 'validation_list.txt', 'testing_list.txt', based on your training, validation and test data file paths respectively, and ensure the audio in the file paths can be read and written.

  • The content of the aforementioned '*_list.txt' files are in the following form:

    *_list.txt
    	/../...../*.wav
    	/../...../*.wav
    	/../...../*.wav
    

Instruction for using the following data-processing code:

  1. mix_cleanaudio_with_rir_offline.py: Generate far-field speech offline

    • two parameters are needed:

      • --data_root: the data path which you want to download and store the RIR dataset in.
      • --clean_data_list_path: the path of the folder in which 'training_list.txt', 'validation_list.txt', 'testing_list.txt' are stored in
    • 2 folders will be created in data_root: 'ReverDB_data (Removable if needed)', 'ReverDB_mix'

  2. download_and_extract_noise_file.py: Generate musan noise file

    • one parameters are needed:
      • --data_root: the data path which you want to download and store the noise dataset in.
    • 2 folder will be created in data_root: 'musan (Removable if needed)', 'noise'
  3. vad_torch.py: Voice activity detection when adding noise to the speech

    The noise data is usually added online according to the SNR requirements, several pieces of code are provided below, please add them in the appropriate places according to your needs!

    import torchaudio
    import numpy as np
    import torch
    import random
    from vad_torch import VoiceActivityDetector
    
    
    def _add_noise(speech_sig, vad_duration, noise_sig, snr):
        """add noise to the audio.
        :param speech_sig: The input audio signal (Tensor).
        :param vad_duration: The length of the human voice (int).
        :param noise_sig: The input noise signal (Tensor).
        :param snr: the SNR you want to add (int).
        :returns: noisy speech sig with specific snr.
        """
        if vad_duration != 0:
            snr = 10**(snr/10.0)
            speech_power = torch.sum(speech_sig**2)/vad_duration
            noise_power = torch.sum(noise_sig**2)/noise_sig.shape[1]
            noise_update = noise_sig / torch.sqrt(snr * noise_power/speech_power)
    
            if speech_sig.shape[1] > noise_update.shape[1]:
                # padding
                temp_wav = torch.zeros(1, speech_sig.shape[1])
                temp_wav[0, 0:noise_update.shape[1]] = noise_update
                noise_update = temp_wav
            else:
                # cutting
                noise_update = noise_update[0, 0:speech_sig.shape[1]]
    
            return noise_update + speech_sig
        
        else:
            return speech_sig
        
    def main():
        # loading speech file
        speech_file = './speech.wav'
    	waveform, sr = torchaudio.load(speech_file)
    	waveform = waveform - waveform.mean()
    	
        # loading noise file and set snr
    	snr = 0       
    	noise_file = random.randint(1, 930)
    	
        # Voice activity detection
    	v = VoiceActivityDetector(waveform, sr)
    	raw_detection = v.detect_speech()
    	speech_labels = v.convert_windows_to_readible_labels(raw_detection)
    	vad_duration = 0
        if not len(speech_labels) == 0:
            for i in range(len(speech_labels)):
                start = speech_labels[i]['speech_begin']
                end = speech_labels[i]['speech_end']
                vad_duration = vad_duration + end-start
                
    	# adding noise
        noise, _ = torchaudio.load('/notebooks/noise/' + str(noise_file) + '.wav')
        waveform = _add_noise(waveform, vad_duration, noise, snr)
    
    if __name__ == '__main__':
        main()
Owner
Yunqi Chen
3rd-year undergraduate student; Passionate about all kinds of sports and everything interesting!
Yunqi Chen
Contains the code of my learning of Python OOP.

OOP Python This repository contains the code of my learning of Python OOP. All the code: is following PEP 8 ✅ has proper concept illustrations and com

Samyak Jain 2 Jan 15, 2022
Academic planner application designed for students and counselors.

Academic planner application designed for students and counselors.

Ali bagheri 2 Dec 31, 2021
Python with braces. Because Python is awesome, but whitespace is awful.

Bython Python with braces. Because Python is awesome, but whitespace is awful. Bython is a Python preprosessor which translates curly brackets into in

1 Nov 04, 2021
ABT aka Animated Background Tool is a windows only python program that makes it that you can have animated background.

ABT ABT aka Animated Background Tool is a windows only python program that makes it that you can have animated background. 𝓡𝓔𝓐𝓓 𝓜𝓔, An Important

Yeeterboi4 2 Jul 16, 2022
Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like.

Advent Of Code 2021 - Python English Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels th

Coral Izquierdo Muñiz 2 Jan 09, 2022
Retrieve bank transactions and categorize for budgeting use

Budgeting After trying out some budgeting software, I decided to make my own. selenium_scraper Using the selenium package, this script runs an instanc

Marc 1 Nov 10, 2021
An Notifier Program that Notifies you to relax your eyes Every 15 Minutes👀

Every 15 Minutes ⌛ Every 15 Minutes is an application that is used to Notify you to Relax your eyes Every 15 Minutes, This is fully made with Python a

FSP Gang s' YT 2 Oct 18, 2021
A C-like hardware description language (HDL) adding high level synthesis(HLS)-like automatic pipelining as a language construct/compiler feature.

██████╗ ██╗██████╗ ███████╗██╗ ██╗███╗ ██╗███████╗ ██████╗ ██╔══██╗██║██╔══██╗██╔════╝██║ ██║████╗ ██║██╔════╝██╔════╝ ██████╔╝██║██████╔╝█

Julian Kemmerer 391 Jan 01, 2023
Dashboard to view a stock's basic information, RSI, Bollinger bands, EMA, SMA, sentiment analysis via Python

Your One And Only Trading Bot No seriously, we mean it! Contributors Jihad Al-Hussain John Gaffney Shanel Kuchera Kazuki Takehashi Patrick Thornquist

5 May 21, 2022
Similarity checking of sign languages

Similarity checking of sign languages This repository checks for similarity betw

Tonni Das Jui 1 May 13, 2022
Simple tooling for marking deprecated functions or classes and re-routing to the new successors' instance.

pyDeprecate Simple tooling for marking deprecated functions or classes and re-routing to the new successors' instance

Jirka Borovec 45 Nov 24, 2022
An async API wrapper for Dress To Impress written in Python.

dti.py An async API wrapper for Dress To Impress written in Python. Some notes: For the time being, there are no front-facing docs for this beyond doc

Steve C 1 Dec 14, 2022
calculadora financiera hecha en python

Calculadora financiera Calculadora de factores financieros basicos, puede calcular tanto factores como expresiones algebraicas en funcion de dichos fa

crudo 5 Nov 10, 2021
A OBS service to package a published repository into a tar.gz file

OBS Source Service obs-service-publish_tar obs-service-publish_tar will create a archive.tar[.tar compression] archive containing the published repo

Erico Mendonca 1 Feb 16, 2022
200 LeetCode problems

LeetCode I classify 200 leetcode problems into some categories and upload my code to who concern WEEK 1 # Title Difficulty Array 15 3Sum Medium 1324 P

Hoang Cao Bao 108 Dec 08, 2022
Airflow Operator for running Soda SQL scans

Airflow Operator for running Soda SQL scans

Todd de Quincey 7 Oct 18, 2022
🏆 A ranked list of awesome Python open-source libraries and tools. Updated weekly.

Best-of Python 🏆 A ranked list of awesome Python open-source libraries & tools. Updated weekly. This curated list contains 230 awesome open-source pr

Machine Learning Tooling 2.7k Jan 03, 2023
easy_sbatch - Batch submitting Slurm jobs with script templates

easy_sbatch - Batch submitting Slurm jobs with script templates

Wei Shen 13 Oct 11, 2022
DC619/DC858 Mainframe Environment/Lab

DC619 Training LPAR The file DC619 - Mainframe Overflows Hands On.pdf contains the labs and walks through how to perform them. Use docker You can use

Soldier of FORTRAN 9 Jun 27, 2022
Repository, with small useful and functional applications

Repositorio,com pequenos aplicativos uteis e funcionais A ideia e ir deselvolvendo pequenos aplicativos funcionais e adicionar a este repositorio List

GabrielDuke 6 Dec 06, 2021