[NeurIPS 2020] Official repository for the project "Listening to Sound of Silence for Speech Denoising"

Overview

Listening to Sounds of Silence for Speech Denoising

Introduction

This is the repository of the "Listening to Sounds of Silence for Speech Denoising" project. (Project URL: here) Our approach is based on a key observation about human speech: there is often a short pause between each sentence or word. In a recorded speech signal, those pauses introduce a series of time periods during which only noise is present. We leverage these incidental silent intervals to learn a model for automatic speech denoising given only mono-channel audio. Detected silent intervals over time expose not just pure noise but its time varying features, allowing the model to learn noise dynamics and suppress it from the speech signal. An overview of our audio denoise network is shown here:

Silent Interval Detection Model

Our model has three components: (a) one that detects silent intervals over time, and outputs a noise profile observed from detected silent intervals; (b) another that estimates the full noise profile, and (c) yet another that cleans up the input signal.

Dependencies

  • Python 3
  • PyTorch 1.3.0

You can install the requirements either to your virtual environment or the system via pip with:

pip install -r requirements.txt

Data

Training and Testing

Our model is trained on publicly available audio datasets. We obtain clean speech signals using AVSPEECH, from which we randomly choose 2448 videos (4:5 hours of total length) and extract their speech audio channels. Among them, we use 2214 videos for training and 234 videos for testing, so the training and testing speeches are fully separate.

We use two datasets, DEMAND and Google’s AudioSet, as background noise. Both consist of environmental noise, transportation noise, music, and many other types of noises. DEMAND has been widely used in previous denoising works. Yet AudioSet is much larger and more diverse than DEMAND, thus more challenging when used as noise.

Due to the linearity of acoustic wave propagation, we can superimpose clean speech signals with noise to synthesize noisy input signals. When synthesizing a noisy input signal, we randomly choose a signal-to-noise ratio (SNR) from seven discrete values: -10dB, -7dB, -3dB, 0dB, 3dB, 7dB, and 10dB; and by mixing the foreground speech with properly scaled noise, we produce a noisy signal with the chosen SNR. For example, a -10dB SNR means that the power of noise is ten times the speech. The SNR range in our evaluations (i.e., [-10dB, 10dB]) is significantly larger than those tested in previous works.

Dataset Structure (For inference)

Please organize the dataset directory as follows:

dataset/
├── audio1.wav
├── audio2.wav
├── audio3.wav
...

Please also provide a csv file including each audio file's file_name (without extension). For example:

audio1
audio2
audio3
...

An example is provided in the data/sounds_of_silence_audioonly_original directory.

Data Preprocessing

To process the dataset, run the script:

python preprocessing/preprocessor_audioonly.py

Note: Please specify dataset's directory, csv file, and output path inside preprocessor_audioonly.py. After running the script, the dataset directory looks like the data/sounds_of_silence_audioonly directory, with a JSON file (sounds_of_silence.json in this example) linking to the directory.

Inference

Pretrained weights

You can download the pretrained weights from authors here.

Step 1

  1. Go to model_1_silent_interval_detection directory
  2. Choose the audioonly_model
  3. Run
    CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0,1 python3 predict.py --ckpt 87 --save_results false --unknown_clean_signal true
  4. Run
    python3 create_data_from_pred.py --unknown_clean_signal true
  5. Outputs can be found in the model_output directory.

Step 2

  1. Go to model_2_audio_denoising directory
  2. Choose audio_denoising_model
  3. Run
    CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 python3 predict.py --ckpt 24 --unknown_clean_signal true
  4. Outputs can be found in the model_output directory. The denoised result is called denoised_output.wav.

Command Parameters Explanation:

  1. --ckpt [number]: Refers to the pretrained model located in each models output directory (model_output/{model_name}/model/ckpt_epoch{number}.pth).
  2. --save_results [true|false]: If true, intermediate audio results and waveform figures will be saved. Recommend to leave it off to speed up the inference process.
  3. --unknown_clean_signal [true|false]: If running inference on external data (data without known clean signals), please set it to true.

Contact

E-mail: [email protected]




© 2020 The Trustees of Columbia University in the City of New York. This work may be reproduced and distributed for academic non-commercial purposes only without further authorization, but rightsholder otherwise reserves all rights.

Owner
Henry Xu
Henry Xu
Orthogonal Over-Parameterized Training

The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great impo

Weiyang Liu 11 Apr 18, 2022
Implementation of "Debiasing Item-to-Item Recommendations With Small Annotated Datasets" (RecSys '20)

Debiasing Item-to-Item Recommendations With Small Annotated Datasets This is the code for our RecSys '20 paper. Other materials can be found here: Ful

Microsoft 34 Aug 10, 2022
Official code base for the poster "On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation" published in NeurIPS 2021 Workshop (SVRHM)

Self-Supervised Learning (SimCLR) with Biological Plausible Image Augmentations Official code base for the poster "On the use of Cortical Magnificatio

Binxu 8 Aug 17, 2022
Using OpenAI's CLIP to upscale and enhance images

CLIP Upscaler and Enhancer Using OpenAI's CLIP to upscale and enhance images Based on nshepperd's JAX CLIP Guided Diffusion v2.4 Sample Results Viewpo

Tripp Lyons 5 Jun 14, 2022
Pointer networks Tensorflow2

Pointer networks Tensorflow2 原文:https://arxiv.org/abs/1506.03134 仅供参考与学习,内含代码备注 环境 tensorflow==2.6.0 tqdm matplotlib numpy 《pointer networks》阅读笔记 应用场景

HUANG HAO 7 Oct 27, 2022
Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

ACTOR Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021. Please visit our we

Mathis Petrovich 248 Dec 23, 2022
Trading and Backtesting environment for training reinforcement learning agent or simple rule base algo.

TradingGym TradingGym is a toolkit for training and backtesting the reinforcement learning algorithms. This was inspired by OpenAI Gym and imitated th

Yvictor 1.1k Jan 02, 2023
Code for You Only Cut Once: Boosting Data Augmentation with a Single Cut

You Only Cut Once (YOCO) YOCO is a simple method/strategy of performing augmenta

88 Dec 28, 2022
A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation A pytorch-version implementation

11 Oct 08, 2022
Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS of first stage is 3.42 and second stage is 3.47.

SDDNet Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS

Cyril Lv 43 Nov 21, 2022
A collection of semantic image segmentation models implemented in TensorFlow

A collection of semantic image segmentation models implemented in TensorFlow. Contains data-loaders for the generic and medical benchmark datasets.

bobby 16 Dec 06, 2019
duralava is a neural network which can simulate a lava lamp in an infinite loop.

duralava duralava is a neural network which can simulate a lava lamp in an infinite loop. Example This is not a real lava lamp but a "fake" one genera

Maximilian Bachl 87 Dec 20, 2022
Pytorch implementation of Supporting Clustering with Contrastive Learning, NAACL 2021

Supporting Clustering with Contrastive Learning SCCL (NAACL 2021) Dejiao Zhang, Feng Nan, Xiaokai Wei, Shangwen Li, Henghui Zhu, Kathleen McKeown, Ram

231 Jan 05, 2023
Human Pose Detection on EdgeTPU

Coral PoseNet Pose estimation refers to computer vision techniques that detect human figures in images and video, so that one could determine, for exa

google-coral 476 Dec 31, 2022
Annotate datasets with a semi-trained or fully trained YOLOv5 model

YOLOv5 Auto Annotator Annotate datasets with a semi-trained or fully trained YOLOv5 model Prerequisites Ubuntu =20.04 Python =3.7 System dependencie

Akash James 3 May 14, 2022
Telegram chatbot created with deep learning model (LSTM) and telebot library.

Telegram chatbot Telegram chatbot created with deep learning model (LSTM) and telebot library. Description This program will allow you to create very

1 Jan 04, 2022
Official implementation of "Accelerating Reinforcement Learning with Learned Skill Priors", Pertsch et al., CoRL 2020

Accelerating Reinforcement Learning with Learned Skill Priors [Project Website] [Paper] Karl Pertsch1, Youngwoon Lee1, Joseph Lim1 1CLVR Lab, Universi

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 134 Dec 06, 2022
A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

SVHNClassifier-PyTorch A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks If

Potter Hsu 182 Jan 03, 2023
Efficient Deep Learning Systems course

Efficient Deep Learning Systems This repository contains materials for the Efficient Deep Learning Systems course taught at the Faculty of Computer Sc

Max Ryabinin 173 Dec 29, 2022
一个目标检测的通用框架(不需要cuda编译),支持Yolo全系列(v2~v5)、EfficientDet、RetinaNet、Cascade-RCNN等SOTA网络。

一个目标检测的通用框架(不需要cuda编译),支持Yolo全系列(v2~v5)、EfficientDet、RetinaNet、Cascade-RCNN等SOTA网络。

Haoyu Xu 203 Jan 03, 2023