A simple project that separates a mixture of two clean voices into two separate voices.

Last update: Oct 30, 2022


Speech Separation


Result example (click to hear the voices): mix || predicted voice 1 || predicted voice 2

Mix spectrogram

Predicted voice 1 spectrogram

Predicted voice 2 spectrogram

1. Quick train

Step 1:

Download LibriMixSmall, extract it and move it to the root of the project.

Step 2:

./train.sh

Training takes only about 2-3 hours on a typical GPU. After each epoch, the predictions are written to the ./viz_outout folder.

2. Quick inference

./inference.sh

The results are written to the ./viz_outout folder.

3. More detail

Input: The complex spectrogram, computed from the raw mixed audio signal.
Output: A complex ratio mask (cRM) ---> complex spectrogram ---> separated voices.
Model: A simplified version of this implementation of the model defined in the paper Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation.
Loss function: Permutation Invariant Training (PIT) loss and pairwise negative SI-SDR loss (the more state-of-the-art option).
Dataset: A small version of the LibriMix dataset, taken from LibriMixSmall.
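
The masking and loss ideas listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code in this repository: the function names (apply_crm, neg_si_sdr, pit_loss) are hypothetical, the spectrograms are random arrays rather than real STFTs, and the permutation search is a plain brute-force loop, which is cheap for two speakers.

```python
import itertools

import numpy as np


def apply_crm(mix_spec, crm):
    """Apply a complex ratio mask to a complex mixture spectrogram.

    Element-wise complex multiplication rescales the magnitude and
    rotates the phase of every time-frequency bin.
    """
    return mix_spec * crm


def neg_si_sdr(estimate, target, eps=1e-8):
    """Negative scale-invariant SDR (in dB) between two 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return -10.0 * np.log10(ratio)


def pit_loss(estimates, targets):
    """Return (best mean loss, best permutation) over speaker orderings.

    estimates, targets: arrays of shape (n_speakers, n_samples).
    The permutation maps estimate i to target perm[i].
    """
    n_src = targets.shape[0]
    # pairwise[i, j] = loss of estimate i scored against target j
    pairwise = np.array([[neg_si_sdr(e, t) for t in targets] for e in estimates])
    best_loss, best_perm = None, None
    for perm in itertools.permutations(range(n_src)):
        loss = np.mean([pairwise[i, perm[i]] for i in range(n_src)])
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


rng = np.random.default_rng(0)

# Masking: with an ideal mask, the mixture spectrogram maps back to voice 1.
shape = (257, 100)  # (frequency bins, frames), chosen arbitrarily
voice_spec = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
other_spec = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mix_spec = voice_spec + other_spec
ideal_crm = voice_spec / mix_spec
print(np.allclose(apply_crm(mix_spec, ideal_crm), voice_spec))

# PIT: the estimates come out in swapped order, and the loss finds the swap.
targets = rng.standard_normal((2, 8000))
estimates = targets[::-1] + 0.01 * rng.standard_normal((2, 8000))
loss, perm = pit_loss(estimates, targets)
print(perm, loss)
```

Because the loss takes the minimum over speaker orderings, the model is not penalized for emitting the two voices in a different order from the labels, which is why a permutation-invariant loss suits speaker-independent separation.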

4. Current problem

Because the dataset is kept small for fast training, the model overfits the training set somewhat. Using a bigger dataset will potentially help to overcome that. Some suggestions:

Use the original LibriMix dataset, which is much bigger (around 60 times bigger than what I trained on).
Use this work to download a much larger in-the-wild dataset, and use datasets/VoiceMixtureDataset.py instead of the Libri one that I am using. P.S. I have trained with it and it works too.

Owner

vuthede
