Speech Separation

The simple project to separate mixed voice (2 clean voices) to 2 separate voices.

Result Example (Clisk to hear the voices): mix || prediction voice1 || prediction voice2

Mix Spectrogram

Predict Voice1's Spectrogram

Predict Voice2's Spectrogram

1. Quick train

Step 1:

Download LibriMixSmall, extract it and move it to the root of the project.

Step 2:

./train.sh

It will take about ONLY 2-3 HOURS to train with normal GPU. After each epoch, the prediction is generated to ./viz_outout folder.

2. Quick inference

./inference.sh The result will be generated to ./viz_outout folder.

3. More detail

Input: The Complex spectrogram. Get from the raw mixed audio signal
Output: The complex ratio mask (cRM) ---> complex spectrogram ---> separated voices.
Model: Use the simple version of this implementation , which is defined in paper Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
Loss function: Permutation Invariant Training Loss and PairWise Neg SisDr Loss (more SOTA)
Dataset: A small version of LibriMix dataset. I get from LibriMixSmall

4. Current problem

Due to small dataset size for fast training, the model is a bit overfitting to the training set. Use the bigger dataset will potentially help to overcome that. Some suggestions:

Use the original LibriMix Dataset which is way much bigger (around 60 times bigger that what I have trained).
Use this work to download much more in-the-wild dataset and use datasets/VoiceMixtureDataset.py instead of the Libri one that I am using. p/s I have trained and it work too.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
ckpt		ckpt
datasets		datasets
lib		lib
logger		logger
losses		losses
models		models
output_sample		output_sample
README.md		README.md
inference.sh		inference.sh
train.py		train.py
train.sh		train.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ckpt

ckpt

datasets

datasets

lib

lib

logger

logger

losses

losses

models

models

output_sample

output_sample

README.md

README.md

inference.sh

inference.sh

train.py

train.py

train.sh

train.sh

Repository files navigation

Speech Separation

1. Quick train

Step 1:

Step 2:

2. Quick inference

3. More detail

4. Current problem

About

Releases

Packages

Languages

vuthede/speech_separation_PIT

Folders and files

Latest commit

History

Repository files navigation

Speech Separation

1. Quick train

Step 1:

Step 2:

2. Quick inference

3. More detail

4. Current problem

About

Resources

Stars

Watchers

Forks

Languages