A simple project that separates a mixture of two clean voices into two separate voices.

Last update: Oct 30, 2022


Speech Separation


Result example (click to hear the voices): mix || predicted voice 1 || predicted voice 2

Mix spectrogram

Predicted voice 1 spectrogram

Predicted voice 2 spectrogram

1. Quick training

Step 1:

Download LibriMixSmall, extract it, and move the extracted folder to the root of the project.

Step 2:

./train.sh

Training takes only about 2-3 hours on a typical GPU. After each epoch, the predictions are written to the ./viz_outout folder.

2. Quick inference

./inference.sh

The results are written to the ./viz_outout folder.

3. More detail

Input: the complex spectrogram of the raw mixed audio signal (computed with the STFT).
Output: a complex ratio mask (cRM) for each voice ---> that voice's complex spectrogram (mask × mixture spectrogram) ---> separated voice (via the inverse STFT).
Model: a simplified version of this implementation of the model defined in the paper Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation.
Loss function: Permutation Invariant Training (PIT) loss combined with a pairwise negative SI-SDR loss (a more state-of-the-art choice).
Dataset: a small version of the LibriMix dataset, obtained from LibriMixSmall.
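To make the Output step concrete, here is a minimal NumPy sketch of how a complex ratio mask relates the mixture's complex spectrogram to one voice's complex spectrogram. It assumes the spectrograms are already computed (freq × frames complex arrays) and leaves out the STFT and inverse STFT; the names `ideal_crm` and `apply_crm` are hypothetical, not taken from the project. The "ideal" mask M = S / Y is the one that recovers the source S exactly from the mixture Y; a trained model can only approximate it.

```python
import numpy as np


def ideal_crm(source_spec: np.ndarray, mix_spec: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Ideal complex ratio mask M = S / Y, computed bin by bin.

    Written as S * conj(Y) / (|Y|^2 + eps) so near-silent bins do not divide by zero.
    """
    return source_spec * np.conj(mix_spec) / (np.abs(mix_spec) ** 2 + eps)


def apply_crm(masks: np.ndarray, mix_spec: np.ndarray) -> np.ndarray:
    """Apply one complex mask per voice to the mixture spectrogram.

    masks: (n_voices, freq, frames); mix_spec: (freq, frames).
    Complex multiplication scales each bin's magnitude and shifts its phase.
    """
    return masks * mix_spec[np.newaxis]
```

With ideal masks (and `eps=0.0`), `apply_crm` returns each voice's spectrogram exactly; in the full pipeline each masked spectrogram would then go through an inverse STFT to give a separated waveform.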
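The loss line combines two ideas: a negative SI-SDR computed for every (estimate, reference) pair, and PIT, which scores the model under the speaker ordering with the lowest loss, because the network cannot know which output slot "should" hold which voice. The naming resembles `pairwise_neg_sisdr` and `PITLossWrapper` in the Asteroid library, but whether the project uses that library is an assumption. The sketch below is a from-scratch NumPy version with hypothetical function names that searches all orderings exhaustively (cheap for two voices).

```python
import itertools

import numpy as np


def neg_si_sdr(est: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Negative scale-invariant SDR in dB for one estimate/reference pair."""
    # Zero-mean both signals so a DC offset does not affect the score
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference: the scaled "target" part
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    # Whatever is left over counts as distortion
    noise = est - target
    si_sdr = 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
    return float(-si_sdr)


def pit_neg_si_sdr(ests: np.ndarray, refs: np.ndarray) -> tuple[float, tuple[int, ...]]:
    """PIT loss: mean negative SI-SDR under the best estimate-to-reference assignment.

    ests, refs: (n_voices, samples). perm[i] is the reference matched to estimate i.
    """
    n = refs.shape[0]
    # Pairwise loss matrix: pair[i, j] = loss of estimate i against reference j
    pair = np.array([[neg_si_sdr(ests[i], refs[j]) for j in range(n)] for i in range(n)])
    # Exhaustive search over orderings (n! of them; only 2 for two voices)
    perm = min(itertools.permutations(range(n)),
               key=lambda p: sum(pair[i, p[i]] for i in range(n)))
    loss = float(np.mean([pair[i, perm[i]] for i in range(n)]))
    return loss, perm
```

Because SI-SDR ignores scale, an output that is a louder or quieter copy of a reference still scores well, and swapping the two outputs changes only `perm`, not the loss.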

4. Current problem

Because the dataset is kept small for fast training, the model overfits the training set somewhat. Training on a bigger dataset should help. Some suggestions:

Use the original LibriMix dataset, which is much bigger (around 60 times the size of the set I trained on).
Use this work to download much more in-the-wild data, and use datasets/VoiceMixtureDataset.py instead of the LibriMix dataset I am using. P.S. I have trained with it and it works too.
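For the second suggestion, a mixture dataset has to turn clean single-speaker recordings into two-voice training examples. The helper below is a hypothetical sketch of one such step, not the contents of datasets/VoiceMixtureDataset.py: crop or zero-pad two clean voices to a fixed length, rescale the second to a chosen level relative to the first, and sum them. The fixed length, the SNR-based scaling, and the name `make_mixture` are all assumptions.

```python
import numpy as np


def make_mixture(voice1, voice2, length: int, snr_db: float,
                 rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: build one two-voice example from two clean voices.

    Returns (mix, sources) with shapes (length,) and (2, length).
    """
    def fit(x: np.ndarray) -> np.ndarray:
        # Random crop if the clip is too long, zero-pad at the end if too short
        if len(x) >= length:
            start = rng.integers(0, len(x) - length + 1)
            return x[start:start + length]
        return np.pad(x, (0, length - len(x)))

    s1 = fit(np.asarray(voice1, dtype=float))
    s2 = fit(np.asarray(voice2, dtype=float))
    # Scale voice 2 so that 10*log10(power1 / power2) is snr_db (up to the tiny epsilon)
    p1 = np.mean(s1 ** 2) + 1e-8
    p2 = np.mean(s2 ** 2) + 1e-8
    s2 = s2 * np.sqrt(p1 / (p2 * 10 ** (snr_db / 10)))
    sources = np.stack([s1, s2])
    return sources.sum(axis=0), sources
```

Drawing a fresh pair of voices, crop offset, and `snr_db` for every example (for instance `snr_db = rng.uniform(-5, 5)`) gives a different mixture each epoch, which is one common way to stretch a limited set of clean recordings further.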


Owner

vuthede
