Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Overview

Continuous Speech Separation with Conformer

Introduction

We examine the use of the Conformer architecture for continuous speech separation. Conformer allows the separation model to efficiently capture both local and global context information, which is helpful for speech separation. Experimental results using the LibriCSS dataset show that the Conformer separation model achieves state of the art results for both single-channel and multi-channel settings.

For a detailed description and experimental results, please refer to our paper: Continuous Speech Separation with Conformer (Accepted by ICASSP 2021).

Environment

python 3.6.9, torch 1.7.1

Get Started

  1. Download the overlapped speech of LibriCSS dataset.

    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT" -O overlapped_speech.zip && rm -rf /tmp/cookies.txt && unzip overlapped_speech.zip && rm overlapped_speech.zip
  2. Download the Conformer separation models.

    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1" -O checkpoints.zip && rm -rf /tmp/cookies.txt && unzip checkpoints.zip && rm checkpoints.zip
  3. Run the separation.

    3.1 Single-channel separation

    export MODEL_NAME=1ch_conformer_base
    python3 separate.py \
        --checkpoint checkpoints/$MODEL_NAME \
        --mix-scp utils/overlapped_speech_1ch.scp \
        --dump-dir separated_speech/monaural/utterances_with_$MODEL_NAME \
        --device-id 0 \
        --num_spks 2

    The separated speech can be found in the directory 'separated_speech/monaural/utterances_with_$MODEL_NAME'

    3.2 Seven-channel separation

    export MODEL_NAME=conformer_base
    python3 separate.py \
        --checkpoint checkpoints/$MODEL_NAME \
        --mix-scp utils/overlapped_speech_7ch.scp \
        --dump-dir separated_speech/7ch/utterances_with_$MODEL_NAME \
        --device-id 0 \
        --num_spks 2 \
        --mvdr True

    The separated speech can be found in the directory 'separated_speech/7ch/utterances_with_$MODEL_NAME'

Citation

If you find our work useful, please cite our paper:

@inproceedings{CSS_with_Conformer,
  title={Continuous speech separation with conformer},
  author={Chen, Sanyuan and Wu, Yu and Chen, Zhuo and Wu, Jian and Li, Jinyu and Yoshioka, Takuya and Wang, Chengyi and Liu, Shujie and Zhou, Ming},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5749--5753},
  year={2021},
  organization={IEEE}
}
Owner
Sanyuan Chen (陈三元)
Sanyuan Chen (陈三元)
FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction. It uses a customized encoder decoder architecture with spatio-temporal convolutions and channel ga

Tarun K 280 Dec 23, 2022
Scalable training for dense retrieval models.

Scalable implementation of dense retrieval. Training on cluster By default it trains locally: PYTHONPATH=.:$PYTHONPATH python dpr_scale/main.py traine

Facebook Research 90 Dec 28, 2022
Plover-tapey-tape: an alternative to Plover’s built-in paper tape

plover-tapey-tape plover-tapey-tape is an alternative to Plover’s built-in paper

7 May 29, 2022
Self-Supervised Learning

Self-Supervised Learning Features self_supervised offers features like modular framework support for multi-gpu training using PyTorch Lightning easy t

Robin 1 Dec 14, 2021
YOLOv4-v3 Training Automation API for Linux

This repository allows you to get started with training a state-of-the-art Deep Learning model with little to no configuration needed! You provide your labeled dataset or label your dataset using our

BMW TechOffice MUNICH 626 Dec 31, 2022
A Pytorch Implementation of a continuously rate adjustable learned image compression framework.

GainedVAE A Pytorch Implementation of a continuously rate adjustable learned image compression framework, Gained Variational Autoencoder(GainedVAE). N

39 Dec 24, 2022
DanceTrack: Multiple Object Tracking in Uniform Appearance and Diverse Motion

DanceTrack DanceTrack is a benchmark for tracking multiple objects in uniform appearance and diverse motion. DanceTrack provides box and identity anno

260 Dec 28, 2022
An Open-Source Toolkit for Prompt-Learning.

An Open-Source Framework for Prompt-learning. Overview • Installation • How To Use • Docs • Paper • Citation • What's New? Nov 2021: Now we have relea

THUNLP 2.3k Jan 07, 2023
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

AugMix Introduction We propose AugMix, a data processing technique that mixes augmented images and enforces consistent embeddings of the augmented ima

Google Research 876 Dec 17, 2022
Supplementary code for SIGGRAPH 2021 paper: Discovering Diverse Athletic Jumping Strategies

SIGGRAPH 2021: Discovering Diverse Athletic Jumping Strategies project page paper demo video Prerequisites Important Notes We suspect there are bugs i

54 Dec 06, 2022
Corruption Invariant Learning for Re-identification

Corruption Invariant Learning for Re-identification The official repository for Benchmarks for Corruption Invariant Person Re-identification (NeurIPS

Minghui Chen 73 Dec 08, 2022
A hobby project which includes a hand-gesture based virtual piano using a mobile phone camera and OpenCV library functions

Overview This is a hobby project which includes a hand-gesture controlled virtual piano using an android phone camera and some OpenCV library. My moti

Abhinav Gupta 1 Nov 19, 2021
[AAAI-2021] Visual Boundary Knowledge Translation for Foreground Segmentation

Trans-Net Code for (Visual Boundary Knowledge Translation for Foreground Segmentation, AAAI2021). [https://ojs.aaai.org/index.php/AAAI/article/view/16

ZJU-VIPA 2 Mar 04, 2022
Parasite: a tool allowing you to compress and decompress files, to reduce their size

🦠 Parasite 🦠 Parasite is a tool written in Python3 allowing you to "compress" any file, reducing its size. ⭐ Features ⭐ + Fast + Good optimization,

Billy 30 Nov 25, 2022
Pytorch and Torch testing code of CartoonGAN

CartoonGAN-Test-Pytorch-Torch Pytorch and Torch testing code of CartoonGAN [Chen et al., CVPR18]. With the released pretrained models by the authors,

Yijun Li 642 Dec 27, 2022
Implementations of the algorithms in the paper Approximative Algorithms for Multi-Marginal Optimal Transport and Free-Support Wasserstein Barycenters

Implementations of the algorithms in the paper Approximative Algorithms for Multi-Marginal Optimal Transport and Free-Support Wasserstein Barycenters

Johannes von Lindheim 3 Oct 29, 2022
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation E2EC: An End-to-End Contour-based Method for High-Quality H

zhangtao 146 Dec 29, 2022
Manifold Alignment for Semantically Aligned Style Transfer

Manifold Alignment for Semantically Aligned Style Transfer [Paper] Getting Started MAST has been tested on CentOS 7.6 with python = 3.6. It supports

35 Nov 14, 2022
Transformer Huffman coding - Complete Huffman coding through transformer

Transformer_Huffman_coding Complete Huffman coding through transformer 2022/2/19

3 May 19, 2022
This is the official PyTorch implementation of the CVPR 2020 paper "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting".

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting Project Page | YouTube | Paper This is the official PyTorch implementation of the C

Zhuoqian Yang 330 Dec 11, 2022