This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

Last update: Sep 15, 2022

CPC_DeepCluster

Setup instructions

Clone the repo: https://github.com/iiscleap/CPC_DeepCluster.git
Install the libraries required by torchaudio (https://github.com/pytorch/audio):

Linux: sudo apt-get install sox libsox-dev libsox-fmt-all

Create and activate the conda environment: conda env create -f environment.yml && conda activate cpc37
Run setup.py: python setup.py develop

Using the Repository

To start training:

python cpc/train_mod.py --pathDB $PATH_AUDIO_FILES --pathCheckpoint $PATH_CHECKPOINT_DIR --LabelsPath $Path_Pseudo_Labels --file_extension $EXTENSION --normMode batchNorm --rnnMode linear --nLevelsGRU 2 --max_size_loaded 1000000000 --save_step 1 --alpha_val $Cluster_Loss_Weighting

Where:

$PATH_AUDIO_FILES is the directory containing the audio files. The files should be arranged as below:

PATH_AUDIO_FILES
│
└───speaker1
│   └───...
│         │   seq_11.{$EXTENSION}
│         │   seq_12.{$EXTENSION}
│         │   ...
│
└───speaker2
    └───...
          │   seq_21.{$EXTENSION}
          │   seq_22.{$EXTENSION}

$PATH_CHECKPOINT_DIR is the directory where the checkpoints will be saved.
$EXTENSION is the extension of each audio file.
$Path_Pseudo_Labels is the directory that contains the pseudo labels of all the audio files in $PATH_AUDIO_FILES.
$Cluster_Loss_Weighting is the weighting factor for the cluster loss.
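
Before launching training, it can help to confirm that the audio directory matches the layout described above. The sketch below is not part of the repository: the helper name count_sequences_per_speaker is hypothetical, and it assumes that every first-level folder under $PATH_AUDIO_FILES corresponds to one speaker.

```python
from collections import Counter
from pathlib import Path


def count_sequences_per_speaker(path_db, extension):
    """Count audio files under each top-level speaker directory.

    Hypothetical helper, not part of CPC_DeepCluster. Assumes every
    first-level folder of path_db corresponds to one speaker.
    """
    root = Path(path_db)
    # Accept both "wav" and ".wav".
    suffix = extension if extension.startswith(".") else "." + extension
    counts = Counter()
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Files may sit in nested sub-folders, so search recursively.
        counts[speaker_dir.name] = sum(
            1 for f in speaker_dir.rglob("*" + suffix) if f.is_file()
        )
    return dict(counts)
```

Calling count_sequences_per_speaker with the audio directory and the extension returns a dictionary mapping each speaker folder to its number of matching files. A speaker with a count of zero may point to a wrong extension or path.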

Restarting the session

To restart a session from the last saved checkpoint, run:

python cpc/train_mod.py --pathCheckpoint $PATH_CHECKPOINT_DIR
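
To see which checkpoint a restart is likely to pick up, the directory can be inspected directly. The following is a minimal sketch, not code from the repository: find_latest_checkpoint is a hypothetical helper, and the checkpoint_<epoch>.pt naming pattern is an assumption that should be checked against the files actually present in $PATH_CHECKPOINT_DIR.

```python
import re
from pathlib import Path


def find_latest_checkpoint(checkpoint_dir):
    """Return the checkpoint file with the highest epoch index, or None.

    Assumption: checkpoints are named checkpoint_<epoch>.pt. Adjust the
    pattern if the files in your directory are named differently.
    """
    pattern = re.compile(r"^checkpoint_(\d+)\.pt$")
    best_epoch, best_path = -1, None
    for path in Path(checkpoint_dir).iterdir():
        match = pattern.match(path.name)
        # Compare epochs numerically so that epoch 10 ranks above epoch 2.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

The epoch index is compared as an integer rather than as text, since a plain alphabetical sort would place checkpoint_10.pt before checkpoint_2.pt.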

Generating the pseudo labels for training

Create quantized.txt using the repository here

python create_pseudolabels.py --input_file $Path_Containing_quantized.txt --out_path $Output_Dir

$Output_Dir is the directory where the .pt files containing the pseudo labels are written.
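
The layout of quantized.txt is not described in this README. Purely as an illustration, the sketch below assumes that each line holds a file name, a tab, and a comma-separated list of integer cluster indices. Both this format and the helper name parse_quantized_line are assumptions, and the code is not the logic of create_pseudolabels.py.

```python
def parse_quantized_line(line, field_sep="\t", unit_sep=","):
    """Split one line into (file_name, list of integer cluster indices).

    Assumed format (not confirmed by the repository):
        <file_name><TAB><idx>,<idx>,...
    Pass other separators if your quantized.txt differs.
    """
    name, units = line.rstrip("\n").split(field_sep, 1)
    # Skip empty fields, e.g. those produced by a trailing separator.
    return name, [int(u) for u in units.split(unit_sep) if u.strip()]
```

Under these assumptions, each parsed line yields one sequence of frame-level pseudo labels per audio file, which is the kind of content the .pt files in $Output_Dir are described as holding.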

Extracting features, training K Means and Language Models

Extract the features for K-means clustering, then train the K-means clustering and the language models, using the repository here

Owner

LEAP Lab
