This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

Last update: Sep 15, 2022

Overview

CPC_DeepCluster

Setup instructions

Clone the repository: git clone https://github.com/iiscleap/CPC_DeepCluster.git
Install the system libraries required by torchaudio (https://github.com/pytorch/audio):

Linux: sudo apt-get install sox libsox-dev libsox-fmt-all

Create and activate the conda environment: conda env create -f environment.yml && conda activate cpc37
Install the package in development mode: python setup.py develop

Using the Repository

To start training:

python cpc/train_mod.py --pathDB $PATH_AUDIO_FILES --pathCheckpoint $PATH_CHECKPOINT_DIR --LabelsPath $Path_Pseudo_Labels --file_extension $EXTENSION --normMode batchNorm --rnnMode linear --nLevelsGRU 2 --max_size_loaded 1000000000 --save_step 1 --alpha_val $Cluster_Loss_Weighting

Where:

$PATH_AUDIO_FILES is the directory containing the audio files. The files should be arranged as below:

PATH_AUDIO_FILES
│
└───speaker1
│   └───...
│         │   seq_11.{$EXTENSION}
│         │   seq_12.{$EXTENSION}
│         │   ...
│
└───speaker2
    └───...
          │   seq_21.{$EXTENSION}
          │   seq_22.{$EXTENSION}

$PATH_CHECKPOINT_DIR is the directory where the checkpoints will be saved
$EXTENSION is the extension of each audio file
$Path_Pseudo_Labels is the directory that contains the pseudo labels of all the audio files in $PATH_AUDIO_FILES
$Cluster_Loss_Weighting provides the weighting factor for the cluster loss.
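
Before launching a long training run, it can help to check that the audio directory follows the speaker-per-directory layout described above. The following is a minimal sketch of such a check, not part of the repository: the function name group_audio_by_speaker is hypothetical, and it assumes only that the first directory level under $PATH_AUDIO_FILES names the speaker.

```python
from collections import defaultdict
from pathlib import Path


def group_audio_by_speaker(path_db, extension):
    """Map each top-level speaker directory to the audio files found under it.

    Hypothetical helper for checking the layout; not part of the repository.
    """
    root = Path(path_db)
    # Accept the extension with or without a leading dot ("wav" or ".wav").
    suffix = extension if extension.startswith(".") else "." + extension
    by_speaker = defaultdict(list)
    for audio in sorted(root.rglob("*" + suffix)):
        rel = audio.relative_to(root)
        if len(rel.parts) < 2:
            # A file placed directly in the root has no speaker directory.
            continue
        by_speaker[rel.parts[0]].append(audio)
    return dict(by_speaker)


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py $PATH_AUDIO_FILES $EXTENSION
    for speaker, files in group_audio_by_speaker(sys.argv[1], sys.argv[2]).items():
        print(f"{speaker}: {len(files)} files")
```

A speaker with zero files, or files that only appear directly under the root, usually means the directory or the extension passed on the command line does not match the layout.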

Restarting the session

To restart a session from the last saved checkpoint, run:

python cpc/train_mod.py --pathCheckpoint $PATH_CHECKPOINT_DIR
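
To see which checkpoint a restart is likely to pick up, the checkpoint directory can be inspected by hand. The sketch below is illustrative only: it assumes checkpoints are saved as checkpoint_<epoch>.pt, a naming scheme that is not stated in this README and should be verified against the files in $PATH_CHECKPOINT_DIR. The function name latest_checkpoint is hypothetical.

```python
import re
from pathlib import Path

# Assumption: checkpoints are saved as checkpoint_<epoch>.pt. Adjust the
# pattern if the files in your checkpoint directory are named differently.
CHECKPOINT_PATTERN = re.compile(r"checkpoint_(\d+)\.pt")


def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint file with the highest epoch number, or None."""
    best_epoch, best_path = -1, None
    for path in Path(checkpoint_dir).iterdir():
        match = CHECKPOINT_PATTERN.fullmatch(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

Comparing epoch numbers as integers matters here: a plain alphabetical sort would rank checkpoint_9.pt after checkpoint_10.pt.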

Generating the pseudo labels for training

First create quantized.txt using the repository here, then convert it to pseudo labels:

python create_pseudolabels.py --input_file $Path_Containing_quantized.txt --out_path $Output_Dir

$Output_Dir is the directory where the .pt files containing the pseudo labels will be saved
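
For orientation, the conversion from quantized.txt to per-utterance .pt files could look roughly like the sketch below. This is not the code of create_pseudolabels.py. It assumes each line of quantized.txt has the form "<utterance_id><TAB><comma-separated integer cluster ids>", and that one tensor is saved per utterance. Both the file format and the function names are assumptions to check against the actual script.

```python
from pathlib import Path


def parse_quantized_line(line):
    """Split '<utterance_id><TAB><id>,<id>,...' into (utterance_id, [ids]).

    The line format is an assumption, not taken from the repository.
    """
    utt_id, units = line.rstrip("\n").split("\t", 1)
    return utt_id, [int(u) for u in units.split(",") if u]


def write_pseudo_labels(quantized_txt, out_dir):
    """Write one <utterance_id>.pt tensor of cluster ids per input line."""
    import torch  # imported here so the parser above works without PyTorch

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(quantized_txt, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            utt_id, units = parse_quantized_line(line)
            labels = torch.tensor(units, dtype=torch.long)
            torch.save(labels, out_dir / f"{utt_id}.pt")
```

With this layout, a file such as $Output_Dir/utt_001.pt would hold the sequence of cluster ids for the utterance utt_001.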

Extracting features, training K Means and Language Models

Use the repository here to extract the features for K-means clustering and to train the K-means model and the language models.

Owner

LEAP Lab
