Unofficial implementation of the paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Last update: Jan 03, 2023

Overview

Introduction

This repository is an unofficial implementation of the SpeakerGAN paper by Mingming Huang ([email protected]) and Tiezheng Wang ([email protected]), with thanks to TongFeng for advice.

SpeakerGAN paper

SpeakerGAN: Speaker identification with conditional generative adversarial network, by Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, and Haiyong Xie.

Usage

To train, test, or generate:

python speakergan.py

You may need to change the path to the VAD-preprocessed wav files.

Our results

acc: 94.27% on a randomly sampled test set.

acc: 93.21% on a test set sampled from a fixed start position.

using model file: model/49_D.pkl

acc: 98.44% training classification accuracy on real samples.

Our test-set accuracy is about 4% lower than the result reported in the paper. We have not been able to find the reason and would welcome your help!

Details of paper

The following are details about this paper.

================ input ==================

feature: fbank, 8000 Hz, 25 ms frame, 10 ms overlap. shape: (160, 64)
dataset: LibriSpeech train-clean-100, POI: 251
data preprocessing: VAD, mean and variance normalization, shuffled.
60% train, 40% test.
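
The normalization and split steps above can be sketched as follows. This is only an illustration, not the code in speakergan.py: it assumes normalization is applied per utterance over the time axis and that the 60/40 split is taken over a shuffled list of items, neither of which is stated above. The function names are hypothetical.

```python
import math
import random


def mean_variance_normalize(frames, eps=1e-8):
    """Scale each feature dimension to zero mean and unit variance over time.

    `frames` is a list of frames, each a list of floats (e.g. 64 fbank bins).
    Assumption: statistics are computed per utterance, not over the dataset.
    """
    n_frames = len(frames)
    n_dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n_frames)
        for d in range(n_dims)
    ]
    # eps avoids division by zero for constant dimensions
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(n_dims)]
        for f in frames
    ]


def split_train_test(items, train_ratio=0.6, seed=0):
    """Shuffle the items and split them into train and test portions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_ratio)
    return items[:cut], items[cut:]
```

With 10 items and the default ratio, split_train_test returns 6 training items and 4 test items.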

================ model architecture ==================

dataflow: data -> feature extraction -> G & D
model architecture:

G: gated CNN, encoder-decoder, Huber loss + adversarial loss

D: ResNet blocks, temporal average pooling, FC, softmax, cross-entropy loss + adversarial loss
G: shuffler layer, GLU
D: ReLU
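
For readers unfamiliar with GLU, the gating used in G can be sketched in plain Python. This follows the usual definition (split the input in half, multiply the first half by the sigmoid of the second half); how the repository arranges channels inside its gated CNN is not described here, so treat the layout as an assumption.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(values):
    """Gated linear unit on a flat list: first half gated by the second half.

    Assumption: the gate is the second half of the input, as in the common
    GLU definition; the repository's own channel layout may differ.
    """
    if len(values) % 2 != 0:
        raise ValueError("GLU needs an even number of values")
    half = len(values) // 2
    content, gate = values[:half], values[half:]
    return [c * sigmoid(g) for c, g in zip(content, gate)]
```

A gate value of 0 lets half of the content through, since sigmoid(0) = 0.5.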

================ training ==================

lr: 0-9, 0.0005 | 9-49, 0.0002
L(D): λ1 = λ2 = 1
batch_size: 64
D_train steps / G_train steps = 4
L_adv loss: label smoothing, 1 -> 0.7 ~ 1.0, 0 -> 0 ~ 0.3
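
The training settings above can be sketched as three small helpers. This is a hedged illustration rather than the training loop in speakergan.py: it assumes the ranges in the lr line are epoch indices, that the switch to 0.0002 happens at index 9, and that smoothed labels are drawn uniformly from the stated intervals. All function names are hypothetical.

```python
import random


def learning_rate(epoch):
    """Return the learning rate for an epoch index.

    Assumption: 0.0005 up to index 8, then 0.0002 from index 9 onward.
    """
    return 0.0005 if epoch < 9 else 0.0002


def smoothed_label(is_real, rng=random):
    """Draw a smoothed adversarial target.

    Real (1) maps into [0.7, 1.0] and fake (0) into [0.0, 0.3].
    Assumption: the value is sampled uniformly within the interval.
    """
    return rng.uniform(0.7, 1.0) if is_real else rng.uniform(0.0, 0.3)


def step_schedule(n_d_steps, d_per_g=4):
    """List the update order: one G step after every `d_per_g` D steps."""
    order = []
    for step in range(1, n_d_steps + 1):
        order.append("D")
        if step % d_per_g == 0:
            order.append("G")
    return order
```

For example, step_schedule(8) yields eight D updates with a G update after the fourth and the eighth.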

======== uncertainties or differences from the paper ========

Weight and bias initialization: we use xavier_uniform and zeros.
PyTorch Huber loss: adding 0.5 would make it match the paper, but this is not implemented here.
For shorter wavs, the paper pads with zeros; we pad by repeating the features.
Gated CNN architecture.
We use webrtcvad mode 3 for VAD preprocessing.
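
The padding difference noted above can be illustrated with a short sketch. It assumes "padding with the features again" means repeating the utterance's own frames from the start until the target length (160 frames in the input shape above) is reached; the function names are hypothetical and the repository's code may differ.

```python
def pad_with_zeros(frames, target_len=160):
    """Paper-style padding: append all-zero frames up to `target_len`."""
    n_dims = len(frames[0])
    padded = [list(f) for f in frames[:target_len]]
    while len(padded) < target_len:
        padded.append([0.0] * n_dims)
    return padded


def pad_by_repeating(frames, target_len=160):
    """Repeat the utterance's own frames cyclically up to `target_len`.

    Assumption: repetition restarts from the first frame.
    """
    padded = []
    i = 0
    while len(padded) < target_len:
        padded.append(list(frames[i % len(frames)]))
        i += 1
    return padded
```

Repeating keeps every frame within the speech feature distribution, whereas zero padding introduces frames that carry no speaker information.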
