Unofficial implementation of the paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Last update: Jan 03, 2023

Introduction

This repository is an unofficial implementation of the SpeakerGAN paper by Mingming Huang ([email protected]) and Tiezheng Wang ([email protected]), with thanks to TongFeng for advice.

SpeakerGAN paper

SpeakerGAN: Speaker identification with conditional generative adversarial network, by Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, and Haiyong Xie.

Usage

To train, test, or generate:

python speakergan.py

You may need to change the path to the VAD-preprocessed wav files.

Our results

acc: 94.27% on a randomly sampled test set.

acc: 93.21% on a fixed-start sampled test set.

using model file: model/49_D.pkl

acc: 98.44% training classification accuracy on real samples.

Our test-set accuracy is about 4% lower than the result reported in the paper. We have not been able to find the reason and would welcome your help!

Details of paper

The following are details about this paper.

================ input ==================

feature: fbank, 8000 Hz, 25 ms frame, 10 ms overlap. shape: (160, 64)
dataset: librispeech-100 train-clean-100, POI: 251
data preprocess: VAD, mean and variance normalization, shuffled.
split: 60% train, 40% test.
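
The normalization and split listed above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes normalization is applied per utterance and per filterbank channel, and that the 60/40 split is a shuffled split over a list of items. The function names `mean_variance_normalize` and `split_train_test` are hypothetical.

```python
import numpy as np


def mean_variance_normalize(features, eps=1e-8):
    """Scale each filterbank channel to zero mean and unit variance over time.

    features: array of shape (num_frames, num_channels), e.g. (160, 64).
    Assumption: statistics are computed per utterance, per channel.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


def split_train_test(items, train_fraction=0.6, seed=0):
    """Shuffle items and split them into train and test lists."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    n_train = int(round(train_fraction * len(items)))
    train = [items[i] for i in order[:n_train]]
    test = [items[i] for i in order[n_train:]]
    return train, test
```

For example, `split_train_test(list(range(10)))` yields 6 training items and 4 test items.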

================ model architecture ==================

dataflow: data -> feature extraction -> G & D
model architecture:

G: gated CNN, encoder-decoder, Huber loss + adversarial loss

D: ResnetBlocks, temporal average pooling, FC, softmax, cross-entropy loss + adversarial loss
G: shuffler layer, GLU
D: ReLU
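
For readers unfamiliar with the GLU activation listed for G, a gated linear unit splits its input into two halves and uses one half as a sigmoid gate on the other. The NumPy sketch below only illustrates that operation. It is not the generator in this repository, and the split axis is an assumption.

```python
import numpy as np


def glu(x, axis=-1):
    """Gated linear unit: first half of x, gated by sigmoid of the second half.

    The size of x along `axis` must be even.
    """
    a, b = np.split(x, 2, axis=axis)
    gate = 1.0 / (1.0 + np.exp(-b))
    return a * gate
```

The output has half as many entries along the split axis as the input, so a layer feeding a GLU typically produces twice the desired number of channels.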

================ training ==================

lr: epochs 0-9: 0.0005 | epochs 9-49: 0.0002
L(d): λ1 = λ2 = 1
batch_size: 64
D_train steps / G_train steps = 4
Ladv Loss: Label smoothing, 1 -> 0.7 ~ 1.0, 0 -> 0 ~ 0.3
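
The training settings above can be sketched in plain Python. This is an illustrative reading, not the code in speakergan.py: it assumes the learning rate is indexed by epoch with epoch 9 falling in the second phase, that smoothed labels are drawn uniformly from the stated ranges, and that G is updated once after every 4 D updates. All function names are hypothetical.

```python
import random


def learning_rate(epoch):
    """Piecewise-constant schedule (assumption: epoch 9 uses the lower rate)."""
    return 0.0005 if epoch < 9 else 0.0002


def smoothed_labels(batch_size, real, rng=random):
    """Adversarial targets with label smoothing.

    Real targets (1) are replaced by values in [0.7, 1.0];
    fake targets (0) are replaced by values in [0.0, 0.3].
    """
    low, high = (0.7, 1.0) if real else (0.0, 0.3)
    return [rng.uniform(low, high) for _ in range(batch_size)]


def is_generator_step(d_step, d_steps_per_g_step=4):
    """True when G should be updated after this (0-indexed) D step."""
    return (d_step + 1) % d_steps_per_g_step == 0
```

With batch_size 64, `smoothed_labels(64, real=True)` returns 64 targets between 0.7 and 1.0 for the adversarial loss on real samples.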

======== uncertainties or differences from the paper ========

weight and bias initialization: we use xavier_uniform and zeros.
pytorch huber_loss: adding 0.5 would make it match the paper, but this is not implemented here.
for shorter wavs: the paper pads with zeros; we pad by repeating the features.
gated cnn architecture.
we use webrtcvad mode 3 for VAD preprocessing.
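
The two padding strategies mentioned above can be compared with a short NumPy sketch. It assumes features of shape (num_frames, 64) with at least one frame and a target length of 160 frames, and it truncates longer inputs. The function names are hypothetical and this is not the repository's own code.

```python
import numpy as np


def pad_with_zeros(features, target_frames=160):
    """Paper-style padding: append zero frames up to target_frames."""
    n = features.shape[0]
    if n >= target_frames:
        return features[:target_frames]
    pad = np.zeros((target_frames - n, features.shape[1]), dtype=features.dtype)
    return np.concatenate([features, pad], axis=0)


def pad_by_repeating(features, target_frames=160):
    """Padding by repetition: tile the features, then cut to target_frames."""
    n = features.shape[0]
    if n >= target_frames:
        return features[:target_frames]
    repeats = -(-target_frames // n)  # ceiling division
    return np.tile(features, (repeats, 1))[:target_frames]
```

For a 100-frame utterance, `pad_by_repeating` fills frames 100 to 159 with a copy of frames 0 to 59, whereas `pad_with_zeros` fills them with zeros.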
