PyTorch implementation of Densely Connected Time Delay Neural Network

Overview

Densely Connected Time Delay Neural Network

PyTorch implementation of Densely Connected Time Delay Neural Network (D-TDNN) in our paper "Densely Connected Time Delay Neural Network for Speaker Verification" (INTERSPEECH 2020).

What's New ⚠️

  • [2021-02-14] We add an impl option in TimeDelay, now you can choose:

    • 'conv': implement TDNN by F.conv1d.
    • 'linear': implement TDNN by F.unfold and F.linear.

    Check this commit for more information. Note the pre-trained models of 'conv' have not been uploaded yet.

  • [2021-02-04] TDNN (default implementation) in this repo is slower than nn.Conv1d, but we adopted it because:

    • TDNN in this repo was also used to create F-TDNN models that are not perfectly supported by nn.Conv1d (asymmetric paddings).
    • nn.Conv1d(dilation>1, bias=True) is slow in training.

    However, we do not use F-TDNN here, and we always set bias=False in D-TDNN. So, we are considering uploading a new version of TDNN soon (2021-02-14 updated).

  • [2021-02-01] Our new paper is accepted by ICASSP 2021.

    Y.-Q. Yu, S. Zheng, H. Suo, Y. Lei, and W.-J. Li, "CAM: Context-Aware Masking for Robust Speaker Verification"

    CAM outperforms statistics-and-selection (SS) in terms of speed and accuracy.

Pretrained Models

We provide the pretrained models which can be used in many tasks such as:

  • Speaker Verification
  • Speaker-Dependent Speech Separation
  • Multi-Speaker Text-to-Speech
  • Voice Conversion

D-TDNN & D-TDNN-SS

Usage

Data preparation

You can either use Kaldi toolkit:

  • Download VoxCeleb1 test set and unzip it.
  • Place prepare_voxceleb1_test.sh under $kaldi_root/egs/voxceleb/v2 and change the $datadir and $voxceleb1_root in it.
  • Run chmod +x prepare_voxceleb1_test.sh && ./prepare_voxceleb1_test.sh to generate 30-dim MFCCs.
  • Place the trials under $datadir/test_no_sil.

Or checkout the kaldifeat branch if you do not want to install Kaldi.

Test

  • Download the pretrained D-TDNN model and run:
python evaluate.py --root $datadir/test_no_sil --model D-TDNN --checkpoint dtdnn.pth --device cuda

Evaluation

VoxCeleb1-O

Model Emb. Params (M) Loss Backend EER (%) DCF_0.01 DCF_0.001
TDNN 512 4.2 Softmax PLDA 2.34 0.28 0.38
E-TDNN 512 6.1 Softmax PLDA 2.08 0.26 0.41
F-TDNN 512 12.4 Softmax PLDA 1.89 0.21 0.29
D-TDNN 512 2.8 Softmax Cosine 1.81 0.20 0.28
D-TDNN-SS (0) 512 3.0 Softmax Cosine 1.55 0.20 0.30
D-TDNN-SS 512 3.5 Softmax Cosine 1.41 0.19 0.24
D-TDNN-SS 128 3.1 AAM-Softmax Cosine 1.22 0.13 0.20

Citation

If you find D-TDNN helps your research, please cite

@inproceedings{DBLP:conf/interspeech/YuL20,
  author    = {Ya-Qi Yu and
               Wu-Jun Li},
  title     = {Densely Connected Time Delay Neural Network for Speaker Verification},
  booktitle = {Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  pages     = {921--925},
  year      = {2020}
}

Revision of the Paper ⚠️

References:

[16] X. Li, W. Wang, X. Hu, and J. Yang, "Selective Kernel Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 510-519.

Comments
  • size mismatch while loading pre-trained weights

    size mismatch while loading pre-trained weights

    RuntimeError: Error(s) in loading state_dict for DTDNN: Missing key(s) in state_dict: "xvector.tdnn.linear.bias", "xvector.dense.linear.bias". size mismatch for xvector.tdnn.linear.weight: copying a param with shape torch.Size([128, 30, 5]) from checkpoint, the shape in current model is torch.Size([128, 150]). size mismatch for xvector.block1.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([128, 128]). size mismatch for xvector.block1.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 192, 1]) from checkpoint, the shape in current model is torch.Size([128, 192]). size mismatch for xvector.block1.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]). size mismatch for xvector.block1.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]). size mismatch for xvector.block1.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]). size mismatch for xvector.block1.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]). size mismatch for xvector.block1.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.transit1.linear.weight: copying a param with shape torch.Size([256, 512, 1]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for xvector.block2.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]). size mismatch for xvector.block2.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]). size mismatch for xvector.block2.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]). size mismatch for xvector.block2.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]). size mismatch for xvector.block2.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 512, 1]) from checkpoint, the shape in current model is torch.Size([128, 512]). size mismatch for xvector.block2.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 576, 1]) from checkpoint, the shape in current model is torch.Size([128, 576]). size mismatch for xvector.block2.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd7.linear1.weight: copying a param with shape torch.Size([128, 640, 1]) from checkpoint, the shape in current model is torch.Size([128, 640]). size mismatch for xvector.block2.tdnnd7.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd8.linear1.weight: copying a param with shape torch.Size([128, 704, 1]) from checkpoint, the shape in current model is torch.Size([128, 704]). size mismatch for xvector.block2.tdnnd8.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd9.linear1.weight: copying a param with shape torch.Size([128, 768, 1]) from checkpoint, the shape in current model is torch.Size([128, 768]). size mismatch for xvector.block2.tdnnd9.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd10.linear1.weight: copying a param with shape torch.Size([128, 832, 1]) from checkpoint, the shape in current model is torch.Size([128, 832]). size mismatch for xvector.block2.tdnnd10.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd11.linear1.weight: copying a param with shape torch.Size([128, 896, 1]) from checkpoint, the shape in current model is torch.Size([128, 896]). size mismatch for xvector.block2.tdnnd11.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd12.linear1.weight: copying a param with shape torch.Size([128, 960, 1]) from checkpoint, the shape in current model is torch.Size([128, 960]). size mismatch for xvector.block2.tdnnd12.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.transit2.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for xvector.dense.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]).

    opened by zabir-nabil 3
  • 实验细节的疑问

    实验细节的疑问

    您好: 我想教下您的论文中,实验的实现细节: 1.实验数据:我看很多其他论文都是使用voxceleb2 dev 5994说话人作为训练集(或者voxceleb dev+voxceleb2 dev,1211+5994说话人),您有只在这部分说话人上的实验结果吗?方便透露下嘛?

    2.PLDA和Cosine Similarity:您这里实验比较这两个的EER在TDNN中是提取的是倒数第二层(分类器前一层)还是第三层(xvector)的输出啊?因为我在论文中又看到,这两个不同层embedding对不同方法性能有差异,倒数第二层的cosine方法可能会更好一些。

    Thanks!🙏

    opened by Wenhao-Yang 1
  • questions about model training

    questions about model training

    hello, yuyq96, Thank you so much for the great work you've shared. I learned that D-TDNNSS mini-batch setting 128 from D-TDNN paper. But this model is too large to train on single gpu. Could you tell me how you train it? Using nn.Parallel or DDP? Looking forward to you reply

    opened by forwiat 2
  • the difference between kaldifeat-kaldi and kaldifeat-python?

    the difference between kaldifeat-kaldi and kaldifeat-python?

    May I ask you the numerical difference between kaldifeat by kaldi implementation and kaldifeat by your python implementation? I have compared the two computed features, and I find it has some difference. I wonder that the experiment results showed in D-TDNN master and D-TDNN-kaldifeat branch is absolutely the same.

    Thanks~

    opened by mezhou 4
  • 针对论文的一些疑问

    针对论文的一些疑问

    您好,我觉得您的工作-DTDNN,在参数比较少的情况下获得了较ETDNN,FTDNN更好的结果,我认为这非常有意义。但是我对论文的实验存在两处疑惑: 1、论文中Table5中,基于softmax训练的D-TDNN模型Cosine的结果好于PLDA,在上面的TDNN,ETDNN,FTDNN的结果不一致(均是PLDA好于Cosine),请问这是什么原因导致的? 2、对于null branch,能稍微解释一下吗?

    opened by xuanjihe 10
Releases(trials)
Owner
Ya-Qi Yu
Machine Learning
Ya-Qi Yu
Memory efficient transducer loss computation

Introduction This project implements the optimization techniques proposed in Improving RNN Transducer Modeling for End-to-End Speech Recognition to re

Fangjun Kuang 51 Nov 25, 2022
Deep Inside Convolutional Networks - This is a caffe implementation to visualize the learnt model

Deep Inside Convolutional Networks This is a caffe implementation to visualize the learnt model. Part of a class project at Georgia Tech Problem State

Jigar 61 Apr 15, 2022
A transformer which can randomly augment VOC format dataset (both image and bbox) online.

VocAug It is difficult to find a script which can augment VOC-format dataset, especially the bbox. Or find a script needs complex requirements so it i

Coder.AN 1 Mar 05, 2022
571 Dec 25, 2022
A template repository for submitting a job to the Slurm Cluster installed at the DISI - University of Bologna

Cluster di HPC con GPU per esperimenti di calcolo (draft version 1.0) Per poter utilizzare il cluster il primo passo è abilitare l'account istituziona

20 Dec 16, 2022
Code for Multinomial Diffusion

Code for Multinomial Diffusion Abstract Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural ima

104 Jan 04, 2023
Madanalysis5 - A package for event file analysis and recasting of LHC results

Welcome to MadAnalysis 5 Outline What is MadAnalysis 5? Requirements Downloading

MadAnalysis 15 Jan 01, 2023
Search Youtube Video and Get Video info

PyYouTube Get Video Data from YouTube link Installation pip install PyYouTube How to use it ? Get Videos Data from pyyoutube import Data yt = Data("ht

lokaman chendekar 35 Nov 25, 2022
This script runs neural style transfer against the provided content image.

Neural Style Transfer Content Style Output Description: This script runs neural style transfer against the provided content image. The content image m

Martynas Subonis 0 Nov 25, 2021
Neural Scene Graphs for Dynamic Scene (CVPR 2021)

Implementation of Neural Scene Graphs, that optimizes multiple radiance fields to represent different objects and a static scene background. Learned representations can be rendered with novel object

151 Dec 26, 2022
Prototypical Networks for Few shot Learning in PyTorch

Prototypical Networks for Few shot Learning in PyTorch Simple alternative Implementation of Prototypical Networks for Few Shot Learning (paper, code)

Orobix 835 Jan 08, 2023
Auto grind btdb2 exp for tower

Bloons TD Battles 2 EXP Grinder Auto grind btdb2 exp for towers Setup I suggest checking out every screenshot to see what they are supposed to be, so

Vincent 6 Jul 29, 2022
This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks

NNProject - DeepMask This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks. Th

189 Nov 16, 2022
clustimage is a python package for unsupervised clustering of images.

clustimage The aim of clustimage is to detect natural groups or clusters of images. Image recognition is a computer vision task for identifying and ve

Erdogan Taskesen 52 Jan 02, 2023
PyTorch implementation for "Mining Latent Structures with Contrastive Modality Fusion for Multimedia Recommendation"

MIRCO PyTorch implementation for paper: Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation Dependencies Python 3.

Big Data and Multi-modal Computing Group, CRIPAC 9 Dec 08, 2022
R interface to fast.ai

R interface to fastai The fastai package provides R wrappers to fastai. The fastai library simplifies training fast and accurate neural nets using mod

113 Dec 20, 2022
Official code for paper "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight"

Demysitifing Local Vision Transformer, arxiv This is the official PyTorch implementation of our paper. We simply replace local self attention by (dyna

138 Dec 28, 2022
OpenFed: A Comprehensive and Versatile Open-Source Federated Learning Framework

OpenFed: A Comprehensive and Versatile Open-Source Federated Learning Framework Introduction OpenFed is a foundational library for federated learning

25 Dec 12, 2022
an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

3d-ken-burns This is a reference implementation of 3D Ken Burns Effect from a Single Image [1] using PyTorch. Given a single input image, it animates

Simon Niklaus 1.4k Dec 28, 2022
Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Philipp Erler 329 Jan 06, 2023