Auxiliary Raw Net (ARawNet) is an ASVspoof detection model that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity.

Last update: Jul 08, 2022

Overview

This repository is an implementation of the Auxiliary Raw Net (ARawNet), an ASVspoof detection system that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity. The paper is available here.

Model performance is evaluated on the ASVspoof 2019 dataset.

Setup

Environment

speechbrain==0.5.7
pandas
torch==1.9.1
torchaudio==0.9.1
nnAudio==0.2.6
ptflops==0.6.6

Create a conda environment with conda env create -f environment.yml.
Activate the conda environment with conda activate .
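
Once the environment is active, a quick check can confirm that the installed packages match the pins listed above. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes the distribution names are the same as the names in the list.

```python
from importlib import metadata

# Version pins copied from the list above; pandas is left unpinned.
PINNED = {
    "speechbrain": "0.5.7",
    "pandas": None,
    "torch": "1.9.1",
    "torchaudio": "0.9.1",
    "nnAudio": "0.2.6",
    "ptflops": "0.6.6",
}


def check_environment(pinned=PINNED, get_version=metadata.version):
    """Return a list of human-readable problems; an empty list means all pins match."""
    problems = []
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # Ignore local build suffixes such as "+cu111" when comparing.
        if wanted is not None and found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems
```

Calling `check_environment()` inside the activated environment returns an empty list when every pinned package is present at the expected version.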

Data preprocessing

.
├── data                       
│   │
│   ├── PA                  
│   │   └── ...
│   └── LA           
│       ├── ASVspoof2019_LA_asv_protocols
│       ├── ASVspoof2019_LA_asv_scores
│       ├── ASVspoof2019_LA_cm_protocols
│       ├── ASVspoof2019_LA_train
│       ├── ASVspoof2019_LA_dev
│       
│
└── ARawNet

Download the dataset. Our experiments use the Logical Access (LA) scenario of the ASVspoof 2019 dataset, which can be downloaded here.
Unzip the data and save it to a folder named data in the same directory as ARawNet, as shown in the tree above.
Run python preprocess.py, or use our processed data directly under "/processed_data".
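
Before running the preprocessing step, it can help to confirm that the unzipped data matches the directory tree above. This is a small sketch under that assumption; `missing_la_dirs` and `EXPECTED_LA_DIRS` are hypothetical names, and `project_root` is assumed to be the folder that contains both data and ARawNet.

```python
from pathlib import Path

# Sub-folders of data/LA shown in the directory tree above.
EXPECTED_LA_DIRS = [
    "ASVspoof2019_LA_asv_protocols",
    "ASVspoof2019_LA_asv_scores",
    "ASVspoof2019_LA_cm_protocols",
    "ASVspoof2019_LA_train",
    "ASVspoof2019_LA_dev",
]


def missing_la_dirs(project_root):
    """Return the expected data/LA sub-folders that do not exist under project_root."""
    la_root = Path(project_root) / "data" / "LA"
    return [name for name in EXPECTED_LA_DIRS if not (la_root / name).is_dir()]
```

From inside the ARawNet folder, `missing_la_dirs("..")` should return an empty list when the layout is complete.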

Train

python train_raw_net.py yaml/RawSNet.yaml --data_parallel_backend --data_parallel_count=2

Evaluate

python eval.py

Check Model Size and Multiply-Accumulate Operations (MACs)

python check_model_size.py yaml/RawSNet.yaml

Model Performance

Accuracy metric

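$$\min \text{t-DCF} = \min_{s}\left\{\beta\, P_{\text{miss}}^{\text{cm}}(s) + P_{\text{fa}}^{\text{cm}}(s)\right\}$$

Here s is the countermeasure (CM) decision threshold, P_miss^cm(s) and P_fa^cm(s) are the CM miss and false-alarm rates at that threshold, and β is a weight that depends on the ASV system's error rates and on the chosen priors and costs.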

Explanations can be found here: t-DCF
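
To make the metric concrete, the sketch below minimises a cost of the form β · P_miss(s) + P_fa(s) over thresholds, assuming the usual ASVspoof 2019 reading in which the two terms are the countermeasure miss and false-alarm rates. It is not the official scoring script: the function names are hypothetical, β is passed in by the caller (in the real metric it is derived from the ASV system's error rates, priors, and costs), and higher scores are assumed to mean "more likely bona fide".

```python
def error_rates(bonafide_scores, spoof_scores, threshold):
    """Countermeasure miss and false-alarm rates at one threshold.

    Assumes a trial is accepted as bona fide when its score is strictly
    above the threshold.
    """
    p_miss = sum(s <= threshold for s in bonafide_scores) / len(bonafide_scores)
    p_fa = sum(s > threshold for s in spoof_scores) / len(spoof_scores)
    return p_miss, p_fa


def min_weighted_cost(bonafide_scores, spoof_scores, beta):
    """Minimise beta * P_miss(s) + P_fa(s) over candidate thresholds s.

    Returns a (cost, threshold) pair.
    """
    # The error rates only change at observed scores, so it is enough to try
    # those plus one threshold below every score (accept everything).
    candidates = [float("-inf")] + sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for s in candidates:
        p_miss, p_fa = error_rates(bonafide_scores, spoof_scores, s)
        cost = beta * p_miss + p_fa
        if best is None or cost < best[0]:
            best = (cost, s)
    return best
```

With perfectly separated scores the minimum cost is 0; any overlap between bona fide and spoofed scores raises it, and a larger β penalises rejected bona fide trials more heavily.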

Experiment Results

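| Model | Front-end | Main Encoder | Auxiliary Encoder (E_A) | EER (%) | min-tDCF |
|---|---|---|---|---|---|
| Res2Net | Spec | Res2Net | - | 8.783 | 0.2237 |
| Res2Net | LFCC | Res2Net | - | 2.869 | 0.0786 |
| Res2Net | CQT | Res2Net | - | 2.502 | 0.0743 |
| Rawnet2 | Raw waveforms | Rawnet2 | - | 5.13 | 0.1175 |
| ARawNet | Mel-Spectrogram | XVector | ✅ | 1.32 | 0.03894 |
| ARawNet | Mel-Spectrogram | XVector | - | 2.39320 | 0.06875 |
| ARawNet | Mel-Spectrogram | ECAPA-TDNN | ✅ | 1.39 | 0.04316 |
| ARawNet | Mel-Spectrogram | ECAPA-TDNN | - | 2.11 | 0.06425 |
| ARawNet | CQT | XVector | ✅ | 1.74 | 0.05194 |
| ARawNet | CQT | XVector | - | 3.39875 | 0.09510 |
| ARawNet | CQT | ECAPA-TDNN | ✅ | 1.11 | 0.03645 |
| ARawNet | CQT | ECAPA-TDNN | - | 1.72667 | 0.05077 |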
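
The effect of the auxiliary encoder can be read off the ARawNet rows by comparing each configuration with and without E_A. The short sketch below does that arithmetic on the EER values from the table above; it assumes each unlabeled "-" row is the same front-end and main encoder as the row before it, with the auxiliary encoder removed.

```python
# (front-end, main encoder) -> (EER with E_A, EER without E_A), from the table above
EER = {
    ("Mel-Spectrogram", "XVector"): (1.32, 2.39320),
    ("Mel-Spectrogram", "ECAPA-TDNN"): (1.39, 2.11),
    ("CQT", "XVector"): (1.74, 3.39875),
    ("CQT", "ECAPA-TDNN"): (1.11, 1.72667),
}


def relative_reduction(with_aux, without_aux):
    """Relative EER reduction, in percent, from adding the auxiliary encoder."""
    return 100.0 * (without_aux - with_aux) / without_aux


for (front_end, encoder), (with_aux, without_aux) in EER.items():
    reduction = relative_reduction(with_aux, without_aux)
    print(f"{front_end:16s} {encoder:11s} {reduction:5.1f}% lower EER with E_A")
```

Under that reading, adding the auxiliary encoder lowers the EER by roughly 34% to 49% relative, depending on the configuration.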

| Main Encoder | Auxiliary Encoder | Parameters | MACs |
|---|---|---|---|
| Rawnet2 | - | 25.43 M | 7.61 GMac |
| Res2Net | - | 0.92 M | 1.11 GMac |
| XVector | ✅ | 5.81 M | 2.71 GMac |
| XVector | - | 4.66 M | 1.88 GMac |
| ECAPA-TDNN | ✅ | 7.18 M | 3.19 GMac |
| ECAPA-TDNN | - | 6.03 M | 2.36 GMac |
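
The cost of the auxiliary encoder follows directly from the table above by subtracting the rows without it from the rows with it. The sketch below performs that subtraction; the dictionary and function names are illustrative only.

```python
# (parameters in millions, MACs in GMac), copied from the table above
WITH_AUX = {"XVector": (5.81, 2.71), "ECAPA-TDNN": (7.18, 3.19)}
WITHOUT_AUX = {"XVector": (4.66, 1.88), "ECAPA-TDNN": (6.03, 2.36)}


def aux_overhead(encoder):
    """Extra parameters (M) and MACs (GMac) added by the auxiliary encoder."""
    params_with, macs_with = WITH_AUX[encoder]
    params_without, macs_without = WITHOUT_AUX[encoder]
    return round(params_with - params_without, 2), round(macs_with - macs_without, 2)


for encoder in WITH_AUX:
    extra_params, extra_macs = aux_overhead(encoder)
    print(f"{encoder}: +{extra_params} M parameters, +{extra_macs} GMac")
```

For both main encoders the difference comes to about 1.15 M parameters and 0.83 GMac, which is consistent with the same auxiliary encoder being attached in each case.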

Cite Our Paper

If you use this repository, please consider citing:

@inproceedings{Teng2021ComplementingHF,
  title={Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model},
  author={Zhongwei Teng and Quchen Fu and Jules White and M. Powell and Douglas C. Schmidt},
  year={2021}
}

@inproceedings{Fu2021FastAudioAL,
  title={FastAudio: A Learnable Audio Front-End for Spoof Speech Detection},
  author={Quchen Fu and Zhongwei Teng and Jules White and M. Powell and Douglas C. Schmidt},
  year={2021}
}
