Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Overview

Introduction

Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models".

In this work, we demonstrate that existing self-supervised speech model such as HuBERT, wav2vec 2.0, CPC and TERA are vulnerable to membership inference attack (MIA) and thus could reveal sensitive informations related to the training data.

Requirements

  1. Python >= 3.6
  2. Install sox on your OS
  3. Install s3prl on your OS
git clone https://github.com/s3prl/s3prl
cd s3prl
pip install -e ./
  1. Install the specific fairseq
pip install [email protected]+https://github.com//pytorch/[email protected]#egg=fairseq

Preprocessing

First, extract the self-supervised feature of utterances in each corpus according to your needs.

Currently, only LibriSpeech is available.

BASE_PATH=/path/of/the/corpus
OUTPUT_PATH=/path/to/save/feature
MODEL=wav2vec2
SPLIT=train-clean-100 # you should extract train-clean-100, dev-clean, dev-other, test-clean, test-other

python preprocess_feature_LibriSpeech.py \
    --base_path $BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --split $SPLIT

Speaker-level MIA

After extracting the features, you can apply the attack against the models using either basic attack and improved attack.

Noted that you should run the basic attack to generate the .csv file with similarity scores before performing improved attack.

Basic Attack

SEEN_BASE_PATH=/path/you/save/feature/of/seen/corpus
UNSEEN_BASE_PATH=/path/you/save/feature/of/unseen/corpus
OUTPUT_PATH=/path/to/output/results
MODEL=wav2vec2

python predefined-speaker-level-MIA.py \
    --seen_base_path $SEEN_BATH_PATH \
    --unseen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \

Improved Attack

python train-speaker-level-similarity-model.py \
    --seen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --speaker_list "${OUTPUT_PATH}/${MODEL}-customized-speaker-level-attack-similarity.csv"

python customized-speaker-level-MIA.py \
    --seen_base_path $SEEN_BATH_PATH \
    --unseen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --similarity_model_path "${OUTPUT_PATH}/customized-speaker-similarity-model-${MODEL}.pt"

Utterance-level MIA

The process for utterance-level MIA is similar to that of speaker-level:

Basic Attack

SEEN_BASE_PATH=/path/you/save/feature/of/seen/corpus
UNSEEN_BASE_PATH=/path/you/save/feature/of/unseen/corpus
OUTPUT_PATH=/path/to/output/results
MODEL=wav2vec2

python predefined-utterance-level-MIA.py \
    --seen_base_path $SEEN_BATH_PATH \
    --unseen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \

Improved Attack

python train-utterance-level-similarity-model.py \
    --seen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --speaker_list "${OUTPUT_PATH}/${MODEL}-customized-utterance-level-attack-similarity.csv"

python customized-utterance-level-MIA.py \
    --seen_base_path $SEEN_BATH_PATH \
    --unseen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --similarity_model_path "${OUTPUT_PATH}/customized-utterance-similarity-model-${MODEL}.pt"

Citation

If you find our work useful, please cite:

Owner
Wei-Cheng Tseng
Wei-Cheng Tseng
Omniverse sample scripts - A guide for developing with Python scripts on NVIDIA Ominverse

Omniverse sample scripts ここでは、NVIDIA Omniverse ( https://www.nvidia.com/ja-jp/om

ft-lab (Yutaka Yoshisaka) 37 Nov 17, 2022
Easily benchmark PyTorch model FLOPs, latency, throughput, max allocated memory and energy consumption

⏱ pytorch-benchmark Easily benchmark model inference FLOPs, latency, throughput, max allocated memory and energy consumption Install pip install pytor

Lukas Hedegaard 21 Dec 22, 2022
BADet: Boundary-Aware 3D Object Detection from Point Clouds (Pattern Recognition 2022)

BADet: Boundary-Aware 3D Object Detection from Point Clouds (Pattern Recognition

Rui Qian 17 Dec 12, 2022
ConvMixer unofficial implementation

ConvMixer ConvMixer 非官方实现 pytorch 版本已经实现。 nets 是重构版本 ,test 是官方代码 感兴趣小伙伴可以对照看一下。 keras 已经实现 tf2.x 中 是tensorflow 2 版本 gelu 激活函数要求 tf=2.4 否则使用入下代码代替gelu

Jian Tengfei 8 Jul 11, 2022
Official code for On Path Integration of Grid Cells: Group Representation and Isotropic Scaling (NeurIPS 2021)

On Path Integration of Grid Cells: Group Representation and Isotropic Scaling This repo contains the official implementation for the paper On Path Int

Ruiqi Gao 39 Nov 10, 2022
Source code for CVPR2022 paper "Abandoning the Bayer-Filter to See in the Dark"

Abandoning the Bayer-Filter to See in the Dark (CVPR 2022) Paper: https://arxiv.org/abs/2203.04042 (Arxiv version) This code includes the training and

74 Dec 15, 2022
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Welcome to AirSim AirSim is a simulator for drones, cars and more, built on Unreal Engine (we now also have an experimental Unity release). It is open

Microsoft 13.8k Jan 05, 2023
yolov5目标检测模型的知识蒸馏(基于响应的蒸馏)

代码地址: https://github.com/Sharpiless/yolov5-knowledge-distillation 教师模型: python train.py --weights weights/yolov5m.pt \ --cfg models/yolov5m.ya

52 Dec 04, 2022
TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"

Hierarchical Attention Networks for Document Classification This is an implementation of the paper Hierarchical Attention Networks for Document Classi

Quoc-Tuan Truong 83 Dec 05, 2022
The ARCA23K baseline system

ARCA23K Baseline System This is the source code for the baseline system associated with the ARCA23K dataset. Details about ARCA23K and the baseline sy

4 Jul 02, 2022
Fortuitous Forgetting in Connectionist Networks

Fortuitous Forgetting in Connectionist Networks Introduction This repository includes reference code for the paper Fortuitous Forgetting in Connection

Hattie Zhou 14 Nov 26, 2022
Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Implicit3DUnderstanding (Im3D) [Project Page] Holistic 3D Scene Understanding from a Single Image with Implicit Representation Cheng Zhang, Zhaopeng C

Cheng Zhang 149 Jan 08, 2023
Deeplab-resnet-101 in Pytorch with Jaccard loss

Deeplab-resnet-101 Pytorch with Lovász hinge loss Train deeplab-resnet-101 with binary Jaccard loss surrogate, the Lovász hinge, as described in http:

Maxim Berman 95 Apr 15, 2022
A Pytorch Implementation for Compact Bilinear Pooling.

CompactBilinearPooling-Pytorch A Pytorch Implementation for Compact Bilinear Pooling. Adapted from tensorflow_compact_bilinear_pooling Prerequisites I

169 Dec 23, 2022
MAME is a multi-purpose emulation framework.

MAME's purpose is to preserve decades of software history. As electronic technology continues to rush forward, MAME prevents this important "vintage" software from being lost and forgotten.

Michael Murray 6 Oct 25, 2020
The Official Implementation of the ICCV-2021 Paper: Semantically Coherent Out-of-Distribution Detection.

SCOOD-UDG (ICCV 2021) This repository is the official implementation of the paper: Semantically Coherent Out-of-Distribution Detection Jingkang Yang,

Jake YANG 62 Nov 21, 2022
A PyTorch re-implementation of the paper 'Exploring Simple Siamese Representation Learning'. Reproduced the 67.8% Top1 Acc on ImageNet.

Exploring simple siamese representation learning This is a PyTorch re-implementation of the SimSiam paper on ImageNet dataset. The results match that

Taojiannan Yang 72 Nov 09, 2022
Implementation of gaze tracking and demo

Predicting Customer Demand by Using Gaze Detecting and Object Tracking This project is the integration of gaze detecting and object tracking. Predict

2 Oct 20, 2022
Image Super-Resolution Using Very Deep Residual Channel Attention Networks

Image Super-Resolution Using Very Deep Residual Channel Attention Networks

kongdebug 14 Oct 14, 2022