Differential Privacy for Heterogeneous Federated Learning : Utility & Privacy tradeoffs

Overview

Differential Privacy for Heterogeneous Federated Learning : Utility & Privacy tradeoffs

In this work, we propose an algorithm DP-SCAFFOLD(-warm), which is a new version of the so-called SCAFFOLD algorithm ( warm version : wise initialisation of parameters), to tackle heterogeneity issues under mathematical privacy constraints known as Differential Privacy (DP) in a federated learning framework. Using fine results of DP theory, we have succeeded in establishing both privacy and utility guarantees, which show the superiority of DP-SCAFFOLD over the naive algorithm DP-FedAvg. We here provide numerical experiments that confirm our analysis and prove the significance of gains of DP-SCAFFOLD especially when the number of local updates or the level of heterogeneity between users grows.

Two datasets are studied:

  • a real-world dataset called Femnist (an extended version of EMNIST dataset for federated learning), which you see the Accuracy growing with the number of communication rounds (50 local updates first and then 100 local updates)

image_femnist image_femnist

  • synthetic data called Logistic for logistic regression models, which you see the train loss decreasing with the number of communication rounds (50 local updates first and then 100 local updates),

image_logistic image_logistic

Significant results are available for both of these datasets for logistic regression models.

Structure of the code

  • main.py: four global options are available.
    • generate: to generate data, introduce heterogeneity, split data between users for federated learning and preprocess data
    • optimum (after generate): to run a phase training with unsplitted data and save the "best" empirical model in a centralized setting to properly compare rates of convergence
    • simulation (after generate and optimum): to run several simulations of federated learning and save the results (accuracy, loss...)
    • plot (after simulation): to plot visuals

./data

Contains generators of synthetic (Logistic) and real-world (Femnist) data ( file data_generator.py), designed for a federated learning framework under some similarity parameter. Each folder contains a file data where the generated data (train and test) is stored.

./flearn

  • differential_privacy : contains code to apply Gaussian mechanism (designed to add differential privacy to mini-batch stochastic gradients)
  • optimizers : contains the optimization framework for each algorithm (adaptation of stochastic gradient descent)
  • servers : contains the super class Server (in server_base.py) which is adapted to FedAvg and SCAFFOLD (algorithm from the point of view of the server)
  • trainmodel : contains the learning model structures
  • users : contains the super class User (in user_base.py) which is adapted to FedAvg and SCAFFOLD ( algorithm from the point of view of any user)

./models

Stores the latest models over the training phase of federated learning.

./results

Stores several metrics of convergence for each simulation, each similarity/privacy setting and each algorithm.

Metrics (evaluated at each round of communication):

  • test accuracy over all users,
  • train loss over all users,
  • highest norm of parameter difference (server/user) over all selected users,
  • train gradient dissimilarity over all users.

Software requirements:

  • To download the dependencies: pip install -r requirements.txt

References

Companion repository to the paper accepted at the 4th ACM SIGSPATIAL International Workshop on Advances in Resilient and Intelligent Cities

Transfer learning approach to bicycle sharing systems station location planning using OpenStreetMap Companion repository to the paper accepted at the

Politechnika Wrocławska - repozytorium dla informatyków 4 Oct 24, 2022
The easiest tool for extracting radiomics features and training ML models on them.

Simple pipeline for experimenting with radiomics features Installation git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git cd classrad pi

Piotr Woźnicki 17 Aug 04, 2022
Bi-level feature alignment for versatile image translation and manipulation (Under submission of TPAMI)

Bi-level feature alignment for versatile image translation and manipulation (Under submission of TPAMI) Preparation Clone the Synchronized-BatchNorm-P

Fangneng Zhan 12 Aug 10, 2022
Self-attentive task GAN for space domain awareness data augmentation.

SATGAN TODO: update the article URL once published. Article about this implemention The self-attentive task generative adversarial network (SATGAN) le

Nathan 2 Mar 24, 2022
Pytorch Performace Tuning, WandB, AMP, Multi-GPU, TensorRT, Triton

Plant Pathology 2020 FGVC7 Introduction A deep learning model pipeline for training, experimentaiton and deployment for the Kaggle Competition, Plant

Bharat Giddwani 0 Feb 25, 2022
Welcome to The Eigensolver Quantum School, a quantum computing crash course designed by students for students.

TEQS Welcome to The Eigensolver Quantum School, a crash course designed by students for students. The aim of this program is to take someone who has n

The Eigensolvers 53 May 18, 2022
Using deep learning to predict gene structures of the coding genes in DNA sequences of Arabidopsis thaliana

DeepGeneAnnotator: A tool to annotate the gene in the genome The master thesis of the "Using deep learning to predict gene structures of the coding ge

Ching-Tien Wang 3 Sep 09, 2022
Go from graph data to a secure and interactive visual graph app in 15 minutes. Batteries-included self-hosting of graph data apps with Streamlit, Graphistry, RAPIDS, and more!

✔️ Linux ✔️ OS X ❌ Windows (#39) Welcome to graph-app-kit Turn your graph data into a secure and interactive visual graph app in 15 minutes! Why This

Graphistry 107 Jan 02, 2023
An Object Oriented Programming (OOP) interface for Ontology Web language (OWL) ontologies.

Enabling a developer to use Ontology Web Language (OWL) along with its reasoning capabilities in an Object Oriented Programming (OOP) paradigm, by pro

TheEngineRoom-UniGe 7 Sep 23, 2022
Torch implementation of "Enhanced Deep Residual Networks for Single Image Super-Resolution"

NTIRE2017 Super-resolution Challenge: SNU_CVLab Introduction This is our project repository for CVPR 2017 Workshop (2nd NTIRE). We, Team SNU_CVLab, (B

Bee Lim 625 Dec 30, 2022
Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch

Retrieval-Augmented Denoising Diffusion Probabilistic Models (wip) Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in P

Phil Wang 55 Jan 01, 2023
Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan

68 Dec 14, 2022
Lacmus is a cross-platform application that helps to find people who are lost in the forest using computer vision and neural networks.

lacmus The program for searching through photos from the air of lost people in the forest using Retina Net neural nwtwork. The project is being develo

Lacmus Foundation 168 Dec 27, 2022
A Unified Framework and Analysis for Structured Knowledge Grounding

UnifiedSKG 📚 : Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Code for paper UnifiedSKG: Unifying and Mu

HKU NLP Group 370 Dec 21, 2022
A basic reminder tool written in Python.

A simple Python Reminder Here's a basic reminder tool written in Python that speaks to the user and sends a notification. Run pip3 install pyttsx3 w

Sachit Yadav 4 Feb 05, 2022
For the paper entitled ''A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining''

Summary This is the source code for the paper "A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining", which was accepted as fu

1 Nov 10, 2021
Python PID Tuner - Based on a FOPDT model obtained using a Open Loop Process Reaction Curve

PythonPID_Tuner Step 1: Takes a Process Reaction Curve in csv format - assumes data at 100ms interval (column names CV and PV) Step 2: Makes a rough e

6 Jan 14, 2022
[PAMI 2020] Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation

Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation This repository contains the source code for

Yun-Chun Chen 60 Nov 25, 2022
Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

StrengthNet Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis" https://arxiv.org/abs/2110

RuiLiu 65 Dec 20, 2022