Code for the tech report Toward Training at ImageNet Scale with Differential Privacy

Overview

Differentially private Imagenet training

Code for the tech report Toward Training at ImageNet Scale with Differential Privacy by Alexey Kurakin, Steve Chien, Shuang Song, Roxana Geambasu, Andreas Terzis and Abhradeep Thakurta.

This is not an officially supported Google product.

Repository structure

  • benchmarks directory contains code which we used to compare performance of various DP-SGD frameworks on CIFAR10 and MNIST
  • imagenet directory contains Imagenet trainign code.

Installation

  1. If you are going to use NVIDIA GPU then install latest NVIDIA drivers, CUDA and CuDNN. While latest versions are not strictly necessary to run the code, we sometimes observed slower performance with older versions of CUDA and CuDNN.

  2. Set up Python virtual environment with all necessary libraries:

    # Create virtualenv
    virtualenv -p python3 ~/.venv/dp_imagenet
    source ~/.venv/dp_imagenet/bin/activate
    # Install Objax with CUDA
    pip install --upgrade objax
    pip install --upgrade jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_releases.html
    # Tensorflow and TFDS (for datasets readers)
    pip install tensorflow
    pip install tensorflow-datasets
  3. Extra libraries for TF and Opacus benchmarks:

    pip install tensorflow-privacy
    pip install opacus
    pip install torchvision
    pip install tensorboard
  4. Follow instructions at https://www.tensorflow.org/datasets/catalog/imagenet2012 to download Imagenet dataset for TFDS.

Before running any code, make sure to enter virtual environment and setup PYTHONPATH:

# Enter virtual env, set up path
source ~/.venv/dp_imagenet/bin/activate
cd ${REPOSITORY_DIRECTORY}
export PYTHONPATH=$PYTHONPATH:.

Training Imagenet models with DP

Here are few examples showing how to run Imagenet training with and without DP:

# Resnet50 without DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --max_eval_batches=10 --eval_every_n_steps=100 --train_device_batch_size=64 --disable_dp

# Resnet18 without DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --max_eval_batches=10 --eval_every_n_steps=100 --model=resnet18 --train_device_batch_size=64 --disable_dp

# Resnet18 with DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --max_eval_batches=10 --eval_every_n_steps=100 --model=resnet18 --train_device_batch_size=64

To pre-train model on Places365 and finetune with differential privacy on Imagenet use the following commands:

# Prepare directory for Places365 checkpoint
PLACES_CHECKPOINT_DIR="${HOME}/experiments/places365"
mkdir -p "${PLACES_CHECKPOINT_DIR}"

# Pre-train model on Places365 without differential privacy
# This will train a model to about 55% accuracy on Places365
# when run on 8 GPUs.
python imagenet/imagenet_train.py \
  --tfds_data_dir="${TFDS_DATA_DIR}" \
  --dataset=places365 \
  --eval_every_n_steps=1024 \
  --model=resnet18 \
  --num_train_epochs=80 \
  --lr_warmup_epochs=4 \
  --base_learning_rate=0.05 \
  --disable_dp \
  --train_device_batch_size=128 \
  --model_dir="${PLACES_CHECKPOINT_DIR}"

# Prepare directory for Imagenet checkpoint
IMAGENET_DP_CHECKPOINT_DIR="${HOME}/experiments/imagenet_dp"
mkdir -p "${IMAGENET_DP_CHECKPOINT_DIR}"

# Finetune model on Imagenet with differential privacy.
# This will train a differentially private Imagenet model
# to approximately 48% accuracy with epsilon ~10, delta ~10^{-6}
# when run on 8 GPUs.
# If number of GPUs is different then adjust --grad_acc_steps argument
# such that number_of_gpus*grad_acc_steps = 512.
python imagenet/imagenet_train.py \
  --tfds_data_dir="${TFDS_DATA_DIR}" \
  --eval_every_n_steps=1024 \
  --model=resnet18 \
  --num_train_epochs=70 \
  --dp_clip_norm=1.0 \
  --dp_sigma=0.058014 \
  --grad_acc_steps=64 \
  --base_learning_rate=0.03 \
  --lr_warmup_epochs=1 \
  --num_layers_to_freeze=6 \
  --finetune_path="${PLACES_CHECKPOINT_DIR}/ckpt/0000141312.npz" \
  --model_dir="${IMAGENET_DP_CHECKPOINT_DIR}"

Running DP-SGD benchmarks

Following commands were used to obtain benchmarks of various frameworks for the tech report. All of them were run on n1-standard-96 Google Cloud machine with 8 v100 GPUs. All numbers were obtains with CUDA 11.4 and CuDNN 8.2.2.26.

Objax benchmarks:

# MNIST benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_objax.py --disable-dp

# MNIST benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_objax.py

# CIFAR10 benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_objax.py --disable-dp

# CIFAR10 benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_objax.py

# Imagenet benchmark Resnet18 without DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --disable_dp --base_learning_rate=0.2

# Imagenet benchmark Resnet18 with DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --base_learning_rate=2.0

Opacus benchmarks:

# MNIST benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_opacus.py --disable-dp

# MNIST benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_opacus.py

# CIFAR10 benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_opacus.py --disable-dp

# CIFAR10 benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_opacus.py

Tensorflow benchmarks:

# MNIST benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_tf.py --dpsgd=False

# MNIST benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_tf.py

# CIFAR10 example without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_tf.py --dpsgd=False

# CIFAR10 example with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_tf.py
Owner
Google Research
Google Research
DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment

DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment This repository is related to the paper DEEPAGÉ: Answering Questions in Por

0 Dec 10, 2021
Official PyTorch implementation and pretrained models of the paper Self-Supervised Classification Network

Self-Classifier: Self-Supervised Classification Network Official PyTorch implementation and pretrained models of the paper Self-Supervised Classificat

Elad Amrani 24 Dec 21, 2022
ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

ImageBART NeurIPS 2021 Patrick Esser*, Robin Rombach*, Andreas Blattmann*, Björn Ommer * equal contribution arXiv | BibTeX | Poster Requirements A sui

CompVis Heidelberg 110 Jan 01, 2023
A PyTorch implementation of the architecture of Mask RCNN

EDIT (AS OF 4th NOVEMBER 2019): This implementation has multiple errors and as of the date 4th, November 2019 is insufficient to be utilized as a reso

Sai Himal Allu 975 Dec 30, 2022
Temporal Segment Networks (TSN) in PyTorch

TSN-Pytorch We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes implementation for TSN as well as oth

1k Jan 03, 2023
SMD-Nets: Stereo Mixture Density Networks

SMD-Nets: Stereo Mixture Density Networks This repository contains a Pytorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021)

Fabio Tosi 115 Dec 26, 2022
Lolviz - A simple Python data-structure visualization tool for lists of lists, lists, dictionaries; primarily for use in Jupyter notebooks / presentations

lolviz By Terence Parr. See Explained.ai for more stuff. A very nice looking javascript lolviz port with improvements by Adnan M.Sagar. A simple Pytho

Terence Parr 785 Dec 30, 2022
Python Environment for Bayesian Learning

Pebl is a python library and command line application for learning the structure of a Bayesian network given prior knowledge and observations. Pebl in

Abhik Shah 103 Jul 14, 2022
The code for the NSDI'21 paper "BMC: Accelerating Memcached using Safe In-kernel Caching and Pre-stack Processing".

BMC The code for the NSDI'21 paper "BMC: Accelerating Memcached using Safe In-kernel Caching and Pre-stack Processing". BibTex entry available here. B

Orange 383 Dec 16, 2022
Image Super-Resolution by Neural Texture Transfer

SRNTT: Image Super-Resolution by Neural Texture Transfer Tensorflow implementation of the paper Image Super-Resolution by Neural Texture Transfer acce

Zhifei Zhang 413 Nov 30, 2022
PyTorch implementation for 3D human pose estimation

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach This repository is the PyTorch implementation for the network presented in:

Xingyi Zhou 579 Dec 22, 2022
Winning solution of the Indoor Location & Navigation Kaggle competition

This repository contains the code to generate the winning solution of the Kaggle competition on indoor location and navigation organized by Microsoft

Tom Van de Wiele 62 Dec 28, 2022
La source de mon module 'pyfade' disponible sur Pypi.

Version: 1.2 Introduction Pyfade est un module permettant de créer des dégradés colorés. Il vous permettra de changer chaque ligne de votre texte par

Billy 20 Sep 12, 2021
frida工具的缝合怪

fridaUiTools fridaUiTools是一个界面化整理脚本的工具。新人的练手作品。参考项目ZenTracer,觉得既然可以界面化,那么应该可以把功能做的更加完善一些。跨平台支持:win、mac、linux 功能缝合怪。把一些常用的frida的hook脚本简单统一输出方式后,整合进来。并且

diveking 997 Jan 09, 2023
Recursive Bayesian Networks

Recursive Bayesian Networks This repository contains the code to reproduce the results from the NeurIPS 2021 paper Lieck R, Rohrmeier M (2021) Recursi

Robert Lieck 11 Oct 18, 2022
naked is a Python tool which allows you to strip a model and only keep what matters for making predictions.

naked is a Python tool which allows you to strip a model and only keep what matters for making predictions. The result is a pure Python function with no third-party dependencies that you can simply c

Max Halford 24 Dec 20, 2022
Crawl & visualize ICLR papers and reviews

Crawl and Visualize ICLR 2022 OpenReview Data Descriptions This Jupyter Notebook contains the data crawled from ICLR 2022 OpenReview webpages and thei

Federico Berto 75 Dec 05, 2022
Image Restoration Using Swin Transformer for VapourSynth

SwinIR SwinIR function for VapourSynth, based on https://github.com/JingyunLiang/SwinIR. Dependencies NumPy PyTorch, preferably with CUDA. Note that t

Holy Wu 11 Jun 19, 2022
Repo for paper "Dynamic Placement of Rapidly Deployable Mobile Sensor Robots Using Machine Learning and Expected Value of Information"

Repo for paper "Dynamic Placement of Rapidly Deployable Mobile Sensor Robots Using Machine Learning and Expected Value of Information" Notes I probabl

Berkeley Expert System Technologies Lab 0 Jul 01, 2021
TensorFlow implementation of ENet, trained on the Cityscapes dataset.

segmentation TensorFlow implementation of ENet (https://arxiv.org/pdf/1606.02147.pdf) based on the official Torch implementation (https://github.com/e

Fredrik Gustafsson 248 Dec 16, 2022