FedTorch is an open-source Python package for distributed and federated training of machine learning models using the PyTorch distributed API.

Overview

FedTorch Logo

FedTorch is an open-source Python package for distributed and federated training of machine learning models using the PyTorch distributed API. Various algorithms for federated learning and local SGD are implemented for benchmarking and research, including our own proposed methods:

  • FedGATE and FedCOMGATE
  • APFL
  • DRFA
  • Local SGD with adaptive synchronization

And other common algorithms such as:

  • FedAvg
  • FedProx

We are actively trying to expand the library to include more training algorithms as well.

NEWS

Recent updates to the package:

Installation

First, clone the repository to your machine:

git clone https://github.com/MLOPTPSU/FedTorch.git

The PyPI package will be added soon.

This package is built on the PyTorch distributed API, so it can run with any of the supported distributed backends: Gloo, MPI, or NCCL. Among these three, we use MPI as our main backend for development, since it can be used for both CPU and CUDA runs. Unfortunately, the prebuilt PyTorch binaries do not support the MPI backend for distributed training; PyTorch has to be built from source against an MPI installation that supports CUDA. However, we have you covered: we provide a Dockerfile that creates an image with all the dependencies FedTorch needs. The Dockerfile can be found here, and you can edit it to create your own customized image. In addition, since building this Docker image can take a long time, we provide prebuilt versions of it alongside this repository in the packages section.
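
To verify that a given PyTorch build supports the MPI backend (for example, inside the container), a quick check is:

import torch.distributed as dist

# True only if PyTorch was compiled with MPI support
print(dist.is_mpi_available())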

For instance, you can pull an image built with CUDA 10.2, OpenMPI 4.0.1 with CUDA support, and PyTorch 1.6.0, using the following command:

docker pull docker.pkg.github.com/mloptpsu/fedtorch/fedtorch:cuda10.2-mpi

The docker images can be used with cloud services such as the Azure Machine Learning API for running FedTorch. Instructions for running on cloud services will be added in the near future.

Get Started

Running different trainings with different settings is easy in FedTorch. The only thing you need to take care of is setting the correct parameters for the experiment. The list of all parameters used in this package is in the parameters.py file. We provide examples for the different algorithms so that the relevant parameters can be set correctly. YAML support will be added in the future for each distinct training. The parameters can be parsed from the command line using the following method:

from fedtorch.parameters import get_args

args = get_args()

When the parameters are set, we need to set up the nodes using those parameters and start the training. First, we set up the distributed backend. For instance, we can use the MPI backend as:

import torch.distributed as dist

dist.init_process_group('mpi')

This will initialize the backend for distributed training. Next, we need to set up each node based on the parameters, create the graph of nodes, initialize the nodes, and load their data. For this, we can use the node objects in FedTorch:

from fedtorch.nodes import Client

client = Client(args, dist.get_rank())
# Initialize the node
client.initialize()
# Initialize the dataset if not downloaded
client.initialize_dataset()
# Load the dataset
client.load_local_dataset()
# Generate auxiliary models and params for training
client.gen_aux_models()

Then, we call the appropriate training method and run the training. For instance, if the parameters are set for FedAvg training, we can run:

from fedtorch.comms.trainings.federated import train_and_validate_federated

train_and_validate_federated(client)
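
Putting these steps together, a minimal driver script would look like the following sketch, assembled from the snippets above (the provided main.py covers more configurations):

import torch.distributed as dist

from fedtorch.parameters import get_args
from fedtorch.nodes import Client
from fedtorch.comms.trainings.federated import train_and_validate_federated

# Parse parameters, initialize the MPI backend, and set up this client
args = get_args()
dist.init_process_group('mpi')
client = Client(args, dist.get_rank())
client.initialize()
client.initialize_dataset()
client.load_local_dataset()
client.gen_aux_models()

# Run a federated training such as FedAvg
train_and_validate_federated(client)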

Different distributed and federated algorithms in this package can be run using this procedure, and for simplicity, we provide the main.py file, which can be used for running those algorithms following the same procedure. To run this file, we use mpirun and define the number of clients (processes) that will run the same file for training:

mpirun -np {NUM_CLIENTS} python main.py {args:values}

where {args:values} should be filled with the appropriate parameters for the training. Next, we provide a file that automatically builds this command for running with MPI in various situations.
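
Conceptually, such a launcher only needs to assemble and spawn the mpirun command line; the following is a minimal sketch (the actual run_mpi.py described below has many more options and may be structured differently):

import subprocess

def build_mpi_command(num_clients, script="main.py", **params):
    # Build: mpirun -np NUM_CLIENTS python main.py --key value ...
    cmd = ["mpirun", "-np", str(num_clients), "python", script]
    for key, value in params.items():
        cmd += ["--" + key, str(value)]
    return cmd

# Hypothetical example: 10 clients training on MNIST
subprocess.run(build_mpi_command(10, data="mnist", batch_size=50), check=True)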

Running Examples

To make the process easier, we have provided the run_mpi.py file, which covers most of the parameters needed for running different training algorithms. We first go over the details of the different parameters and then provide some examples.

Dataset Parameters

For setting up the dataset there are some parameters involved. The main parameters are:

  • --data : Defines the dataset name for training.
  • --batch_size : Defines the batch size for training.
  • --partition_data : Defines whether the data should be partitioned or each client has access to the whole dataset.
  • --reshuffle_per_epoch : This can be set to True for distributed training to have iid data across clients and faster convergence. This is not in line with federated learning settings.
  • --iid_data : If set to True, the data is randomly distributed across clients. If set to False, either the dataset itself is non-iid by default (like the EMNIST dataset) or it can be manually distributed to be non-iid (like the MNIST dataset, using the num_class_per_client parameter). The default is True in the package, but in the run_mpi.py file the default is False.
  • --num_class_per_client : If iid_data is set to False, we can distribute the data heterogeneously by attributing a certain number of classes to each client. For instance, with --num_class_per_client 2, each client only has access to data from two randomly selected classes during the entire training process. An example combining these flags follows this list.
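
For instance, a hypothetical main.py invocation that distributes MNIST non-iid with two classes per client could combine these flags as follows (parameter values are illustrative, and other required parameters are omitted):

python main.py --data mnist --batch_size 50 --partition_data True --iid_data False --num_class_per_client 2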

Federated Learning Parameters

To run the training using federated learning setups, some main parameters are:

  • --federated : If set to True, the training will be in one of the federated learning setups. If not, it will be in a distributed mode using local SGD with periodic averaging (which can be set using --local_step) and possibly reshuffling after each epoch. The default is False.
  • --federated_type : Defines the type of federated learning algorithm we want to use. The default is fedavg.
  • --federated_sync_type : It can be either epoch or local_step, and it determines when to synchronize the models. If set to epoch, the parameter --num_epochs_per_comm should be set as well. If set to local_step, the parameter --local_steps should be set. The default is epoch.
  • --num_comms : Defines the number of communication rounds for training. This is only for federated learning; in normal distributed mode, the total number of iterations should be set by either --num_epochs or --num_iterations, and hence --stop_criteria should be either epoch or iteration.
  • --online_client_rate : Defines the ratio of clients that are online and active during each round of communication. This is only for federated learning. The default value is 1.0, which means all clients will be active. A sample configuration using these flags follows this list.
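
As an illustration, a hypothetical FedAvg setup that synchronizes every 10 local steps for 20 rounds, with half of the clients active per round, could be configured as (other parameters omitted):

python main.py --federated True --federated_type fedavg --federated_sync_type local_step --local_steps 10 --num_comms 20 --online_client_rate 0.5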

Learning Rate Schedule

Different learning rate schedules can be set using their corresponding parameters. The main parameter is --lr_schedule_scheme, which defines the scheme for learning rate scheduling. For more information about the different learning rate schedulers, please see the learning.py file.
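
As a generic illustration only (the actual schedulers implemented in learning.py may differ), a simple multi-step decay schedule can be written as:

def multistep_lr(initial_lr, epoch, milestones=(30, 60), gamma=0.1):
    # Decay the learning rate by a factor of gamma at each milestone epoch.
    lr = initial_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= gamma
    return lr

# e.g., initial_lr=0.1 gives 0.1 before epoch 30, 0.01 until epoch 60, 0.001 after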

Examples

Now we provide some simple examples of running some of the training algorithms on a single node with multiple processes using MPI. To do so, we first need to run the docker container with the installed dependencies.

docker run --rm -it --mount type=bind,source="{path/to/FedTorch}",target=/FedTorch docker.pkg.github.com/mloptpsu/fedtorch/fedtorch:cuda10.2-mpi
cd /FedTorch

This will run the container and mount the FedTorch repo into it. The {path/to/FedTorch} should be replaced with your local path to the FedTorch repository directory. Now we can run the training in it.

FedAvg and FedGATE

Now, we can run the FedAvg algorithm for training an MLP model on MNIST data with the following command.

python run_mpi.py -f -ft fedavg -n 10 -d mnist -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2

This will run the training on 10 nodes with an initial learning rate of 0.1 and a batch size of 50, for 20 communication rounds, each with 10 local steps of SGD. The dataset is distributed heterogeneously, with each client having access to only 2 classes of data from the MNIST dataset.

Changing -ft fedavg to -ft fedgate will run the same training using the FedGATE algorithm. To run the FedCOMGATE algorithm, we need to add the -q flag to enable quantization as well. Hence the command will be:

python run_mpi.py -f -ft fedgate -n 10 -d mnist -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2 -q

APFL

To run the APFL algorithm, a simple command is:

python run_mpi.py -f -ft apfl -n 10 -d mnist -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2 -pa 0.5 -fp

where -pa 0.5 sets the alpha parameter of the APFL algorithm. The last parameter, -fp, turns on the fed_personal parameter, which evaluates the personalized or localized model using a local validation dataset. This is mostly used for personalization algorithms such as APFL.
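
For intuition, APFL mixes the personalized model with the global model using this alpha parameter; the following is a schematic sketch of the mixed prediction, not FedTorch's exact implementation:

def apfl_mixed_output(model_personal, model_global, alpha, x):
    # Personalized prediction as a convex combination of the two models' outputs:
    # alpha = 1.0 relies fully on the personal model, alpha = 0.0 on the global one.
    return alpha * model_personal(x) + (1 - alpha) * model_global(x)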

DRFA

To run a DRFA training we can use the following command:

python run_mpi.py -f -fd -ft fedavg -n 10 -d mnist -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2 -dg 0.1 

where -dg 0.1 sets the gamma parameter of the DRFA algorithm. Note that DRFA is a framework that can be run with any federated learning aggregator, such as FedAvg or FedGATE. Hence, the -fd parameter enables DRFA training and -ft defines the federated type to be used for aggregation.

References

Several different references were used for the training procedures in this repository. If you use this repository in your research, please cite the following paper:

@article{haddadpour2020federated,
  title={Federated learning with compression: Unified analysis and sharp guarantees},
  author={Haddadpour, Farzin and Kamani, Mohammad Mahdi and Mokhtari, Aryan and Mahdavi, Mehrdad},
  journal={arXiv preprint arXiv:2007.01154},
  year={2020}
}

Our other papers developed using this repository should be cited using the following bibitems:

@inproceedings{haddadpour2019local,
  title={Local SGD with periodic averaging: Tighter analysis and adaptive synchronization},
  author={Haddadpour, Farzin and Kamani, Mohammad Mahdi and Mahdavi, Mehrdad and Cadambe, Viveck},
  booktitle={Advances in Neural Information Processing Systems},
  pages={11082--11094},
  year={2019}
}
@article{deng2020distributionally,
  title={Distributionally Robust Federated Averaging},
  author={Deng, Yuyang and Kamani, Mohammad Mahdi and Mahdavi, Mehrdad},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}
@article{deng2020adaptive,
  title={Adaptive Personalized Federated Learning},
  author={Deng, Yuyang and Kamani, Mohammad Mahdi and Mahdavi, Mehrdad},
  journal={arXiv preprint arXiv:2003.13461},
  year={2020}
}

Acknowledgement and Disclaimer

This repository was developed, mainly by M. M. Kamani, based on our group's research on distributed and federated learning algorithms. We have also implemented other groups' proposed methods in FedTorch for better comparison; however, this repo is not the official code for any of those methods other than our own group's. Some parts of the initial stages of this repository were based on a fork of the Local SGD code from Tao Lin, which is no longer public.

Comments
  • Errors when running the code using Docker

    Hi, I followed your README instructions to run your algorithm but I encountered multiple errors along the way:

    • I tried to pull the image you provided in this issue #3 and ran the following command: python run_mpi.py -f -ft apfl -n 10 -d cifar10 -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2 -pa 0.5 -fp -oc. I keep getting warnings about my RTX 2080 Ti card, which I think is not being recognized:
    warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    Traceback (most recent call last):
      File "main.py", line 49, in <module>
        main(args)
      File "main.py", line 21, in main
        client.initialize()
      File "/workspace/fedtorch/fedtorch/nodes/nodes.py", line 44, in initialize
        init_config(self.args)
      File "/workspace/fedtorch/fedtorch/utils/init_config.py", line 36, in init_config
        torch.cuda.set_device(args.graph.device)
      File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 281, in set_device
        torch._C._cuda_setDevice(device)
    RuntimeError: cuda runtime error (101) : invalid device ordinal at /workspace/pytorch/torch/csrc/cuda/Module.cpp:59
    THCudaCheck FAIL file=/workspace/pytorch/torch/csrc/cuda/Module.cpp line=59 error=101 : invalid device ordinal
    /usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:125: UserWarning:
    GeForce RTX 2080 Ti with CUDA capability sm_75 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_35 sm_37 sm_52 sm_60 sm_61 sm_70 compute_70.
    If you want to use the GeForce RTX 2080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
    
      warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    Traceback (most recent call last):
      File "main.py", line 49, in <module>
        main(args)
      File "main.py", line 21, in main
        client.initialize()
      File "/workspace/fedtorch/fedtorch/nodes/nodes.py", line 44, in initialize
        init_config(self.args)
      File "/workspace/fedtorch/fedtorch/utils/init_config.py", line 36, in init_config
        torch.cuda.set_device(args.graph.device)
      File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 281, in set_device
        torch._C._cuda_setDevice(device)
    RuntimeError: cuda runtime error (101) : invalid device ordinal at /workspace/pytorch/torch/csrc/cuda/Module.cpp:59
    

    Regarding these warnings, the code does not crash, but not even one log regarding the training procedure is printed.

    In addition, when I run nvidia-smi I see the following GPU utilization:

    The utilization percentage across all 3 GPUs remains zero.

    • As a second option, I tried to build the image from the Dockerfile located in the docker folder. Yet again I encountered the following runtime error:
    Traceback (most recent call last):
      File "main.py", line 4, in <module>
        import torch.distributed as dist
      File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 189, in <module>
        from torch._C import *
    RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
    
    --------------------------------------------------------------------------
    Primary job  terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:
    
      Process name: [[51835,1],1]
      Exit code:    1
    --------------------------------------------------------------------------
    

    It looks like there is a problem with the PyTorch installation, so I tried something simple:

    Upgrading Numpy's version didn't help me either. Can you please help and explain how to run your code?

    Thank you!

    opened by AvivSham 5
  • A question about APFL algorithm

    Hi, I read your paper and am interested in the APFL algorithm. I have a question about the code. It seems that the code for APFL is a little different from the algorithm described in the paper. In the paper, each client maintains 3 models: it first updates the local version of the global model, then updates the local model, and finally mixes them to get the new personalized model. However, in this code, it maintains only 2 models: it first updates the local version of the global model, then gets the gradient of the personalized model by mixing the outputs of the local version of the global model and the personalized model.

    The code:

    # inference and get current performance.
    client.optimizer.zero_grad()
    loss, performance = inference(client.model, client.criterion, client.metrics, _input, _target)

    # compute gradient and do local SGD step.
    loss.backward()
    client.optimizer.step(
        apply_lr=True,
        apply_in_momentum=client.args.in_momentum, apply_out_momentum=False
    )

    client.optimizer.zero_grad()
    client.optimizer_personal.zero_grad()
    loss_personal, performance_personal = inference_personal(client.model_personal, client.model,
                                                             client.args.fed_personal_alpha, client.criterion,
                                                             client.metrics, _input, _target)

    # compute gradient and do local SGD step.
    loss_personal.backward()
    client.optimizer_personal.step(
        apply_lr=True,
        apply_in_momentum=client.args.in_momentum, apply_out_momentum=False
    )

    Are they the same? Looking forward to your reply. Thank you!

    opened by BrightHaozi 3
  • a question about fedprox

    On this page, https://github.com/MLOPTPSU/FedTorch/blob/main/fedtorch/comms/trainings/federated/main.py, lines 123 to 129, the code is below.

    elif client.args.federated_type == 'fedprox':
        # Adding proximal gradients and loss for fedprox
        for client_param, server_param in zip(client.model.parameters(), client.model_server.parameters()):
            if client.args.graph.rank == 0:
                print("distance norm for prox is:{}".format(torch.norm(client_param.data - server_param.data )))
            loss += client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data)
            client_param.grad.data += client.args.fedprox_mu * (client_param.data - server_param.data)
    

    client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data) may mean $\frac{\mu}{2} \left\|w-w_t\right\|$. But, $\frac{\mu}{2} \left\|w-w_t\right\|^2$ is used in fedprox.

    I'm not sure what I said above is true. Thank you very much for your kind consideration.

    opened by bird-two 2
  • Could you release the code in "Federated Learning with Compression: Unified Analysis and Sharp Guarantees"?

    Hi,

    I just read the paper "Federated Learning with Compression: Unified Analysis and Sharp Guarantees", and it looks very interesting. While searching for the code I was directed to this repo; however, I didn't find the implementation of FedCOM, so I would like to ask if you would release it?

    Thanks!

    opened by Dzhange 1
  • Does BrokenPipeError Matters?

    Hi, I am very interested in your proposed method. It is really nice of you to share such a helpful and clear repo. I followed your repo and used your docker images to run the code, but I keep getting errors like the following after some random starting round of the training.

    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/queues.py", line 245, in _feed
        send_bytes(obj)
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
        self._send_bytes(m[offset:offset + size])
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
        self._send(header + buf)
      File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
        n = write(self._handle, buf)
    BrokenPipeError: [Errno 32] Broken pipe

    After all, I can still get a result, but I wonder whether the BrokenPipeError matters; am I still getting the correct results? Hope to hear your response soon.

    opened by RongDai430 1
  • No basic auth credentials

    I read your paper and am interested in your proposed method. It is really nice of you to share such a helpful and clear repo. I pulled one of the images built with CUDA 10.2 and OpenMPI 4.0.1 with CUDA support and PyTorch 1.6.0, using the following command: $ docker pull docker.pkg.github.com/mloptpsu/fedtorch/fedtorch:cuda10.2-mpi

    However it reported an error "Error response from daemon: Head https://docker.pkg.github.com/v2/mloptpsu/fedtorch/fedtorch/manifests/cuda10.2-mpi: no basic auth credentials".
    How can I solve this error?

    opened by Sybil-Huang 1
  • Adding EMNIST full dataset

    Adding the EMNIST full dataset in addition to its digits-only version, which was implemented before. Now we have emnist and emnist_full datasets. The emnist dataset has 3383 clients with 10 classes, but emnist_full has 3400 clients with 62 classes. This also resolves the BatchNorm issue for training with 1 sample in the batch.

    opened by mmkamani7 1
  • setup problem in docker

    Hello, when I ran docker pull docker.pkg.github.com/mloptpsu/fedtorch/fedtorch:cuda10.2-mpi in my Docker, I ran into an error. Do you have any ideas to solve it? Thanks.

    opened by b856741 0
  • A question about drawing the plots

    Hi, recently I've read your paper "Distributionally Robust Federated Averaging" carefully, and I'm trying to reproduce the experiments in it. However, I don't know how to produce the plots from the code. I think modifying main.py and run_mpi.py would help, but it seems really complicated. Could you please give me some instructions? Thanks!

    opened by Zhg9300 2
  • Models not saved for Federated Centered

    Hi, while running main_centered.py the model checkpoints are not saved. Is this not implemented?

    Is there anything else missing for main_centered.py?

    Regards, Ahmed

    enhancement 
    opened by aldahdooh 1