FedScale: Benchmarking Model and System Performance of Federated Learning

Overview

FedScale: Benchmarking Model and System Performance of Federated Learning (Paper)

This repository contains scripts and instructions for building FedScale, a diverse set of challenging and realistic benchmark datasets that facilitate scalable, comprehensive, and reproducible federated learning (FL) research. FedScale datasets are large-scale and span a diverse range of important FL tasks, such as image classification, object detection, language modeling, speech recognition, and reinforcement learning. For each dataset, we provide a unified evaluation protocol with realistic data splits and evaluation metrics. To meet the pressing need for reproducing realistic FL at scale, we have also built an efficient evaluation platform, FedScale Automated Runtime (FAR), to simplify and standardize FL experimental setup and model evaluation. The platform provides flexible APIs for implementing new FL algorithms and adding new execution backends with minimal developer effort.
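
As a loose illustration of that API flexibility, new strategies are typically added by subclassing the stock aggregator and overriding its event handlers. The sketch below is an assumption-laden example, not FedScale's exact API: the module path and the client_completion_handler name are taken from a traceback further down this page, while the args import and run() entry point are guesses to be checked against the examples folder.

from fedscale.core.aggregation.aggregator import Aggregator  # path seen in a traceback below
from fedscale.core.config_parser import args  # assumed import; config_parser.py is mentioned in the configs below

class LoggingAggregator(Aggregator):
    # Hypothetical plugin: inspect each client result before the default aggregation.
    def client_completion_handler(self, results):
        print(f"client {results['clientId']}: utility={results.get('utility')}")
        super().client_completion_handler(results)

if __name__ == '__main__':
    LoggingAggregator(args).run()  # assumed entry point; see the examples under ./core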

FedScale is open-source with permissive licenses and actively maintained, and we welcome feedback and contributions from the community!


Getting Started

Our install.sh will install the following automatically:

  • Anaconda Package Manager
  • CUDA 10.2

Note: if you prefer different versions of conda and CUDA, please check the comments in install.sh for details.

Run the following commands to install FedScale.

git clone https://github.com/SymbioticLab/FedScale
cd FedScale
source install.sh 

Realistic FL Datasets

We are adding more datasets! Please feel free to contribute!

We provide real-world datasets for the federated learning community and plan to release more soon! Each dataset is associated with its own training, validation, and testing sets. A summary of training-set statistics appears in the tables below; refer to each dataset folder for more details. Because these datasets are very large, we are still uploading them and carefully validating their integration with FAR, so we are actively making each dataset available for FAR experiments.

CV tasks:

Dataset Data Type # of Clients # of Samples Example Task
iNature Image 2,295 193K Classification
FMNIST Image 3,400 640K Classification
OpenImage Image 13,771 1.3M Classification, Object detection
Google Landmark Image 43,484 3.6M Classification
Charades Video 266 10K Action recognition
VLOG Video 4,900 9.6K Video classification, Object detection

NLP tasks:

Dataset Data Type # of Clients # of Samples Example Task
Europarl Text 27,835 1.2M Text translation
Blog Corpus Text 19,320 137M Word prediction
Stackoverflow Text 342,477 135M Word prediction, classification
Reddit Text 1,660,820 351M Word prediction
Amazon Review Text 1,822,925 166M Classification, Word prediction
CoQA Text 7,189 114K Question Answering
LibriTTS Text 2,456 37K Text to speech
Google Speech Audio 2,618 105K Speech recognition
Common Voice Audio 12,976 1.1M Speech recognition

Misc Applications:

Dataset Data Type # of Clients # of Samples Example Task
Taobao Text 182,806 0.9M Recommendation
Go dataset Text 150,333 4.9M Reinforcement learning

Note that no details of any participant's age, gender, or location were kept, and random IDs were assigned to each individual. In using these datasets, we strictly abide by their licenses, and the datasets provided in this repo should be used for research purposes only.

Please go to the ./dataset directory and follow the dataset README for more details.
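
For a quick sanity check after downloading, the per-client partition of each dataset is described by a client_data_mapping CSV (the femnist train.csv path below appears in a job config later on this page). A minimal sketch, assuming the mapping has a client_id column; the actual schema is documented in the dataset README:

import pandas as pd

# Path layout taken from the job configs later on this page; adjust to your checkout.
mapping = pd.read_csv('benchmark/dataset/data/femnist/client_data_mapping/train.csv')

# 'client_id' is an assumed column name -- verify it against the dataset README.
counts = mapping.groupby('client_id').size()
print(f'{len(counts)} clients, {counts.sum()} samples, '
      f'median {counts.median():.0f} samples/client')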

Run Experiments with FAR

FedScale Automated Runtime (FAR) is an automated and easily deployable evaluation platform that simplifies and standardizes FL experimental setup and model evaluation under practical settings. FAR is based on our Oort project, which has been shown to scale well and can emulate FL training of thousands of clients in each round.

FAR enables developers to benchmark various FL efforts with practical FL data and metrics.

Please go to the ./core directory and follow the FAR README to set up FL training scripts.

Repo Structure

Repo Root
|---- dataset     # Realistic datasets in FedScale
|---- core        # Experiment platform of FedScale
    |---- examples  # Examples of new plugins
    |---- evals     # Backend of job submission
    

Notes

Please consider citing our paper if you use the code or data in your research project.

@inproceedings{fedscale-arxiv,
  title={FedScale: Benchmarking Model and System Performance of Federated Learning},
  author={Fan Lai and Yinwei Dai and Xiangfeng Zhu and Mosharaf Chowdhury},
  booktitle={arXiv:2105.11367},
  year={2021}
}

and

@inproceedings{oort-osdi21,
  title={Oort: Efficient Federated Learning via Guided Participant Selection},
  author={Fan Lai and Xiangfeng Zhu and Harsha V. Madhyastha and Mosharaf Chowdhury},
  booktitle={USENIX Symposium on Operating Systems Design and Implementation (OSDI)},
  year={2021}
}

Contact

Fan Lai ([email protected]), Yinwei Dai ([email protected]), Xiangfeng Zhu ([email protected]) and Mosharaf Chowdhury from the University of Michigan.

Comments
  • Android Aggregation and Execution Support


    Why are these changes needed?

    To support android on-device training and testing with MNN backend.

    Related issue number

    N/A

    Checks

    • [x] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
    • [x] I've made sure the following tests are passing.
    • Testing Configurations
      • [x] Dry Run (20 training rounds & 1 evaluation round)
      • [ ] Cifar 10 (20 training rounds & 1 evaluation round)
      • [ ] Femnist (20 training rounds & 1 evaluation round)
    opened by continue-revolution 31
  • [<FedScale component: Core|Dataloader|etc...>]


    What happened + What you expected to happen

The training process sometimes crashes unexpectedly after model evaluation (testing on the testing set). [screenshot in the original issue]

    Versions / Dependencies

    OS: Linux (CloudLab FedScale 240-g5; 1 node)
    FedScale, Python, CUDA, etc.: installed by "install.sh --cuda" as provided by FedScale.

    Reproduction script

    The conf.yml file I used: conf.yml.zip

    Issue Severity

    Low: It annoys or frustrates me.

    bug 
    opened by Yunzhen-Liu 19
  • Reorg repo


    1. Reorganize the repo into better structures. We expect no big changes in the near future;
    2. Fix the SLOW installation, which was due to specifying too many random conda channels (not due to installing too many packages). It is much faster now;
    3. Fix some legacy paths in docs;

    Test method: Passed the dryrun, femnist, and cifar quick runs over 10+ rounds.

    opened by fanlai0990 16
  • Install dataset-specific dependencies when downloading that dataset


    This will make initial conda setup faster (basically, delete the dependency from the environment.yml), and people can avoid installing unnecessary packages.

    enhancement 
    opened by mosharaf 14
  • ProcessGroupGloo error when running on more than one worker machine


    Hi, I am trying to perform training based on the following config file for the femnist dataset. I can run the experiment using two virtual machines, one as a parameter server and the other as a worker. However, if I increase the number of workers, say to two, I run into the following error (please see the next comment).

    Any thought on this?

    opened by etesami 11
  • Fix Async


    Why are these changes needed?

    In the async FedScale example, (i) training stalls after a while; (ii) there is an API mismatch in testing.

    Related issue number

    Closes #148

    Checks

    • [x] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
    • [x] I've made sure the following tests are passing.
    • Testing Configurations
      • [x] Dry Run (20 training rounds & 1 evaluation round)
      • [x] Cifar 10 (20 training rounds & 1 evaluation round)
      • [x] Femnist (20 training rounds & 1 evaluation round)
    opened by fanlai0990 8
  • Inconsistency in the dataset directory


    1. The README says 20 datasets, the download script has 16 or so, and the data directory has 15 or 16.
    2. Naming of the datasets is inconsistent too; e.g., iNature vs. iNaturalist.
    3. Using a single letter to select a dataset in the download script is also confusing and short-sighted; there may be more datasets than letters in the alphabet. A convention like --dataset-name would be better.
    documentation enhancement 
    opened by mosharaf 8
  • Fix async


    Why are these changes needed?

    1. Model testing is somehow missing; 2. Model accuracy over training is weird.

    Related issue number

    Checks

    • [ ] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
    • [ ] I've made sure the following tests are passing.
    • Testing Configurations
      • [ ] Dry Run (20 training rounds & 1 evaluation round)
      • [ ] Cifar 10 (20 training rounds & 1 evaluation round)
      • [ ] Femnist (20 training rounds & 1 evaluation round)
    opened by fanlai0990 7
  • [Core] Async aggregator freezes during evaluation


    What happened + What you expected to happen

    Hi FedScale team, I tried to run the async aggregator locally, but no test metrics are generated. Training seems to work fine, but the system freezes without any error at round 50.

    Here are the last events from the aggregator:

    (07-26) 11:38:52 INFO [async_aggregator.py:216] Wall clock: 2519 s, round: 49, Remaining participants: 5, Succeed participants: 10, Training loss: 4.433294297636379
    (07-26) 11:38:55 INFO [async_aggregator.py:279] Client 2602 train on model 46 during 2274-2535.0060934242283
    (07-26) 11:38:55 INFO [aggregator.py:812] Issue EVENT (client_train) to EXECUTOR (1)
    (07-26) 11:38:55 INFO [aggregator.py:812] Issue EVENT (update_model) to EXECUTOR (1)
    (07-26) 11:38:56 INFO [async_aggregator.py:279] Client 2667 train on model 46 during 2319-2539.592434184604
    (07-26) 11:38:56 INFO [aggregator.py:812] Issue EVENT (client_train) to EXECUTOR (1)
    (07-26) 11:38:56 INFO [async_aggregator.py:279] Client 2683 train on model 46 during 2328-2542.9932767611217
    (07-26) 11:38:56 INFO [aggregator.py:812] Issue EVENT (client_train) to EXECUTOR (1)
    (07-26) 11:38:59 INFO [async_aggregator.py:279] Client 2569 train on model 45 during 2253-2605.669321587796
    (07-26) 11:38:59 INFO [aggregator.py:812] Issue EVENT (client_train) to EXECUTOR (2)
    (07-26) 11:38:59 INFO [aggregator.py:812] Issue EVENT (update_model) to EXECUTOR (2)
    (07-26) 11:39:01 INFO [async_aggregator.py:279] Client 2769 train on model 47 during 2385-2680.206093424228
    (07-26) 11:39:01 INFO [aggregator.py:812] Issue EVENT (client_train) to EXECUTOR (2)

    Here's the tail of the executor log:

    oving_loss': 4.510447650271058, 'trained_size': 100, 'success': True, 'utility': 752.3330107862802}
    (07-26) 11:39:00 INFO [client.py:32] Start to train (CLIENT: 2569) ...
    (07-26) 11:39:01 INFO [client.py:68] Training of (CLIENT: 2569) completes, {'clientId': 2569, 'moving_loss': 4.526119316819311, 'trained_size': 100, 'success': True, 'utility': 729.5144631284894}
    (07-26) 11:39:01 INFO [client.py:32] Start to train (CLIENT: 2769) ...
    (07-26) 11:39:02 INFO [client.py:68] Training of (CLIENT: 2769) completes, {'clientId': 2769, 'moving_loss': 4.5834765700435645, 'trained_size': 100, 'success': True, 'utility': 692.4210048353054}
    (07-26) 11:39:04 INFO [client.py:68] Training of (CLIENT: 2667) completes, {'clientId': 2667, 'moving_loss': 4.169509475803674, 'trained_size': 100, 'success': True, 'utility': 556.3458848955673}

    Versions / Dependencies

    Latest

    Reproduction script

    Here's my config for the async_aggregator.py example:

    
    # ip address of the parameter server (need 1 GPU process)
    ps_ip: localhost
    
    # ip address of each worker:# of available gpus process on each gpu in this node
    # Note that if we collocate ps and worker on same GPU, then we need to decrease this number of available processes on that GPU by 1
    # E.g., master node has 4 available processes, then 1 for the ps, and worker should be set to: worker:3
    worker_ips:
        - localhost:[2]
    
    exp_path: $FEDSCALE_HOME/fedscale/core
    
    # Entry function of executor and aggregator under $exp_path
    executor_entry: ../../examples/async_fl/async_executor.py
    
    aggregator_entry: ../../examples/async_fl/async_aggregator.py
    
    auth:
        ssh_user: ""
        ssh_private_key: ~/.ssh/id_rsa
    
    # cmd to run before we can indeed run FAR (in order)
    setup_commands:
        - source $HOME/anaconda3/bin/activate fedscale
    
    # ========== Additional job configuration ==========
    # Default parameters are specified in config_parser.py, wherein more description of the parameter can be found
    
    job_conf:
        - job_name: asyncfl                   # Generate logs under this folder: log_path/job_name/time_stamp
        - log_path: $FEDSCALE_HOME/benchmark # Path of log files
        - num_participants: 800                      # Number of participants per round, we use K=100 in our paper, large K will be much slower
        - data_set: femnist                     # Dataset: openImg, google_speech, stackoverflow
        - data_dir: $FEDSCALE_HOME/benchmark/dataset/data/femnist    # Path of the dataset
        - data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv              # Allocation of data to each client, turn to iid setting if not provided
        - device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity     # Path of the client trace
        - device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
        - model: shufflenet_v2_x2_0                            # Models: e.g., shufflenet_v2_x2_0, mobilenet_v2, resnet34, albert-base-v2
        - gradient_policy: yogi                 # {"fed-yogi", "fed-prox", "fed-avg"}, "fed-avg" by default
        - eval_interval: 5                     # How many rounds to run a testing on the testing set
        - rounds: 500                          # Number of rounds to run this training. We use 1000 in our paper, while it may converge w/ ~400 rounds
        - filter_less: 21                       # Remove clients w/ less than 21 samples
        - num_loaders: 2
        - yogi_eta: 3e-3
        - yogi_tau: 1e-8
        - local_steps: 5
        - learning_rate: 0.05
        - batch_size: 20
        - test_bsz: 20
        - malicious_factor: 4
        - use_cuda: False
        - decay_round: 50
        - overcommitment: 1.0
        - async_buffer: 10
        - checkin_period: 50
        - arrival_interval: 3
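
    (For context on the yogi_eta / yogi_tau knobs above: with gradient_policy: yogi, the server applies a Yogi-style adaptive update to the aggregated client delta, following Reddi et al., "Adaptive Federated Optimization". Below is a minimal NumPy sketch written from the published formula, not from FedScale's code; beta1/beta2 are assumed defaults.)

    import numpy as np

    def yogi_server_update(x, delta, m, v, eta=3e-3, tau=1e-8,
                           beta1=0.9, beta2=0.99):
        # One Yogi server step on flattened model weights x.
        # delta: aggregated client update; m, v: persistent moment buffers.
        # eta/tau mirror yogi_eta/yogi_tau in the config; beta1/beta2 are
        # assumed defaults, not values read from FedScale.
        m = beta1 * m + (1 - beta1) * delta
        d2 = delta * delta
        v = v - (1 - beta2) * d2 * np.sign(v - d2)   # Yogi's sign-corrected second moment
        x = x + eta * m / (np.sqrt(v) + tau)
        return x, m, v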
    

    Issue Severity

    No response

    bug 
    opened by ewenw 7
  • Support k8s for job submission and management


    Why are these changes needed?

    Support using k8s to manage job lifecycles, including job submission, initialization, termination and clean-up.

    TODO:

    • [x] add README for k8s job management tutorial
    1. A change in docker/driver.py uses the k8s client APIs for job management; the driver now supports "default", "docker", and "k8s" modes.
    2. Add a YAML generator to automate generation of k8s container configs.
    3. Add new example k8s configs in benchmark.

    Related issue number

    Checks

    • [x] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
    • [x] I've made sure the following tests are passing.
    • Testing Configurations
      • k8s
        • [x] Dry Run (20 training rounds & 1 evaluation round)
        • [x] Cifar 10 (20 training rounds & 1 evaluation round)
        • [x] Femnist (20 training rounds & 1 evaluation round)
      • Regression 1: docker
        • [x] Cifar 10 (20 training rounds & 1 evaluation round)
        • [x] Femnist (20 training rounds & 1 evaluation round)
      • Regression 2: default
        • [x] Cifar 10 (20 training rounds & 1 evaluation round)
        • [x] Femnist (20 training rounds & 1 evaluation round)
    opened by IKACE 6
  • Running FEMNIST tutorial on local machine gives a few warnings.


    1. [W ParallelNative.cpp:229] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
    2. /Users/mosharaf/opt/anaconda3/envs/fedscale/lib/python3.7/site-packages/torchvision/transforms/functional_pil.py:42: DeprecationWarning: FLIP_LEFT_RIGHT is deprecated and will be removed in Pillow 10 (2023-07-01). Use Transpose.FLIP_LEFT_RIGHT instead. return img.transpose(Image.FLIP_LEFT_RIGHT)

    Training seems to continue.
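
    The second warning names its own fix: Pillow 9 moved the constant into the Transpose enum. A minimal sketch of the replacement (newer torchvision releases already include the equivalent change upstream):

    from PIL import Image

    img = Image.new('L', (28, 28))

    # Deprecated since Pillow 9.1, removed in Pillow 10:
    # img = img.transpose(Image.FLIP_LEFT_RIGHT)

    # Replacement suggested by the warning itself:
    img = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)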

    bug wontfix 
    opened by mosharaf 6
  • Dataloader support for TF


    Description

    Hi Team, there is admittedly some overlap between this issue and a previous one; however, I thought I would make a new one since the other is fairly old. I am looking for data loader support in FedScale for TensorFlow. It seems the data classes as they stand are written with PyTorch in mind, and I was wondering if anyone has experience using a TensorFlow dataset, particularly .tfrecord files with a known schema.

    Use case

    I work in industry and am looking to use FedScale to run a simulation of a specific federated learning model on a specific set of hardware. All of our models are written using tf and keras.

    enhancement 
    opened by kashprime 3
  • [Async simulation] Implementation idea for task scheduling


    Description

    Hi FedScale team, here's my suggestion on how to implement the async simulation mode using device traces without needing a constant arrival parameter (related to #162):

    sort device traces by start time
    queue = initialize min priority queue
    while tasks_issued < buffer_size:
       event_time, event_type, client_id = queue.get()
       if event_type == 'start':
            current_concurrency += 1
            if current_concurrency < MAX_CONCURRENCY:
                issue_task(event_time)
        else:
            current_concurrency -= 1
            if current_concurrency == MAX_CONCURRENCY - 1:
                issue_task(event_time)
    
    issue_task(event_time):
        client, trace_start, trace_end = sample next available client at event_time
        add client task to individual executor's queue
        queue.put((trace_start, 'start', client))
        queue.put((trace_end, 'end', client))
    

    This works well in my implementation, but might be harder to integrate into FedScale, hence I'm creating an issue to document it. Let me know if you have any questions / concerns.

    Below is the Python code for this scheduling algorithm; feel free to run it and validate the output:

    import random
    from queue import PriorityQueue
    
    id = 0
    
    
    def generate_start_end(time):
        # next available client
        global id
        start_time = time + random.randint(0, 1)
        duration = random.randint(1, 3)
        id += 1
        return start_time, start_time + duration, id
    
    
    min_pq = PriorityQueue()
    total_tasks = 1
    
    TOTAL_TASKS = 10
    MAX_CONCURRENCY = 1
    current_concurrency = 0
    start_times = {}
    
    
    def new_task(event_time):
        new_start, new_end, client_id = generate_start_end(event_time)
        min_pq.put((new_start, 'start', client_id))
        min_pq.put((new_end, 'end', client_id))
        start_times[client_id] = new_start
    
    
    new_task(0)
    while not min_pq.empty():
        event_time, event_type, client_id = min_pq.get()
        if event_type == 'start':
            current_concurrency += 1
            if total_tasks < TOTAL_TASKS and current_concurrency < MAX_CONCURRENCY:
                new_task(event_time)
                total_tasks += 1
        else:
            current_concurrency -= 1
            if total_tasks < TOTAL_TASKS and current_concurrency == MAX_CONCURRENCY - 1:
                new_task(event_time)
                total_tasks += 1
            print(f"processing event starting at {start_times[client_id]} and ending at {event_time}")
    

    Use case

    No response

    enhancement 
    opened by ewenw 1
  • Redis Support for FedScale


    Why are these changes needed?

    To integrate Redis into the aggregator for saving aggregation data.

    Related issue number

    N/A

    Checks

    • [x] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
    • [x] I've made sure the following tests are passing.
    • Testing Configurations
      • [x] Dry Run (20 training rounds & 1 evaluation round)
      • [x] Cifar 10 (20 training rounds & 1 evaluation round)
      • [x] Femnist (20 training rounds & 1 evaluation round)

    Note:

    1. All tests are run on CPU.
    2. Met the following KeyError bug on the same line in Femnist once with Redis, and also once without Redis (i.e., the original code). Sample error output:
    File "/users/xuyehe/FedScale-rd/fedscale/core/aggregation/aggregator.py", line 386, in client_completion_handler
        duration=self.virtual_client_clock[results['clientId']]['computation'] +
    KeyError: 665
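
    One way to localize this (a sketch only; the dict and key names come from the traceback above, and the fallback policy is an assumption, not FedScale's logic):

    import logging

    def safe_computation_time(virtual_client_clock, client_id, default=0.0):
        # Return the recorded computation time for client_id, logging instead
        # of raising a KeyError like the one above when the entry is missing.
        entry = virtual_client_clock.get(client_id)
        if entry is None:
            logging.error('no virtual clock entry for client %s', client_id)
            return default
        return entry['computation']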
    
    opened by xuyehe 2
  • Improve documentation in various components


    Description

    Some code comments and docstrings should be added, especially for resource_manager.py, data_loader.py, client_manager.py, etc. Also, a diagram of what each component does would be helpful for people new to the codebase.

    documentation 
    opened by ewenw 1
  • [Dataloader] Fix Missing Model Configuration


    What happened + What you expected to happen

    The Albert config is missing, leading to model failures. We should avoid providing such configurations ourselves; this should be done automatically.

    Versions / Dependencies

    FedScale python folder.

    Reproduction script

    Try to submit nlp configs.

    Issue Severity

    Medium: It is a significant difficulty but I can work around it.

    bug 
    opened by fanlai0990 0
  • Issues on FedScale and Oort: (1) it widely promotes so-called advantages that are not based on the new version of FedML (10 months outdated as of today); (2) the evaluation of the old FedML version (Oct 2021) is not based on facts and overlaps with a published paper; (3) unrealistic overlap between system efficiency and data distribution; (4) issues in numerical optimization; (5) dual submission?


    Dear Authors of FedScale,

    I didn't want to comment too much on FedScale because I thought all the experts in the field knew the truth. But you have promoted your outdated paper for a long time without basing it on facts, and your co-author (e.g., Jiachen) publicly claims inaccurate advantages of FedScale over FedML on the Internet, which deeply harms and disrespects FedML's past academic efforts and current industrialization efforts. Therefore, it is necessary for me to state some facts here and let people know the truth.

    Summary

    The FedScale paper evaluated an old version of FedML in its ICML submission (3 months out of date at the time of submission), and still did not mention this in the camera-ready (6 months out of date) or during the conference talk (about 10 months out of date). Comparing against an old version of FedML, and in that situation widely publicizing so-called advantages not based on facts or the new version, has brought a lot of harm and loss to FedML. Aside from the harm caused by publicizing the old-version comparison, even that comparison is academically inaccurate and wrong: of the 4 core arguments in the paper, 3 do not match the facts, and the fourth highly overlaps with existing papers. In addition, the ICML paper substantially overlaps with a paper published in the proceedings of another workshop. Based on these issues, we think this paper violates the dual submission policy and does not meet the criteria for publication. We also hope the FedScale team can update the paper (https://arxiv.org/pdf/2105.11367v5.pdf) and media articles (Zhihu, etc.) in a timely manner, clarifying the above issues, avoiding misunderstandings among users and peers in the Chinese and English communities, and ending the unnecessary reputation damage.

    Issue 1: FedScale widely promotes so-called advantages that are not based on the new version of FedML (10 months outdated as of today)

    1. Your ICML 2022 paper uses a version from 3.5 months before the submission deadline and 6 months before the review/rebuttal deadline (review open date). Reviewers should notice this issue. I believe the rebuttal date is well after our advanced-feature release, not to mention that you only compare with part of our code in an old version.
    2. You promote your ICML 2022 paper on social media (e.g., Zhihu) without mentioning the version date and ID. The earliest such promotion came 6 months after the FedML version you compared against, by which time FedML had already released a new version with many advanced features. Your improper claims in the promotion raise considerable misunderstanding and concern about the FedML company, which damages our reputation (friends and investors came to ask about the issue).
    3. By the date you presented at the ICML 2022 main conference, the comparison was already 10 months out of date. You promoted it on social media that week and unfortunately still did not address the version-ID issue, further damaging FedML's reputation (we got concerned messages from friends and users that week). The fact is that we had already released many new features. Even so, the FedML team kept silent, believing people could tell the truth.
    4. Your paper does not mention the version in the main text. As of today, the version ID mentioned in the appendix is already 10 months out of date.

    https://arxiv.org/pdf/2105.11367v5.pdf - Table 1's comment on FedML is fully wrong and outdated. [screenshot in the original issue]

    My comments: It is surprising to many engineers and researchers at USC and FedML that you overclaim that your platform is the stronger "Scalable Platform". Please check our platform at https://fedml.ai.

    FedML AI platform releases the world’s federated learning open platform on the public cloud with an in-depth introduction of products and technologies! https://medium.com/@FedML/fedml-ai-platform-releases-the-worlds-federated-learning-open-platform-on-public-cloud-with-an-8024e68a70b6

    Issue 2: The evaluation of the old FedML version (Oct 2021) is not based on facts. In addition, the core contribution of the FedScale ICML paper (system and data heterogeneity) overlaps with the published Oort paper. ICML reviewers should be aware of these issues.

    Quote from https://arxiv.org/pdf/2105.11367v5.pdf: "First, they are limited in the versatility of data for various real-world FL applications. Indeed, even though they may have quite a few datasets and FL training tasks (e.g., LEAF (Caldas et al., 2019)), their datasets often contain synthetically generated partitions derived from conventional datasets (e.g., CIFAR) and do not represent realistic characteristics. This is because these benchmarks are mostly borrowed from traditional ML benchmarks (e.g., MLPerf (Mattson et al., 2020)) or designed for simulated FL environments like TensorFlow Federated (TFF) (tff) or PySyft (pys). Second, existing benchmarks often overlook system speed, connectivity, and availability of the clients (e.g., FedML (He et al., 2020) and Flower (Beutel et al., 2021)). This discourages FL efforts from considering system efficiency and leads to overly optimistic statistical performance (§2). Third, their datasets are primarily small-scale, because their experimental environments are unable to emulate large-scale FL deployments. While real FL often involves thousands of participants in each training round (Kairouz et al., 2021b; Yang et al., 2018), most existing benchmarking platforms can merely support the training of tens of participants per round. Finally, most of them lack user-friendly APIs for automated integration, resulting in great engineering efforts for benchmarking at scale"

    These four core arguments are not based on facts:

    1. The 1st argument (about datasets) is wrong and not in line with the facts and existing works. We already supported, in 2020, a large number of datasets that conform to the habits of the ICML/NeurIPS/ICLR community (https://doc.fedml.ai/simulation/user_guide/datasets-and-models.html), and we also support real datasets (FedNLP, FedGraphNN, FedCV, FedIoT) covering many applications: https://github.com/FedML-AI/FedML/tree/master/python/app. The timelines for these works all predate October 2021, and they have been published in the workshops and main tracks of major conferences. It is important to note that these works were published 6 months earlier than the old version of FedML mentioned in the ICML paper, and generally more than half a year before the ICML 2022 submission deadline.

    FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks. https://arxiv.org/abs/2104.07145 (arXiv date: 4 Apr 2021)
    FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks. https://arxiv.org/abs/2104.08815 (arXiv date: 8 Apr 2021)
    FedCV: https://arxiv.org/abs/2111.11066 (arXiv date: 22 Nov 2021)
    FedIoT: https://arxiv.org/abs/2106.07976v1 (arXiv date: 15 Jun 2021)

    2. The 3rd argument (unable to emulate large-scale FL deployments) is also not based on facts:

    (1) https://arxiv.org/pdf/2105.11367v5.pdf - "FedML can only support 30 participants because of its suboptimal scalability, which under-reports the FL performance that the algorithm can indeed achieve"

    My comments: This does not match the facts. From its oldest version, FedML has always supported training an arbitrary number of clients via single-process (standalone, in the old version) sequential training. In addition, our users can run parallel experiments (one GPU per job/run) on multiple GPUs to accelerate hyperparameter tuning, which avoids communication cost at the emulator level. Our latest version supports sequential training across multiple nodes via an efficient scheduler. Therefore, such a comment does not match the facts.

    (3) https://arxiv.org/pdf/2105.11367v5.pdf - "Third, their datasets are primarily small-scale, because their experimental environments are unable to emulate large-scale FL deployments."

    My comments: this is also misleading to readers. In our old version we already supported many large-scale datasets for researchers in the ML community (https://doc.fedml.ai/simulation/user_guide/datasets-and-models.html), which are widely used by many ICML/NeurIPS/ICLR papers. Our latest version supports even more realistic, large-scale datasets in CV, NLP, healthcare, graph neural networks, and IoT; see https://github.com/FedML-AI/FedML/tree/master/python/app and https://github.com/FedML-AI/FedML/tree/master/iot. Each is supported by top-tier conference papers; for example, the NLP one (https://arxiv.org/abs/2104.08815) is connected to Huggingface and was accepted to NAACL 2022.

    3. The 4th argument (API not friendly) also does not respect FedML's work. We released FedNLP, FedGraphNN, FedCV, FedIoT, and other application frameworks as early as 1.5 years ago (https://open.fedml.ai/platform/appStore), all based on the FedML core framework; verification across so many applications has long proven its convenience. Regardless of these differences, the best proof of "convenience" is user data: look at our GitHub stars, paper citations, platform user numbers, etc.

    We also put together a brief introduction to our APIs so readers can judge which is more convenient: https://medium.com/@FedML/fedml-releases-simple-and-flexible-apis-boosting-innovation-in-algorithm-and-system-optimization-b21c2f4b88c8

    4. Regarding the 2nd key argument, we think it was already made in the Oort paper (highly overlapping; please compare the two papers; Oort: https://arxiv.org/abs/2010.06081), which does not fit the ICML requirement of independent contribution and novelty for a published paper. Specifically, system heterogeneity (system speed, connectivity, and availability) is described in Section 2.2 of the Oort paper and clearly mentioned in Section 7.1 of its experimental section. System speed, connectivity, and availability are the same things as in Section 3.2 of the FedScale article. Oort says: "We simulate real-world heterogeneous client system performance and data in both training and testing evaluations using an open-source FL benchmark [48]: (1) Heterogeneous device runtimes (speed) of different models, network throughput/connectivity (connectivity), device model, and availability are emulated using data from AI Benchmark [1] and Network Measurements on mobiles [6]."

    Issue 3: Issues in FedScale and Oort: unrealistic overlap between system speed, data distribution, and client device availability

    Quote from https://arxiv.org/pdf/2105.11367v5.pdf: "Second, existing benchmarks often overlook system speed, connectivity, and availability of the clients (e.g., FedML (He et al., 2020) and Flower (Beutel et al., 2021)). This discourages FL efforts from considering system efficiency and leads to overly optimistic statistical performance (§2)."

    My comments: this is misleading. My question is: how can you produce a realistic joint distribution of system speed, data distribution statistics, and client device availability? You obtain them from three independent databases, which does not match practice, and then build Oort on this unrealistic assumption. The FedScale team has never clearly answered this question. This benchmark definitely introduces issues for numerical optimization theory, and we ML and systems researchers do not want this misleading benchmark to misguide research in the ML area.

    Moreover, such a comment ("existing benchmarks often overlook system speed, connectivity, and availability of the clients") is extremely disrespectful to the work of an industrialized team whose expertise goes well beyond this. Distributed systems are the hardcore area the FedML engineering team focuses on. Perhaps your team read only part of the materials (the white paper? part of our source code?). Please refer to a comprehensive list of materials here:

    FedML Homepage: https://fedml.ai/
    FedML Open Source: https://github.com/FedML-AI
    FedML Platform: https://open.fedml.ai
    FedML Use Cases: https://open.fedml.ai/platform/appStore
    FedML Documentation: https://doc.fedml.ai
    FedML Research: https://fedml.ai/research-papers/ (50+ papers covering many aspects, including security and privacy)

    Issue 4: FedScale only supports running the same number of iterations locally; however, many (almost all) ICML/NeurIPS/ICLR papers work with the same number of epochs. This differs significantly from the practice of the ML community.

    https://github.com/SymbioticLab/FedScale/blob/51cc4a1e0ab553cd79ecb59af211008788f1af39/fedscale/core/execution/client.py#L50
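
    To make the distinction concrete: the linked client code runs a fixed number of local mini-batch steps (the local_steps knob in the configs above), while most FL papers run a fixed number of local epochs, so clients holding more data take more updates. Below is a schematic sketch of the two conventions, not FedScale's actual code (step_on is a hypothetical helper):

    # Schematic only -- contrasts the two local-training conventions.

    def train_fixed_steps(model, loader, optimizer, local_steps=5):
        # FedScale-style: every client runs exactly local_steps mini-batches,
        # regardless of how much data it holds.
        it = iter(loader)
        for _ in range(local_steps):
            try:
                batch = next(it)
            except StopIteration:
                it = iter(loader)            # wrap around small datasets
                batch = next(it)
            optimizer.step_on(model, batch)  # hypothetical helper

    def train_fixed_epochs(model, loader, optimizer, local_epochs=1):
        # Convention in most FL papers: clients with more samples take more steps.
        for _ in range(local_epochs):
            for batch in loader:
                optimizer.step_on(model, batch)  # hypothetical helper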

    Issue 5: We suspect that the FedScale ICML paper violates the dual submission policy in the ML community

    The FedScale ICML version (https://proceedings.mlr.press/v162/lai22a/lai22a.pdf) overlaps substantially with a workshop paper that has proceedings (https://dl.acm.org/doi/10.1145/3477114.3488760). The workshop date is October 2021, at least 3 months before the ICML 2022 submission deadline. Normally, ICML/NeurIPS/ICLR do not allow submissions already published elsewhere with proceedings under the same title/authors/core contribution.

    (1) These two papers have the same title, "FedScale: Benchmarking Model and System Performance of Federated Learning at Scale".
    (2) These two papers have 5 overlapping authors.
    Workshop authors: Fan Lai, Yinwei Dai, Xiangfeng Zhu, Harsha V. Madhyastha, Mosharaf Chowdhury
    ICML authors: Fan Lai, Yinwei Dai, Sanjay S. Singapuram, Jiachen Liu, Xiangfeng Zhu, Harsha V. Madhyastha, Mosharaf Chowdhury (two authors added in the ICML version)

    (3) Substantial overlap in contributions and core arguments; see the two key paragraphs in these two papers. [screenshots in the original issue]

    Note: these two papers make the same arguments with the same wording.

    ICML policy: https://icml.cc/Conferences/2022/StyleAuthorInstructions

    As mentioned in Issue 2, the FedScale ICML 2022 paper also overlaps in a key contribution with another paper published at OSDI 2021:

    The 2nd key argument was already made in the Oort paper (highly overlapping; please compare the two papers; Oort: https://arxiv.org/abs/2010.06081), which does not fit the ICML requirement of independent contribution and novelty for a published paper.

    Specifically, system heterogeneity (system speed, connectivity, and availability) is described in Section 2.2 of the Oort paper and clearly mentioned in Section 7.1 of its experimental section; system speed, connectivity, and availability are the same things as in Section 3.2 of the FedScale article. Oort says: "We simulate real-world heterogeneous client system performance and data in both training and testing evaluations using an open-source FL benchmark [48]: (1) Heterogeneous device runtimes (speed) of different models, network throughput/connectivity (connectivity), device model, and availability are emulated using data from AI Benchmark [1] and Network Measurements on mobiles [6]."

    Versions / Dependencies

    Code: https://github.com/SymbioticLab/FedScale (51cc4a1)

    Paper: https://arxiv.org/pdf/2105.11367v5.pdf (v5)

    help wanted 
    opened by chaoyanghe 5
Releases(v0.5)
  • v0.5(Jul 18, 2022)

    FedScale 0.5 is the first major release of FedScale after years of development.

    Major Features

    • Distributed/standalone fast-forward FL evaluations
    • 21 realistic FL datasets
    • 70+ lightweight FL models
    • PyTorch and TensorFlow support
    • GPU, x86, and ARM hardware backend support
    • Real-world client system traces
    • Synchronous and asynchronous training with straggler mitigation support
    • Homepage, API documentation

    Credits

    FedScale 0.5 was the work of a large set of new contributors from Michigan and outside. Thanks also to all the FedScale users who have suggested new features or reported bugs.
