Plato: A New Framework for Federated Learning Research

Overview

Welcome to Plato, a new software framework to facilitate scalable federated learning research.

Installing Plato with PyTorch

To install Plato, first clone this repository to the desired directory.

The Plato developers recommend using Miniconda to manage Python packages. Before using Plato, first install Miniconda, update conda, and then create and activate a new conda environment with Python 3.8 using the commands:

$ conda update conda
$ conda create -n federated python=3.8
$ conda activate federated

where federated is the preferred name of your new environment.

Update any packages, if necessary, by typing y when prompted to proceed.

The next step is to install the required Python packages. PyTorch should be installed following the instructions on its Get Started page. For example, the typical command on Linux with CUDA GPU support would be:

$ conda install pytorch torchvision cudatoolkit=11.1 -c pytorch

The CUDA version used in the command above can be obtained on Ubuntu Linux systems with:

$ nvidia-smi

On macOS (without GPU support), the typical command would be:

$ conda install pytorch torchvision -c pytorch

We will need to install several packages using pip as well:

$ pip install -r requirements.txt

If you use Visual Studio Code, you can have yapf reformat the code every time it is saved by adding the following settings to .vscode/settings.json:

"python.formatting.provider": "yapf", 
"editor.formatOnSave": true

In general, the following is the recommended starting point for .vscode/settings.json:

"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.provider": "yapf", 
"editor.formatOnSave": true,
"python.linting.pylintArgs": [
    "--init-hook",
    "import sys; sys.path.append('/absolute/path/to/project/home/directory')"
],
"workbench.editor.enablePreview": false

Naturally, /absolute/path/to/project/home/directory should be replaced with the actual path in your development environment.

Tip: When working in Visual Studio Code as the development environment, a favourite colour theme of one of the project's developers is Bluloco; both its light and dark variants are excellent and very thoughtfully designed. The Pylance extension, Microsoft's modern language server for Python, is also strongly recommended.

Running Plato in a Docker container

Most of Plato's codebase is designed to be framework-agnostic, so it is relatively straightforward to use Plato with a variety of deep learning frameworks beyond PyTorch, its default framework. One example of such a framework that Plato currently supports is MindSpore. Due to the wide variety of steps that need to be followed correctly to run Plato without Docker, it is strongly recommended to run Plato in a Docker container, on either a CPU-only or a GPU-enabled server.

To build such a Docker image, use the provided Dockerfile for PyTorch and Dockerfile_MindSpore for MindSpore:

docker build -t plato -f Dockerfile .

or:

docker build -t plato -f Dockerfile_MindSpore .

To run the Docker image that was just built, use the command:

./dockerrun.sh

Or if GPUs are available, use the command:

./dockerrun_gpu.sh

To remove all the containers after they have been run, use the command:

docker rm $(docker ps -a -q)

To remove the plato Docker image, use the command:

docker rmi plato

On Ubuntu Linux, you may need to add sudo before these docker commands.

The provided Dockerfile builds a Docker image running Ubuntu 20.04, with a virtual environment called federated pre-configured to support PyTorch 1.8.1 and Python 3.8. If MindSpore support is needed, the provided Dockerfile_MindSpore contains a pre-configured environment, also called federated, that supports MindSpore 1.1.1 and Python 3.7.5 (the Python version that MindSpore requires). Both Dockerfiles have GPU support enabled. Once an image is built and a Docker container is running, one can use Visual Studio Code to connect to it and start development within the container.

Running Plato

To start a federated learning training workload, run ./run from the repository's root directory. For example:

./run --config=configs/MNIST/fedavg_lenet5.yml
  • --config (-c): the path to the configuration file to be used. The default is config.yml in the project's home directory.
  • --log (-l): the level of logging information to be written to the console. Possible values are critical, error, warn, info, and debug, and the default is info.
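
For instance, to run the same session with more verbose logging, the two flags above can be combined:

$ ./run --config=configs/MNIST/fedavg_lenet5.yml --log=debug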

Plato uses the YAML format for its configuration files to manage the runtime configuration parameters. Example configuration files have been provided in the configs directory.
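
As a rough sketch (the clients section below is drawn from an example later on this page; other sections, such as trainer, hold trainer-related parameters), a configuration file groups parameters into YAML sections:

clients:
    type: simple
    total_clients: 2
    per_round: 2
    do_test: false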

Plato uses wandb to produce and collect logs in the cloud. If this is not needed, run the command wandb offline before running Plato.

If issues in the code prevent it from running to completion, there may be lingering processes from previous runs. Use the command pkill python to terminate them so that there will not be CUDA errors in the upcoming run.

Installing YOLOv5 as a Python package

If object detection using the YOLOv5 model and any of the COCO datasets is needed, YOLOv5 must first be installed as a Python package:

cd packages/yolov5
pip install .

Plotting Runtime Results

If the configuration file contains a results section, the selected performance metrics, such as accuracy, will be saved in a .csv file in the results/ directory. By default, the results/ directory is located under the same path as the configuration file in use, but it can easily be changed by modifying Config.result_dir in config.py.
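
As a hypothetical sketch only (the key names below are assumptions, not taken from this page; consult config.py and the example files in configs/ for the authoritative format), a results section might look like:

results:
    types: accuracy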

As .csv files, these results can be used in any way one wishes; an example Python program, plot.py, plots the relevant figures and saves them as PDF files. To run this program:

python plot.py --config=config.yml
  • --config (-c): the path to the configuration file to be used. The default is config.yml in the project's home directory.

Running Unit Tests

All unit tests are in the tests/ directory. These tests are designed to be standalone and executed separately. For example, the command python lr_schedule_tests.py runs the unit tests for learning rate schedules.
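
For example, assuming the tests are executed from within the tests/ directory:

$ cd tests
$ python lr_schedule_tests.py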

Installing Plato with MindSpore

Though we provide a Dockerfile for building a Docker container that supports MindSpore 1.1, in rare cases it may still be necessary to install Plato with MindSpore on a GPU server running Ubuntu Linux 18.04 (which MindSpore requires). Similar to a PyTorch installation, we first create a new environment with Python 3.7.5 (which MindSpore 1.1 requires), and then install the required packages:

conda create -n mindspore python=3.7.5
conda activate mindspore
pip install -r requirements.txt

Next, install MindSpore 1.1 with the following command:

pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.1.1/MindSpore/gpu/ubuntu_x86/cuda-10.1/mindspore_gpu-1.1.1-cp37-cp37m-linux_x86_64.whl

MindSpore may require additional system packages; install them if they are not already present:

sudo apt-get install libssl-dev
sudo apt-get install build-essential

If cuDNN has not yet been installed, install it with the following commands:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get install libcudnn8=8.0.5.39-1+cuda10.1

To check the currently installed cuDNN version, the following commands are helpful:

function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcudnn

To check if MindSpore is correctly installed on the GPU server, try to import mindspore with a Python interpreter.
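
For example, the following quick check should print the installed version without raising an error:

$ python -c "import mindspore; print(mindspore.__version__)"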

Finally, to use trainers and servers based on MindSpore, assign true to use_mindspore in the trainer section of the configuration file. This variable is unassigned by default, in which case Plato uses PyTorch as its default framework.
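
For example, the relevant portion of the configuration file would read:

trainer:
    use_mindspore: true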

Uninstalling Plato

Remove the conda environment used to run Plato first, and then remove the directory containing Plato's git repository.

conda env remove -n federated
rm -rf plato/

where federated (or mindspore) is the name of the conda environment that Plato runs in.

For more specific documentation on how Plato can be run on GPU cluster environments such as Lambda Labs' GPU cloud or Compute Canada, refer to docs/Running.md.

Technical support

Technical support questions should be directed to the maintainer of this software framework: Baochun Li ([email protected]).

Comments
  • Unifying data transfer with numpy array

All data transfers are now in NumPy arrays.

    Description

For model weights, the transfer type is an OrderedDict{name: numpy.ndarray}. For features, the transfer type is a list[(numpy.ndarray, numpy.ndarray)], where the first value is the feature and the second value is the target.
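
As a minimal illustration of these two formats (the parameter names and shapes below are hypothetical, not Plato APIs):

from collections import OrderedDict
import numpy as np

# Model weights: an OrderedDict mapping parameter names to NumPy arrays.
weights = OrderedDict([
    ('conv1.weight', np.zeros((6, 1, 5, 5), dtype=np.float32)),
    ('conv1.bias', np.zeros(6, dtype=np.float32)),
])

# Features: a list of (feature, target) pairs, both NumPy arrays.
features = [
    (np.random.rand(16, 84).astype(np.float32),  # feature batch
     np.zeros(16, dtype=np.int64)),              # corresponding targets
]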

    How has this been tested?

Tested with the following configs:

    'configs/MNIST/fedavg_lenet5_noniid'
    'configs/MNIST/fedavg_lenet5'
    'configs/MNIST/fedprox_lenet5'
    'configs/MNIST/mistnet_lenet5'
    'configs/MNIST/mistnet_pretrain_lenet5'
    

Please help test with MindSpore and TensorFlow; I don't have a suitable machine for testing right now.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by hcngac 19
  • Update to support simulation of different clients' speeds in async mode

Description

    In async mode, most clients run relatively fast, so it is sometimes quite hard to test Plato in a scenario where clients have a variety of speeds. This change allows users to simulate clients' speeds by providing a distribution in the configuration file. The user can also choose to enable the simulation without providing a specific distribution, in which case the code uses a default one.

    Currently, the simulation only supports the Zipf and normal distributions. More distributions can be added in the future.

    How has this been tested?

    • Test 1 with default distribution: Run ./run -c ./configs/MNIST/fedavg_async_lenet5.yml
    • Test 2 with Normal distribution: First update the client configuration in configs/MNIST/fedavg_async_lenet5.yml as below:
    clients:
        type: simple
    
        total_clients: 2
    
        per_round: 2
    
        do_test: false
    
        simulation: true
    
        simulation_distribution:
            distribution: normal
            mean: 2
            sd: 1
    

    Then run ./run -c ./configs/MNIST/fedavg_async_lenet5.yml

    • Test 3 with Zipf distribution: First update the client configuration in configs/MNIST/fedavg_async_lenet5.yml as below:
    clients:
        type: simple
    
        total_clients: 2
    
        per_round: 2
    
        do_test: false
    
        simulation: true
    
        simulation_distribution:
            distribution: zipf
            s: 2
    

    Then run ./run -c ./configs/MNIST/fedavg_async_lenet5.yml

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.

    Additional information

The current implementation deviates a bit from the original intent, which was to put clients to sleep at the end of each epoch. To accomplish that, the await asyncio.sleep() call would have to be inserted in the method Trainer.train_process() in plato/trainers/basic.py, which would require making Trainer.train_process() async, along with every function that calls it. I was afraid that might break the code, so I stayed with the current implementation, where clients are put to sleep after they finish model training.

UPDATE: please ignore the information above; the code is now implemented so that clients are put to sleep at the end of each epoch. Please refer to the conversation below for details.

    opened by cuiboyuan 16
  • Add Support for FEMNIST

Add support for the FEMNIST dataset by referring to the open-source project LEAF.

    Description

    Main changes:

    1. Added a new datasource femnist at ~/plato/datasources/femnist.py and modified the ~/plato/datasources/registry.py accordingly.
    2. Added a new sampler empty at ~/plato/samplers/empty.py and modified the ~/plato/samplers/registry.py accordingly.
    3. Made minor changes at ~/plato/trainers/basic.py and ~/plato/models/lenet5.py for further adaptation.

Remark: while the implementation is largely borrowed from LEAF, it does not need to plug into the LEAF project and can work independently.

    Motivation

Apart from label distribution skew (which can be implemented with LDA), non-IID scenarios also include other circumstances, such as (1) feature distribution skew, (2) the same label with different features, and (3) the same feature with different labels (see this survey for more details). Thus, it would be useful if Plato could support FL research with more realistic datasets. The FEMNIST dataset is one celebrated example, and it is inherently partitioned by the clients' identities. We therefore considered adding support for it, and hopefully our design can be compatible with other realistic datasets that are also partitioned by clients.

    p.s. One may want to refer to an external tutorial in a forked version of Plato for more context.

    How has this been tested?

    At the root directory,

    conda activate federated
    python run --config=./examples/async/data_hetero/data_hetero_femnist_lenet5.yml > out.txt 2>&1 &
    

p.s. Expect the first test in your environment to take hours, due to the data preprocessing overhead.

In our test, we inspected the generated out.txt and confirmed that training proceeds smoothly.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by SamuelGong 14
  • [FR] Local Differential Privacy Methods

Is your feature request related to a problem? Please describe. Currently there is only one implementation of local differential privacy (LDP), RAPPOR [1], implemented in https://github.com/TL-System/plato/blob/main/plato/utils/unary_encoding.py, and it is not decoupled from the algorithm implementation.

    https://github.com/TL-System/plato/blob/fac44a6bdbe64d3060ae290e4633b316b02a1474/plato/algorithms/mistnet.py#L52-L64

    https://github.com/TL-System/plato/blob/fac44a6bdbe64d3060ae290e4633b316b02a1474/plato/algorithms/mindspore/mistnet.py#L44-L48

    https://github.com/TL-System/plato/blob/fac44a6bdbe64d3060ae290e4633b316b02a1474/examples/nnrt/nnrt_algorithms/mistnet.py#L60-L65

This feature request calls for a modular LDP plugin interface and a number of other methods, e.g., [2][3].

    Describe the solution you'd like

    • [x] ~~Unified data exchange format between clients and server.~~
    • [x] A modular interface for plugging in data processing modules into the server-client data exchange.
    • [x] A config entry for enabling specific data processing modules.
    • [ ] LDP modules implementation.
    • [ ] Test on the theoretical property of modules i.e. ε-LDP

Describe alternatives you've considered: to be filled.

Additional context [1] Ú. Erlingsson, V. Pihur, and A. Korolova. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067. ACM, 2014. [2] Differential Privacy Team, Apple. Learning with privacy at scale. 2017. [3] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30, December 2017.

    enhancement 
    opened by hcngac 12
  • [RFC] Android Clients

Development of an Android FL client to further enhance the simulation of FL on mobile devices.

    Approach

• Use of chaquo to adapt the current Python codebase to Android.
  • Chaquo is not open source, but it provides a free license for open-source projects.
  • Chaquo is the only Python-to-Android tool that has PyTorch packaged.
  • Building PyTorch for Android with other tools requires a significant amount of work.
• Use of redroid to support multiple instances of Android devices.
  • Redroid is Android in a container, using the same kernel as the host.
  • The performance of Redroid is close to the host's, making multiple Android instances feasible.
• A separate log server to receive log entries from Android clients.
  • There is no good way to directly extract log contents from an Android app.
  • Using an HTTP log server and modifying the logging handler in clients handles the logs nicely.
    enhancement 
    opened by hcngac 10
  • Added General Support for Asynchronous Training

Add support for asynchronous FL, where the central server can eagerly start training on idle clients before receiving sufficient model updates.

    Motivation

FL practice is currently dominated by the synchronous mode, wherein, in each round, the server waits until it receives a sufficient number of clients' updates before deriving an aggregated model update (an example in Plato can be found in the method Server.client_payload_done() of ~/plato/servers/base.py). On the other hand, the asynchronous mode has been extensively studied in traditional distributed learning (where the data distribution across clients is IID). In asynchronous training, the server eagerly starts the training of idle clients before receiving sufficient model updates from previously selected clients. Out of curiosity, we want to explore the spectrum of system performance (rather than the theoretical convergence rate, as in existing work) of the asynchronous mode in the context of FL under varying degrees of client heterogeneity.

    Description

1. Added a module ~/servers/async_timer.py, which acts as a virtual client on the server side and sends heartbeats for periodically triggered client selection.
2. Added a module ~/servers/async_base.py, the base class of the respective server, which mainly implements the workflow of an asynchronous step.
3. Added a module ~/servers/async_fedavg.py, where the aggregation logic simply performs FedAvg on unaggregated weights; this implies that we can have other implementations of aggregation even in asynchronous mode.

    How has this been tested?

We tested it in a fresh clone as follows:

git clone git@github.com:SamuelGong/plato.git
    cd plato
    [with all necessary installation steps]
    conda activate federated
    python run --config=./examples/async/async_train/async_train_mnist_lenet5.yml > log.txt 2>&1 &
    

    Example results

Time-to-accuracy performance with the provided configuration, together with the corresponding time sequence diagram, was shown in figures accompanying the original pull request.

    Context

This is our preliminary attempt, and we would like to hear from the authors in an agile manner. We thus still anticipate necessary changes to the code, as well as to the coding style, comments, and documentation (though they should already be easy to read at the moment). More context can be found in an external tutorial.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by SamuelGong 10
  • Feature/client sim

Some details: on the server side (in plato/servers/base.py):

• The (actual / launched) client_id is paired with the sid
• Each launched client has an attribute virtual_id:
  • Not in simulation: equal to the actual client_id
  • In simulation: updated every round after the server selects clients from self.clients_pool
• The server selects clients from self.clients_pool instead of self.clients:
  • Not in simulation: a list of connected clients' ids, updated along with self.clients
  • In simulation: a list of all possible clients' ids, according to the config parameter total_clients

On the client side (in plato/clients/base.py and plato/clients/simple.py):

• The client_id is the virtual one designated by the server, updated each round
• The actual_client_id is paired with the sid used for the connection
• The client updates the trainer and algorithm used in a given round whenever it receives a response from the server with a newly designated virtual_id

    Status:

    • Tests regarding client simulation passed for several examples (FedAvg, FedAtt, FedAdp, AFL).
• Conflicts in example servers and clients were resolved as much as possible.
    • README.md or other documents haven't been updated with this new feature yet.

    Potential Concerns:

• Logging info might be confusing: in client simulation, the id of a new contact sent to the server is still the actual (launched) client id, even though the client may have represented a virtual one with a different virtual id in the last round.
• One should be careful with self.selected_clients, self.clients_pool, self.client_id, and self.virtual_id when designing example servers, and with self.client_id and self.actual_client_id when designing example clients.
    opened by silviafeiwang 8
  • Enable Oort to work in the async mode

Description

    Previously, the implementation of Oort could not work properly in asynchronous mode, since the server updated client utility according to the 'self.explored_client' list, which contains delayed clients that have not yet sent out their updates. To address this, we make the server update based on the update list.

    How has this been tested?

Ran 'oort_MNIST_lenet5.yml' in an asynchronous environment.

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by Yufei-Kang 7
  • Added the facility to record local testing accuracies in .csv files

Description

    When running a job, if the configuration file has the attribute "do_test" set to true and there also exists a "results" attribute, then the test accuracy of each client will be computed locally and stored in a .csv file, with the round number, client ID, and test accuracy as headers.
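
For illustration, the resulting .csv file might begin as follows (the exact header names are an assumption based on the description above):

round,client_id,accuracy
1,1,0.42
1,2,0.39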

    How has this been tested?

Tested on a local machine using the "fedavg_async_lenet5" configuration file with 1-3 clients, each with 1-3 rounds, once with "do_test" set to true and once with it set to false.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    opened by kevinsun203 7
  • RLFL: A Reinforcement Learning Framework for Active Federated Learning

This implements a reinforcement learning framework for learning and controlling federated learning tasks.

    Description

    The added directory plato/utils/rlfl is the framework base; the added directory examples/fei is an instance of a DRL agent that learns the global aggregation strategy.

    How has this been tested?

Tests of the examples/fei instance pass in the latest Plato environment.

To learn how to customize another DRL agent and run training/testing, please refer to plato/utils/rlfl/README.md.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    opened by silviafeiwang 7
  • Fixed the reported bug of the config test in #190

Noticing the bug reported in #190, I fixed all issues that existed in tests/config_tests.py.

    Description

I made three changes to the code. First, I moved the configuration files, including Pipelines and Models, into the Kinetics directory; this fixed the "FileNotFoundError" issue. Second, all PyLint errors were addressed, bringing the code rating to 10.00/10. Finally, I added more comments to further describe the objective of each unit test function.

    How has this been tested?

1. config_tests: I ran the code with python tests/config_tests.py and checked its format with pylint tests/config_tests.py.

    2. data_tests: I ran the code with python tests/data_tests.py and checked its format with pylint tests/data_tests.py.

    3. sampler_tests: I ran the code with python tests/sampler_tests.py and checked its format with pylint tests/sampler_tests.py.

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by CSJDeveloper 6
  • Added the search space MobileNetV3 into example code of PerFedRLNAS

Description

    This PR has two contributions. First, the nasvit search space is moved to plato/models, as it has been tested through experiments and it is confirmed that no big changes will be made to it; other search spaces can also inherit part of the code from nasvit.

    Second, I added another search space, mobilenetv3, on the basis of the previous code. As nasvit provides basic units such as linear layers, convolution layers, and residual blocks for a NAS supernet, it is easy to build this search space on top of the nasvit code. The detailed implementation of the mobilenetv3 search space is under examples/pfedrlnas/MobileNetV3/model. The design of this search space follows the paper Searching for MobileNetV3.

    How has this been tested?

To test the mobilenetv3 search space, run the command:

    python3 ./examples/pfedrlnas/MobileNetV3/fednas.py -c ./examples/pfedrlnas/configs/FedNAS_CIFAR10_Mobilenet_NonIID03_Scratch.yml
    

To test that nasvit was moved to plato/models correctly, as well as the NASVIT search space, run the command:

    python3 ./examples/pfedrlnas/VIT/fednas.py -c ./examples/pfedrlnas/configs/FedNAS_CIFAR10_NASVIT_NonIID01_Scratch.yml
    

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code has been formatted using Black and checked using PyLint.
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    opened by dixiyao 1
  • Added a new fedavg algorithm supporting aggregating partial sub-modules of one model

This PR implements a new, perhaps enhanced, FedAvg algorithm for Plato that supports extracting and aggregating partial sub-modules of one defined model.

    Description

In many learning setups, only part of the model is used as the global model to be exchanged between server and clients. For instance, after defining a ResNet model, its fully convolutional part may be utilized as the global model, while the fully-connected part remains local.

To achieve this, the PR inherits from Plato's conventional FedAvg algorithm and extends the extract_weights function. It also adds some supporting functions to enable a wider range of applications.

With the new FedAvg, which parts of the model are utilized as the global model can be set by the hyper-parameter global_submodules_name, whose format is {submodule1_prefix}__{submodule2_prefix}__{submodule3_prefix}__..., where the names of different submodules are separated by two consecutive underscores.
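
A minimal sketch of how such prefix-based extraction could work (an illustration under the stated naming convention, not the PR's actual code):

from collections import OrderedDict

def extract_submodule_weights(weights, global_submodules_name):
    # Split the setting on double underscores to obtain the submodule prefixes.
    prefixes = global_submodules_name.split('__')
    # Keep only the parameters whose names start with one of the prefixes.
    return OrderedDict(
        (name, value)
        for name, value in weights.items()
        if any(name.startswith(prefix) for prefix in prefixes)
    )

For example, with global_submodules_name set to 'conv1__conv2', only parameters whose names begin with conv1 or conv2 would be exchanged and aggregated.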

    How has this been tested?

    This PR can be tested through the unit test called fedavg_tests.py under the folder tests/.

To run the test, first switch to Plato's root folder, and then run:

$ python tests/fedavg_tests.py
    

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code has been formatted using Black and checked using PyLint.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by CSJDeveloper 2
  • Supported a more general way of checkpoint operations

This PR implements multiple checkpoint operations that can be utilized directly.

    Description

Plato should contain sufficient checkpoint operations, so that checkpoints can be saved, loaded, or otherwise operated on according to the desired requirements.

    Therefore, this PR mainly includes three types of operations:

    1. Saving
    2. Loading
    3. Checkpoint searching, such as searching for the latest checkpoint

Additionally, code to generate consistent filenames is implemented so that all of Plato's filenames share the same format.

    How has this been tested?

As the code in this PR does not influence Plato's existing examples, the implementation is tested solely via the unit test checkpoint_tests.py placed under Plato's tests/ folder.

To run the test, first switch to Plato's root folder, and then run:

$ python tests/checkpoint_tests.py
    

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code has been formatted using Black and checked using PyLint.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by CSJDeveloper 1
  • Added more visual data augmentations

This PR introduces more data augmentations for visual images.

    Description

When implementing other methods, such as self-supervised learning (SSL), with Plato's components, the datasource generally requires additional, more complex augmentations. One good example: when the typical SSL method BYOL is used to train the model in Plato, each input image must be processed to generate multi-view samples, each corresponding to one specific data augmentation.

    Currently, Plato's simple data augmentation method does not support this.

To fill this gap, this PR (1) adds a more general way to create different data augmentations; (2) implements multiple visual transforms used in SSL; and (3) collects normalizations for different datasets for clarity.

    How has this been tested?

No new test is needed because (1) the correctness of the code has been demonstrated by work under the 'contrastive_adaptation' branch, and (2) sufficient links are included in the comments to document the sources supporting the implementation.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code has been formatted using Black and checked using PyLint.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by CSJDeveloper 1
Releases (v0.4.6)
  • v0.4.6 (Dec 5, 2022)

  • v0.4.5 (Oct 27, 2022)

    Improved client and server APIs; made client-side processors more customizable; added several examples showcasing how the APIs are to be used; various bug fixes.

  • v0.4.4 (Aug 20, 2022)

    Redesigned the API for the server, trainer, and algorithm; supported new documentation and its automated deployment; redesigned some examples based on the new API.

  • v0.4.3 (Jul 20, 2022)

    Added more learning rate schedules from PyTorch; added approximate simulations of communication times; revised quantization processors; revised the way of using custom models, datasources, trainers, and algorithms; many bug fixes.

  • v0.4.2 (Jun 1, 2022)

  • v0.4.1 (May 20, 2022)

    Fixed several important issues related to client-side samplers, loading custom algorithms, and federated unlearning, and added default values for configurations.

  • v0.4.0 (May 14, 2022)

    Supported running an FL session on multiple GPUs, and further improved scalability in memory usage by always launching a constant number of client processes regardless of the number of clients selected per round. Made client simulation mode the default and only mode of operation.

  • v0.3.9 (May 2, 2022)

  • v0.3.8 (Apr 30, 2022)

  • v0.3.7 (Feb 20, 2022)

    Added support for HuggingFace language modelling models and datasets, reinforcement learning servers, simulating client/server communication, measuring communication time, and additional examples using the asynchronous mode; removed wandb usage.

  • v0.3.6 (Feb 4, 2022)

  • v0.3.5 (Jan 28, 2022)

  • v0.3.4 (Jan 23, 2022)

    Added several multi-modal data sources, and supported simulating the wall clock time in asynchronous mode when the clients on the same physical machine are training in small batches (controlled by trainer -> max_concurrency) due to insufficient GPU memory.

  • v0.3.3 (Dec 30, 2021)

    Added support for differentially private training on the client side, fixed issues related to cross-silo training, and added basic support for asynchronous training with bounded staleness.

  • v0.3.2 (Dec 9, 2021)

Owner

TL-System