Azure MLOps (v2) solution accelerators.

Overview

Azure MLOps (v2) solution accelerator

Header

Welcome to the MLOps (v2) solution accelerator repository! This project is intended to serve as the starting point for MLOps implementation in Azure.

MLOps is a set of repeatable, automated, and collaborative workflows with best practices that empower teams of ML professionals to quickly and easily get their machine learning models deployed into production. You can learn more about MLOps here:

Prerequisites

  1. An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.

Project overview

The solution accelerator provides a modular end-to-end approach for MLOps in Azure based on pattern architectures. As each organization is unique, solutions will often need to be customized to fit the organization's needs.

The solution accelerator goals are:

  • Simplicity
  • Modularity
  • Repeatability
  • Collaboration
  • Enterprise readiness

It accomplishes these goals with a template-based approach for end-to-end data science, driving operational efficiency at each stage. You should be able to get up and running with the solution accelerator in a few hours.

👤 Getting started: Azure Machine Learning - classical machine learning demo

The demo follows the classical machine learning pattern with Azure Machine Learning.

AzureML CML

‼️ Please follow the instructions to execute the demo accordingly: Quickstart ‼️

‼️ Please submit any issues here: Issues ‼️

📐 Pattern Architectures: Key concepts

Link AI Pattern
Pattern AzureML CML Azure Machine Learning - Classical Machine Learning
Pattern AzureML CV Azure Machine Learning - Computer Vision
[TBD] Azure Machine Learning - Natural Language Processing
[TBD] Azure Machine Learning / Azure Databricks - Classical Machine Learning
[TBD] Azure Machine Learning / Azure Databricks - Computer Vision
[TBD] Azure Machine Learning / Azure Databricks - Natural Language Processing
[TBD] Azure Machine Learning - Edge AI

📯 (Coming Soon) One-click deployments

📯 MLOps infrastructure deployment

Name Description Try it out
Outer Loop Default Azure Machine Learning outer infrastructure setup [DEPLOY BUTTON]
[TBD] Default Responsible AI for Classical Machine Learning [DEPLOY BUTTON]
Feature Store FEAST Default Feature Store using FEAST [DEPLOY BUTTON]

📯 MLOps use case deployment

Name AI Workload Type Services Try it out
classical-ml Classical machine learning Azure Machine Learning [DEPLOY BUTTON]
[TBD] Computer Vision Azure Machine Learning [DEPLOY BUTTON]
[TBD] Natural Language Processing Azure Machine Learning [DEPLOY BUTTON]
[TBD] Classical machine learning Azure Machine Learning, Azure Databricks [DEPLOY BUTTON]
[TBD] Computer Vision Azure Machine Learning, Azure Databricks [DEPLOY BUTTON]
[TBD] Natural Language Processing Azure Machine Learning, Azure Databricks [DEPLOY BUTTON]
[TBD] Edge AI Azure Machine Learning [DEPLOY BUTTON]

Contributing

This project welcomes contributions and suggestions. To learn more visit the contributing section, see CONTRIBUTING.md for details.

Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Comments
  • Add azure devops repo init

    Add azure devops repo init

    This PR add a .azuredevops directory that accelerates an MLOps project template implementation in Azure DevOps. More particularly, it:

    • Allows users to run a pipeline that creates a project structure according to their selection (equivalent to the 'sparse checkout') in an empty repo.
    • Creates new Azure DevOps pipelines for infrastructure and MLOps based on the Azure Pipelines yaml files in infrastructure/pipelines and mlops/devops-pipelines respectively.
    💎 enhancement 
    opened by drosevear 9
  • Job naming issues with AML CLI training pipeline

    Job naming issues with AML CLI training pipeline

    Why?

    Currently when we deploy training pipeline using AML CLI we run into the problem that job names could not be validated.

    image

    How?

    As we can see in the screenshot above that hyphens ain't allowed in the naming conventions for the jobs. So we should consider repacing hyphens with underscores in pipeline.yml file.

    I could also create a PR if you consider this as a valid way of solving this issue.

    Anything else?

    opened by luhgit 9
  • Quickstart.md - Job: Create Bicep Deployment issue with the storage account name

    Quickstart.md - Job: Create Bicep Deployment issue with the storage account name

    Why?

    While running pipeline with the Create Bicep Deployment Job we will get the error: {'code': 'StorageAccountAlreadyTaken', 'target': 'stmlopsv2819prod', 'message': 'The storage account named stmlopsv2819prod is already taken.'}

    I believe storage account names are unique and since this was already executed I can't create a storage account with the same name.

    How?

    I believe in the both files config-infra-dev.yml and config-infra-prod.yml this line needs to change: st$(namespace)$(postfix)$(environment)

    and we might have to add some initials of our company, subscription or even from our name

    Thank you, Carla

    🏗️ infra 
    opened by carla-fiadeiro 8
  • Authorisation problem when deploying training pipeline

    Authorisation problem when deploying training pipeline

    Hello,

    I have been following your quick start guide, and have got to the stage where I need to deploy the pipeline "deploy-model-training-pipeline.yml" on Azure DevOps.

    When I run this, it goes as far as the Run pipeline in AML step in DevOps, then I get this error:

    If there is an Authorization error, check your Azure KeyVault secret named kvmonitoringspkey. Terraform might put single quotation marks around the secret. Remove the single quotes and the secret should work.
    .create table mlmonitoring (['Sno']: int, ['Age']: int, ['Sex']: string, ['Job']: int, ['Housing']: string, ['Saving accounts']: string, ['Checking account']: string, ['Credit amount']: int, ['Duration']: int, ['Purpose']: string, ['Risk']: string, ['timestamp']: datetime)
    Cleaning up all outstanding Run operations, waiting 300.0 seconds
    2 items cleaning up...
    Cleanup took 0.08368873596191406 seconds
    Traceback (most recent call last):
      File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/security.py", line 68, in acquire_authorization_header
        return _get_header_from_dict(self.token_provider.get_token())
      File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 123, in get_token
        token = self._get_token_impl()
      File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 554, in _get_token_impl
        return self._valid_token_or_throw(token)
      File "/azureml-envs/XXXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 201, in _valid_token_or_throw
        raise KustoClientError(message)
    azure.kusto.data.exceptions.KustoClientError: ApplicationKeyTokenProvider - failed to obtain a token. 
    invalid_client
    AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'XXXXX'.
    Trace ID: XXXXXX
    Correlation ID: XXXXXXX
    Timestamp: 2022-11-01 15:46:31Z
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "prep.py", line 83, in <module>
        main()
      File "prep.py", line 80, in main
        log_training_data(df, args.table_name)
      File "prep.py", line 35, in log_training_data
        collector.batch_collect(df)
      File "/azureml-envs/XXXXX/lib/python3.7/site-packages/obs/collector.py", line 158, in batch_collect
        self.create_table_and_mapping()
      File "/azureml-envs/XXXXX/lib/python3.7/site-packages/obs/collector.py", line 132, in create_table_and_mapping
        self.kusto_client.execute_mgmt(self.database_name, CREATE_TABLE_COMMAND)
      File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/client.py", line 891, in execute_mgmt
        return self._execute(self._mgmt_endpoint, database, query, None, self._mgmt_default_timeout, properties)
      File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/client.py", line 959, in _execute
        request_headers["Authorization"] = self._aad_helper.acquire_authorization_header()
      File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/security.py", line 72, in acquire_authorization_header
        raise KustoAuthenticationError(self.token_provider.name(), error, **kwargs)
    azure.kusto.data.exceptions.KustoAuthenticationError: KustoAuthenticationError('ApplicationKeyTokenProvider', 'KustoClientError("ApplicationKeyTokenProvider - failed to obtain a token. \ninvalid_client\nAADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'XXXXXXX'.\r\nTrace ID: XXXXXXXX\r\nTimestamp: 2022-11-01 15:46:31Z")', '{'authority': 'XXXXXX', 'client_id': 'XXXXX', 'kusto_uri': 'https://adxmlopsv286309prod.uksouth.kusto.windows.net'}')
    

    I have done some investigating:

    • It appears my Prod Service Principal is set up correctly. When I go to Project Settings > Service Connections > Azure-ARM-Prod > Edit, there is an option to verify the connection. It works here.
    • When I check my App Registrations and investigate the "Certificates and Secrets tab" of "Azure-ARM-Prod-mlops-sparse" there is indeed a secret in here. (The terraform pipeline to create the Prod infrastructure works, so I am lead to believe that my SPs are working correctly.)
    • When I go to my Prod Key Vault, there is a secret called kvmonitoringspkey - looking at this though the Secret Value is just $(CLIENT_SECRET) - is it meant to be this? If so, why? And where was is set to this?

    Do you have any advice on how I can fix this error?

    Bug 📑 documentation 
    opened by andrewblance 6
  • Enterprise readiness where Github is not the source control repository

    Enterprise readiness where Github is not the source control repository

    The current Quickstart doesn't cater for organisations that don't use GitHub as their source control repository. There's a need for these organisations to get started on Azure quickly – for MLOps projects to be initialised in Azure DevOps itself based on 'MLOps version' and 'project type'.

    💎 enhancement 
    opened by drosevear 4
  • How to achieve Continuous Integration

    How to achieve Continuous Integration

    Hi,

    As per the process, we are training the and registering the model in DEV Workspace. But how to bring/ deploy that model in Stage.

    It would be better if we have the steps between registering the model in dev workspace and deploy in Stage Workspace

    opened by MurugeswariMuthurajan 4
  • [repo] <Fix versions of all libraries >

    [repo]

    Why?

    If you don't fix the version, if there's an upgrade from the library the pipeline may break. Currently the install CLI action does not fix the version which means that Devops Agent will pick up the latest Azure ML CLI version while the job/pipeline definition still follow old version. This may break the pipeline.

    How?

    Anything else?

    opened by james-tn 4
  • Could not get the latest source version for repository

    Could not get the latest source version for repository

    @setuc /mlops/devops-pipelines/deploy-model-training-pipeline.yml: Could not get the latest source version for repository Azure/mlops-templates hosted on https://github.com/ using ref refs/heads/main. GitHub reported the error, "Bad credentials"

    devops-error

    Thank you ..

    ✅ resolved 
    opened by yogidosalwar 3
  • [mlops-v2] Is the Quickstart tutorial limited to MS/Azure employees ?

    [mlops-v2] Is the Quickstart tutorial limited to MS/Azure employees ?

    Why?

    This repo (mlops-v2) was recommended to us (MS/Azure Client) by our MS business contact to try out Azure MLOps. However, it looks like some steps require specific organization access.

    How?

    • Doing the Quickstart
    • Creating the PAT in the Developer Settings of my GH account works fine
    • However, step 2.7 (pictured bellow), isn't available to us for pretty obvious reasons (not part of Azure) : image

    is there a way to go around this ? A public version of the tutorial/setup ?

    looking forward to try it out ! Thanks in advance

    Bug 📑 documentation 
    opened by gaspard-met 3
  • Hosted Agent ran longer than the maximum time of 60 minutes

    Hosted Agent ran longer than the maximum time of 60 minutes

    @setuc @cindyweng while running 3rd pipeline(Inner / Outer Loop: Moving to Production - Azure DevOps) Showing error The job running on agent Hosted Agent ran longer than the maximum time of 60 minutes while Test deployment. What should i do ? any suggestions ??

    Thank you.

    ✅ resolved 
    opened by lokierao 2
  • [Quickstart.md] Deploying Infrastructure via Azure DevOps

    [Quickstart.md] Deploying Infrastructure via Azure DevOps

    Hi, I am facing an issue with the deployment via Azure Pipeline. When I run the pipeline, the following error is thrown.

    Screenshot_20221130_192350

    It seemed like I had to change the resource repositories and connections. So, I tried changing it to the following and another error is thrown.

    Screenshot_20221130_192212

    Can anyone help me with the issue?

    Thanks, Saiham

    ✅ resolved 
    opened by saiham6 2
  • Main dec31

    Main dec31

    PR Summary into Azure/mlops-project-template

    Checklist

    I have:

    • [x] read and followed the contributing guidelines
    • [x] PR has a meaningful title
    • [x] Summarized the changes
    • [x] PR is ready to merge and NOT ** WORK IN PROGRESS **

    Changes

    fixes #

    opened by setuc 0
  • When working with Conda, there are no checks on the environment packages

    When working with Conda, there are no checks on the environment packages

    Describe the bug or the issue that you are facing

    When creating the environments with conda, there are no checks performed when creating the environment to see if the packages exist or if the pip dependencies are also correctly installed.

    Steps/Code to Reproduce

    When creating the environments with conda, there are no checks performed when creating the environment to see if the packages exist or if the pip dependencies are also correctly installed.

    az ml environment create --file .\train-conda.yml

    train-conda.yml

    $schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
    name: docker-image-plus-conda-example
    image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
    conda_file: environment.yml
    description: Environment created from a Docker image plus Conda environment.
    

    environment.yml

    channels:
      - defaults
      - anaconda
      - conda-forge
    dependencies:
      - python=3.7.5
      - pip
      - pip:
          - azureml-mlflow==1.38.0
          - azureml-sdk==1.38.0
          - sklearn==0.24.1
    

    Expected Output

    No error is thrown even though there is no such package like sklearn.

    There should be a check to make sure that the environment has been successfully created before moving on to the next steps.

    Versions

    running az -v azure-cli 2.43.0

    core 2.43.0 telemetry 1.0.8

    Extensions: account 0.2.5 ml 2.12.1

    Dependencies: msal 1.20.0 azure-mgmt-resource 21.1.0b1

    Which platform are you using for deploying your infrastrucutre?

    Azure DevOps (ADO)

    If you mentioned Others, please mention which platformm are you using?

    No response

    What are you using for deploying your infrastrucutre?

    Bicep

    Are you using Azure ML CLI v2 or Azure ML Python SDK v2

    Azure ML CLI v2

    Describe the example that you are trying to run?

    Tabular. but this affects all the environments.

    Bug Needs Triage 
    opened by setuc 1
  • error while running 3rd pipeline Inner / Outer Loop: Moving to Production - Azure DevOps

    error while running 3rd pipeline Inner / Outer Loop: Moving to Production - Azure DevOps

    Hi @setuc @cindyweng, getting error while running deploy-batch-endpoint-pipeline.yml . while creating deployment

    ERROR: (UserError) The specified resource was not found. Code: UserError Message: The specified resource was not found. Exception Details: (ModelNotFound) Model container with name: patient-model not found. Code: ModelNotFound Message: Model container with name: xyz-model not found.

    Thank you

    ❓question 
    opened by yogidosalwar 3
  • "DeployTrainingPipeline" throws error at the "Connect to AML Workspace" step

    To make the infrastructure, I ran an Azure Pipeline using infrastructure/pipelines/tf-ado-deploy-infra.yml, and it worked. Then, used mlops/devops-pipelines/deploy-model-training-pipeline.yml to run the (training) Pipeline on the sample data provided, but it stops at the "Connect to AML Workspace" step with the following error:

    Screenshot 2022-11-30 at 14 37 54

    Then, checked the workspace that's created automatically by the Pipeline and noticed that I don't have access to the Jobs, while receiving a weird notification:

    Screenshot 2022-11-30 at 14 27 01

    Screenshot 2022-11-30 at 14 31 42

    That is while I have a "Contributor" access to the subscription and also, all those secrets and App registrations, etc are made/done without any problem, based on the QuickStart tutorial.

    opened by smortezah 1
  • Getting error while creating Responsible Ai dashboard

    Getting error while creating Responsible Ai dashboard

    I am trying to use different algorithms for training. Model training and model registration part is successfully completed, but RAI insights dashboard constructor is failed. I am using the xgboost model for training.

    Traceback (most recent call last):
      File "create_rai_insights.py", line 184, in <module>
        main(args)
      File "create_rai_insights.py", line 128, in main
        model_estimator = load_mlflow_model(my_run.experiment.workspace, model_id=model_id)
      File "/mnt/azureml/cr/j/exe/wd/rai_component_utilities.py", line 79, in load_mlflow_model
        return mlflow.pyfunc.load_model(model_uri)._model_impl
      File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 484, in load_model
        model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
      File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/sklearn/__init__.py", line 494, in _load_pyfunc
        return _load_model_from_local_file(path=path, serialization_format=serialization_format)
      File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/sklearn/__init__.py", line 452, in _load_model_from_local_file
        return cloudpickle.load(f)
    ModuleNotFoundError: No module named 'xgboost'
    

    Thank You

    opened by yogidosalwar 11
  • how can i create inference clusters ?

    how can i create inference clusters ?

    Hi @setuc , I want to create inference cluster, i.e. Deploy model to the Azure Kubernetes Service for large scale inferencing. How can I create a compute inference cluster?

    Thank you

    opened by yogidosalwar 1
Releases(v1.0.0)
  • v1.0.0(Sep 2, 2022)

    The repository now contains the following patterns that have been implemented:

    1. Classical / Tabular Machine Learning Model a. Supports Azure DevOps and GitHub Actions. Note that GitHub Actions are only working for Azure ML CLI v2 (aml-cli-v2). b. Supports Azure Data Explorer based Monitoring, Data Drift and Anomaly Detection. It is enabled for Terraform and can be invoked via python sdk v1 or Azure ML CLI v2 (aml-cli-v2). c. Can deploy both online as well as batch end points.

    2. Computer Vision (CV) Model a. Supports Azure DevOps and GitHub Actions. Note that GitHub Actions are only working for Azure ML CLI v2 (aml-cli-v2). b. Currently there is no monitoring support for this. c. Can deploy both online as well as batch end points.

    3. Natural Language Processing (NLP) Model a. Supports only support Azure DevOps via Azure ML CLI v2 (aml-cli-v2). b. Currently there is no monitoring support for this. c. Can deploy both online as well as batch end points.

    Other Patterns included in the release

    • Templates for using individual / repeatable steps in your template. These templates are available for GitHUb Actions and Azure DevOps (CLI and SDK). The Template repo can be found here: https://github.com/Azure/mlops-templates
    • Registration of Multiple Datasets
    • Using 3rd party / external containers. Support for dependabot python package scans via pip install in docker container.

    Full Changelog: https://github.com/Azure/mlops-project-template/commits/v1.0.0

    Source code(tar.gz)
    Source code(zip)
Owner
Microsoft Azure
APIs, SDKs and open source projects from Microsoft Azure
Microsoft Azure
Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

Fully Adversarial Mosaics (FAMOS) Pytorch implementation of the paper "Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Imag

Zalando Research 120 Dec 24, 2022
Python package for concise, transparent, and accurate predictive modeling

Python package for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use. 📚 docs • 📖 demo notebooks Modern

Chandan Singh 983 Jan 01, 2023
Simple, light-weight config handling through python data classes with to/from JSON serialization/deserialization.

Simple but maybe too simple config management through python data classes. We use it for machine learning.

Eren Gölge 67 Nov 29, 2022
A Python implementation of FastDTW

fastdtw Python implementation of FastDTW [1], which is an approximate Dynamic Time Warping (DTW) algorithm that provides optimal or near-optimal align

tanitter 651 Jan 04, 2023
A high-performance topological machine learning toolbox in Python

giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the G

giotto.ai 632 Dec 29, 2022
Avocado hass time series vs predict price

AVOCADO HASS TIME SERIES VÀ PREDICT PRICE Trước khi vào Heroku muốn giao diện đẹp mọi người chuyển giúp mình theo hình bên dưới https://avocado-hass.h

hieulmsc 3 Dec 18, 2021
machine learning model deployment project of Iris classification model in a minimal UI using flask web framework and deployed it in Azure cloud using Azure app service

This is a machine learning model deployment project of Iris classification model in a minimal UI using flask web framework and deployed it in Azure cloud using Azure app service. We initially made th

Krishna Priyatham Potluri 73 Dec 01, 2022
Credit Card Fraud Detection, used the credit card fraud dataset from Kaggle

Credit Card Fraud Detection, used the credit card fraud dataset from Kaggle

Sean Zahller 1 Feb 04, 2022
Pydantic based mock data generation

This library offers powerful mock data generation capabilities for pydantic based models. It can also be used with other libraries that use pydantic as a foundation, for example SQLModel, Beanie and

Na'aman Hirschfeld 396 Dec 28, 2022
Machine Learning from Scratch

Machine Learning from Scratch Author: Shengxuan Wang From: Oregon State University Content: Building Machine Learning model from Scratch, without usin

ShawnWang 0 Jul 05, 2022
List of Data Science Cheatsheets to rule the world

Data Science Cheatsheets List of Data Science Cheatsheets to rule the world. Table of Contents Business Science Business Science Problem Framework Dat

Favio André Vázquez 11.7k Dec 30, 2022
ThunderSVM: A Fast SVM Library on GPUs and CPUs

What's new We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. add scikit-learn interface, see here Overview The miss

Xtra Computing Group 1.4k Dec 22, 2022
Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.

Auto_TS: Auto_TimeSeries Automatically build multiple Time Series models using a Single Line of Code. Now updated with Dask. Auto_timeseries is a comp

AutoViz and Auto_ViML 519 Jan 03, 2023
UpliftML: A Python Package for Scalable Uplift Modeling

UpliftML is a Python package for scalable unconstrained and constrained uplift modeling from experimental data. To accommodate working with big data, the package uses PySpark and H2O models as base l

Booking.com 254 Dec 31, 2022
We have a dataset of user performances. The project is to develop a machine learning model that will predict the salaries of baseball players.

Salary-Prediction-with-Machine-Learning 1. Business Problem Can a machine learning project be implemented to estimate the salaries of baseball players

Ayşe Nur Türkaslan 9 Oct 14, 2022
Python package for machine learning for healthcare using a OMOP common data model

This library was developed in order to facilitate rapid prototyping in Python of predictive machine-learning models using longitudinal medical data from an OMOP CDM-standard database.

Sontag Lab 75 Jan 03, 2023
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning

The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. I

MLJAR 2.4k Jan 02, 2023
Mesh TensorFlow: Model Parallelism Made Easier

Mesh TensorFlow - Model Parallelism Made Easier Introduction Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying

1.3k Dec 26, 2022
The code from the Machine Learning Bookcamp book and a free course based on the book

The code from the Machine Learning Bookcamp book and a free course based on the book

Alexey Grigorev 5.5k Jan 09, 2023
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Horovod Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis

Horovod 12.9k Jan 07, 2023