MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

Overview

The collaboration platform for Machine Learning

MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.


MLReef

MLReef is a ML/DL development platform containing four main sections:

  • Data-Management - Fully versioned data hosting and processing infrastructure
  • Publishing code repositories - Containerized and versioned script repositories for immutable use in data pipelines
  • Experiment Manager - Experiment tracking, environments and results
  • ML-Ops - Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)


To find out more about how MLReef can streamline your Machine Learning Development Lifecycle visit our homepage

Data Management

  • Host your data using git / git LFS repositories.
    • Work concurrently on data
    • Fully versioned or LFS version control
    • Full view on data processing and visualization history
  • Connect your external storage to MLReef and use your data directly in pipelines
  • Data set management (access, history, pipelines)

Publishing Code

Adding only parameter annotations to your code...

# example of parameter annotation for a image crop function
 @data_processor(
        name="Resnet50",
        author="MLReef",
        command="resnet50",
        type="ALGORITHM",
        description="CNN Model resnet50",
        visibility="PUBLIC",
        input_type="IMAGE",
        output_type="MODEL"
    )
    @parameter(name='input-path', type='str', required=True, defaultValue='train', description="input path")
    @parameter(name='output-path', type='str', required=True, defaultValue='output', description="output path")
    @parameter(name='height', type='int', required=True, defaultValue=224, description="height of cropped images in px")
    @parameter(name='width', type='int', required=True, defaultValue=224, description="width of cropped images in px")
    def init_params():
        pass

...and publishing your scripts gets you the following:

  • Containerization of your scripts
    • Always working scripts including easy hyperparameter access in pipelines
    • Execution environment (including specific packages & versions)
    • Hyper-parameters
      • ArgParser for command line parameters with currently used values
      • Explicit parameters dictionary
      • Input validation and guides
  • Multiple containers based on version and code branches

Experiment Manager

  • Complete experiment setup log
    • Full source control info including non-committed local changes
    • Execution environment (including specific packages & versions)
    • Hyper-parameters
  • Full experiment output automatic capture
    • Artifacts storage and standard-output logs
    • Performance metrics on individual experiments and comparative graphs for all experiments
    • Detailed view on logs and outputs generated
  • Extensive platform support and integrations

ML-Ops

  • Concurrent computing pipelining
  • Governance and control
    • Access and user management
    • Single permission management
    • Resource management
  • Model management

MLReef Architecture

The MLReef ML components within the ML life cycle:

  • Data Storage components based currently on Git and Git LFS.
  • Model development based on working modules (published by the community or your team), data management, data processing / data visualization / experiment pipeline on hosted or on-prem and model management.
  • ML-Ops orchestration, experiment and workflow reproducibility, and scalability.

Why MLReef?

MLReef is our solution to a problem we share with countless other researchers and developers in the machine learning/deep learning universe: Training production-grade deep learning models is a tangled process. MLReef tracks and controls the process by associating code version control, research projects, performance metrics, and model provenance.

We designed MLReef on best data science practices combined with the knowleged gained from DevOps and a deep focus on collaboration.

  • Use it on a daily basis to boost collaboration and visibility in your team
  • Create a job in the cloud from any code repository with a click of a button
  • Automate processes and create pipelines to collect your experimentation logs, outputs, and data
  • Make you ML life cycle transparent by cataloging it all on the MLReef platform

Getting Started as a Developer

To start developing, continue with the developer guide

Canonical source

The canonical source of MLReef where all development takes place is hosted on gitLab.com/mlreef/mlreef.

License

MIT License (see the License for more information)

Documentation, Community and Support

More information in the official documentation and on Youtube.

For examples and use cases, check these use cases or start the tutorial after registring:

If you have any questions: post on our Slack channel, or tag your questions on stackoverflow with 'mlreef' tag.

For feature requests or bug reports, please use GitLab issues.

Additionally, you can always reach out to us via [email protected]

Contributing

Merge Requests are always welcomed ❤️ See more details in the MLReef Contribution Guidelines.

Owner
MLReef
Your entire Machine Learning life cycle in one platform.
MLReef
Tangram makes it easy for programmers to train, deploy, and monitor machine learning models.

Tangram Website | Discord Tangram makes it easy for programmers to train, deploy, and monitor machine learning models. Run tangram train to train a mo

Tangram 1.4k Jan 05, 2023
Relevance Vector Machine implementation using the scikit-learn API.

scikit-rvm scikit-rvm is a Python module implementing the Relevance Vector Machine (RVM) machine learning technique using the scikit-learn API. Quicks

James Ritchie 204 Nov 18, 2022
Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms

Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms

Mert Sezer Ardal 1 Jan 31, 2022
A machine learning web application for binary classification using streamlit

Machine Learning web App This is a machine learning web application for binary classification using streamlit options this application contains 3 clas

abdelhak mokri 1 Dec 20, 2021
The Simpsons and Machine Learning: What makes an Episode Great?

The Simpsons and Machine Learning: What makes an Episode Great? Check out my Medium article on this! PROBLEM: The Simpsons has had a decline in qualit

1 Nov 02, 2021
whylogs: A Data and Machine Learning Logging Standard

whylogs: A Data and Machine Learning Logging Standard whylogs is an open source standard for data and ML logging whylogs logging agent is the easiest

WhyLabs 2k Jan 06, 2023
Software Engineer Salary Prediction

Based on 2021 stack overflow data, this machine learning web application helps one predict the salary based on years of experience, level of education and the country they work in.

Jhanvi Mimani 1 Jan 08, 2022
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning

The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. I

MLJAR 2.4k Jan 02, 2023
AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.

AutoTabular AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just

wenqi 2 Jun 26, 2022
Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis.

Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics

Facebook Research 4.1k Dec 29, 2022
Greykite: A flexible, intuitive and fast forecasting library

The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

LinkedIn 1.4k Jan 15, 2022
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning.

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported ha

Microsoft 1.1k Jan 04, 2023
Apple-voice-recognition - Machine Learning

Apple-voice-recognition Machine Learning How does Siri work? Siri is based on large-scale Machine Learning systems that employ many aspects of data sc

Harshith VH 1 Oct 22, 2021
Reproducibility and Replicability of Web Measurement Studies

Reproducibility and Replicability of Web Measurement Studies This repository holds additional material to the paper "Reproducibility and Replicability

6 Dec 31, 2022
Steganography is the art of hiding the fact that communication is taking place, by hiding information in other information.

Steganography is the art of hiding the fact that communication is taking place, by hiding information in other information.

Priyansh Sharma 7 Nov 09, 2022
Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets

Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets Datasets Used: Iris dataset,

Samrat Mitra 2 Nov 18, 2021
The Fuzzy Labs guide to the universe of open source MLOps

Open Source MLOps This is the Fuzzy Labs guide to the universe of free and open source MLOps tools. Contents What is MLOps, anyway? Data version contr

Fuzzy Labs 352 Dec 29, 2022
Test symmetries with sklearn decision tree models

Test symmetries with sklearn decision tree models Setup Begin from an environment with a recent version of python 3. source setup.sh Leave the enviro

Rupert Tombs 2 Jul 19, 2022
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.

An open-source, low-code machine learning library in Python 🚀 Version 2.3.5 out now! Check out the release notes here. Official • Docs • Install • Tu

PyCaret 6.7k Jan 08, 2023
This machine learning model was developed for House Prices

This machine learning model was developed for House Prices - Advanced Regression Techniques competition in Kaggle by using several machine learning models such as Random Forest, XGBoost and LightGBM.

serhat_derya 1 Mar 02, 2022