Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Overview

Rubicon

Purpose

Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way. Rubicon's git integration associates these inputs and outputs directly with the model code that produced them to ensure full auditability and reproducibility for developers and stakeholders alike. During experimentation, the Rubicon dashboard makes it easy to explore, filter, visualize, and share recorded work.


Components

Rubicon is composed of three parts:

  • A Python library for storing and retrieving model inputs, outputs, and analyses to filesystems that’s powered by fsspec
  • A dashboard for exploring, comparing, and visualizing logged data built with dash
  • And a process for sharing a selected subset of logged data with collaborators or reviewers that leverages intake

Workflow

Use the Rubicon library to capture model inputs and outputs over time. It can be easily integrated into existing Python models or pipelines and supports both concurrent logging (so multiple experiments can be logged in parallel) and asynchronous communication with S3 (so network reads and writes won’t block).

Meanwhile, periodically review the logged data within the Rubicon dashboard to steer the model tweaking process in the right direction. The dashboard lets you quickly spot trends by exploring and filtering your logged results and visualizes how the model inputs impacted the model outputs.

When the model is ready for review, Rubicon makes it easy to share specific subsets of the data with model reviewers and stakeholders, giving them the context necessary for a complete model review and approval.

Use

Here's a simple example:

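from collections import namedtuple

from rubicon import Rubicon
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Simple record describing where the training data came from (illustrative helper).
SklearnTrainingMetadata = namedtuple("SklearnTrainingMetadata", "module_name method")

# Train a small model so there is something to log (illustrative values and dataset).
n_estimators, n_features, random_state = 10, 5, 0
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)
rfc = RandomForestClassifier(
    n_estimators=n_estimators, max_features=n_features, random_state=random_state
)
rfc.fit(X_train, y_train)

rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)

project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)

experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["my_model_name"],
)

# Record the inputs that produced this model.
experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)

# Record the outcome.
accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)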

Then explore the project by running the dashboard:

rubicon ui --root-dir /rubicon-root

Documentation

For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.

Install

rubicon is available on conda-forge via conda and on PyPI via pip.

conda config --add channels conda-forge
conda install rubicon-ml

or

pip install rubicon-ml

Develop

rubicon uses conda to manage environments. First, install conda. Then use conda to set up a development environment:

conda env create -f ci/environment.yml
conda activate rubicon-dev

Testing

The tests are separated into unit and integration tests. They can be run directly in the activated dev environment via pytest tests/unit or pytest tests/integration, or all at once by simply running pytest.

Note: some integration tests are intentionally marked to control when they are run (i.e., not during CI/CD). These tests include:

  • Integration tests that connect to physical filesystems (local, S3). You'll want to configure the root_dir appropriately for these tests (tests/integration/test_async_rubicon.py, tests/integration/test_rubicon.py). They can be run with:

    pytest -m "physical_filesystem_test"
    
  • Integration tests for the dashboard. To run these integration tests locally, you'll need to install one of the WebDrivers. To do so, follow the Install instructions in the Dash Testing Docs or install via brew with brew cask install chromedriver. You may have to update your permissions in Security & Privacy to install with brew.

    pytest -m "dashboard_test"
    

    Note: The --headless flag can be added to run the dashboard tests in headless mode.

Code Formatting

Install pre-commit, then run pre-commit install in the repo to set up the git hooks that automatically run black, flake8, and isort during commits.

Now pre-commit will run automatically on git commit and will ensure consistent code format throughout the project. You can format without committing via pre-commit run or skip these checks with git commit --no-verify.

Contributors


  • Mike McCarty
  • Sri Ranganathan
  • Joe Wolfe
  • Ryan Soley
  • Diane Lee

Comments
  • Edgetest action

    What

    • This PR adds in the edgetest action to ensure the basic requirements are up to date given the tests pass.
    • I had to refactor the setup a bit to be PEP517 compliant.

    How to Test

    • The local install works for me, but a second set of eyes on this would be really helpful @ryanSoley @shania-m
    opened by fdosani 15
  • auto versioning using git tags

    What

    • in version.py and rubicon/_version.py, use git to get the version from the latest tag
      • this needs to be run from the repo with tags cloned too, which means it'll only work for devs who've installed from source (or the CI with the fetch_depth maxed out for now, in our case)
    • to deal with that, setup.py calls version.py's _write_version_file function when the package is bundled
      • this replaces rubicon/_version.py with a function that returns a hardcoded string (fetched from the git tags in the build environment)
      • this change won't ever need to be committed to the repo, because the current git solution will always work for anyone installed from source
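
As a rough sketch of the approach described above (not the actual contents of version.py), the two pieces could look like the following. The names version_from_git, render_version_file, and resolve_version are hypothetical, and reading the tag with git describe is an assumption; only get_version and the idea of writing a hardcoded string at build time come from the description.

```python
import subprocess


def version_from_git(repo_dir="."):
    """Return the latest tag reachable from HEAD, or None if git cannot supply one."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--abbrev=0"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Not a git repo, no tags fetched, bad directory, or git is not installed.
        return None
    return result.stdout.strip() or None


def render_version_file(version):
    """Build the source of a replacement _version.py that returns a hardcoded string."""
    return f"def get_version():\n    return {version!r}\n"


def resolve_version(repo_dir=".", fallback="0.0.0"):
    """Prefer the git tag; fall back to a fixed string outside a tagged repo."""
    return version_from_git(repo_dir) or fallback
```

At build time, the output of render_version_file would be written over the git-based module, so an installed wheel never needs to call git.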

    How to Test

    • from the repo's root, run pip install -e ., launch a python interpreter, and import rubicon
      • rubicon.__version__ should return the latest version
      • navigate out of the repo and try the same thing - get_version will fail since there's no git repo
    • build rubicon - python setup.py sdist bdist_wheel - and install the wheel file
      • now rubicon.__version__ should return the latest version from a python interpreter started anywhere
    enhancement 
    opened by ryanSoley 9
  • "Rubicon" name collides with existing project in the Python ecosystem

    Describe the bug

    I was just made aware of this project via a PyCon US announcement email.

    The problem: the name you've chosen for this project collides with an existing project in the Python ecosystem.

    I've been using the name Rubicon in the Python ecosystem since 2014. I'm the owner of the Rubicon record in PyPI, as well as some related projects:

    • https://pypi.org/project/rubicon/
    • https://pypi.org/project/rubicon-java/
    • https://pypi.org/project/rubicon-objc/

    These projects are in active use in the Python community, and the Java subproject received funding (indirectly) from the PSF through their support of the BeeWare Android port.

    I can only assume this is something you were at least partially aware of, because you've chosen the name rubicon-ml for your PyPI package, and changed the name of the package in setup.py.

    Although the projects are in a different domain (language bridging vs numerical processing), I'd argue there is potential for confusion since they're both active projects in the same language ecosystem, and there is some usage of BeeWare tooling in the numerical processing community.

    I humbly request you choose a different name for your project that doesn't collide with my pre-existing usage.

    bug needs triage 
    opened by freakboy3742 7
  • Use conda incubator action for environment setup

    Unpin Python in environment file to make sure Python version is not always 3.8

    closes: #42


    What

    • Uses the conda-incubator action for more flexible miniconda setup
    • Unset strict python version in environment file so the version matrix checks all the versions
    • Add percy to conda instead of using pip

    How to Test

    • I think if the CI passes it works?
    opened by gforsyth 5
  • python-3.10.6-h582c2e5_0_cpython.tar.bz2: 3 vulnerabilities (highest severity is: 9.8)

    Vulnerable Library - python-3.10.6-h582c2e5_0_cpython.tar.bz2

    General purpose programming language

    Library home page: https://api.anaconda.org/download/conda-forge/python/3.10.6/linux-64/python-3.10.6-h582c2e5_0_cpython.tar.bz2

    Path to dependency file: /environment.yml

    Path to vulnerable library: /onda3/pkgs/python-3.10.6-h582c2e5_0_cpython.tar.bz2,/home/wss-scanner/anaconda3/pkgs/python-3.10.6-h582c2e5_0_cpython.tar.bz2

    Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

    Vulnerabilities

    | CVE | Severity | CVSS | Dependency | Type | Fixed in | Remediation Available |
    | ------------- | ------------- | ----- | ----- | ----- | --- | --- |
    | CVE-2015-20107 | High | 9.8 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | N/A | ❌ |
    | CVE-2020-10735 | High | 7.5 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | N/A | ❌ |
    | CVE-2021-28861 | High | 7.4 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | v3.11 | ❌ |

    Details

    CVE-2015-20107

    Dependency Hierarchy:

    • :x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

    Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

    Found in base branch: main

    Vulnerability Details

    In Python (aka CPython) through 3.10.4, the mailcap module does not add escape characters into commands discovered in the system mailcap file. This may allow attackers to inject shell commands into applications that call mailcap.findmatch with untrusted input (if they lack validation of user-provided filenames or arguments).

    Publish Date: 2022-04-13

    URL: CVE-2015-20107

    CVSS 3 Score Details (9.8)

    Base Score Metrics:

    • Exploitability Metrics:
      • Attack Vector: Network
      • Attack Complexity: Low
      • Privileges Required: None
      • User Interaction: None
      • Scope: Unchanged
    • Impact Metrics:
      • Confidentiality Impact: High
      • Integrity Impact: High
      • Availability Impact: High

    CVE-2020-10735

    Dependency Hierarchy:

    • :x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

    Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

    Found in base branch: main

    Vulnerability Details

    A flaw was found in python. In algorithms with quadratic time complexity using non-binary bases, when using int("text"), a system could take 50ms to parse an int string with 100,000 digits and 5s for 1,000,000 digits (float, decimal, int.from_bytes(), and int() for binary bases 2, 4, 8, 16, and 32 are not affected). The highest threat from this vulnerability is to system availability.

    Publish Date: 2022-09-09

    URL: CVE-2020-10735

    CVSS 3 Score Details (7.5)

    Base Score Metrics:

    • Exploitability Metrics:
      • Attack Vector: Network
      • Attack Complexity: Low
      • Privileges Required: None
      • User Interaction: None
      • Scope: Unchanged
    • Impact Metrics:
      • Confidentiality Impact: None
      • Integrity Impact: None
      • Availability Impact: High

    CVE-2021-28861

    Dependency Hierarchy:

    • :x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

    Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

    Found in base branch: main

    Vulnerability Details

    ** DISPUTED ** Python 3.x through 3.10 has an open redirection vulnerability in lib/http/server.py due to no protection against multiple (/) at the beginning of URI path which may leads to information disclosure. NOTE: this is disputed by a third party because the http.server.html documentation page states "Warning: http.server is not recommended for production. It only implements basic security checks."

    Publish Date: 2022-08-23

    URL: CVE-2021-28861

    CVSS 3 Score Details (7.4)

    Base Score Metrics:

    • Exploitability Metrics:
      • Attack Vector: Network
      • Attack Complexity: Low
      • Privileges Required: None
      • User Interaction: Required
      • Scope: Changed
    • Impact Metrics:
      • Confidentiality Impact: High
      • Integrity Impact: None
      • Availability Impact: None

    Suggested Fix

    Type: Upgrade version

    Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28861

    Release Date: 2022-08-23

    Fix Resolution: v3.11

    security vulnerability 
    opened by mend-for-github-com[bot] 4
  • Issue with the edgetest action and Dask

    Describe the bug

    @ryanSoley @ak-gupta I was digging into the edgetest action a bit more and was able to recreate the bug we were seeing.

    Steps/Code to reproduce bug

    Running the following:

    conda create -n test python=3.9 pip conda
    conda activate test
    pip install dask==2022.2.0 prefect
    

    If I do a pip list I end up with:

    dask                    2022.2.0
    prefect                 1.1.0
    

    Then if I upgrade the following:

    pip install dask prefect --upgrade
    
    >
    Requirement already satisfied: dask in ~/miniconda3/envs/test/lib/python3.9/site-packages (2022.2.0)
    Collecting dask
      Using cached dask-2022.3.0-py3-none-any.whl (1.1 MB)
    Requirement already satisfied: prefect in ~/miniconda3/envs/test/lib/python3.9/site-packages (1.1.0)
    Requirement already satisfied: partd>=0.3.10 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (1.2.0)
    Requirement already satisfied: fsspec>=0.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (2022.2.0)
    Requirement already satisfied: cloudpickle>=1.1.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (2.0.0)
    Requirement already satisfied: packaging>=20.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (21.3)
    Requirement already satisfied: toolz>=0.8.2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (0.11.2)
    Requirement already satisfied: pyyaml>=5.3.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (6.0)
    Requirement already satisfied: requests>=2.25 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.27.1)
    Requirement already satisfied: python-box>=5.1.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (6.0.1)
    Requirement already satisfied: pendulum>=2.0.4 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.1.2)
    Requirement already satisfied: marshmallow>=3.0.0b19 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (3.15.0)
    Requirement already satisfied: toml>=0.9.4 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.10.2)
    Requirement already satisfied: docker>=3.4.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (5.0.3)
    Requirement already satisfied: distributed>=2.17.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2022.2.0)
    Requirement already satisfied: python-slugify>=1.2.6 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (6.1.1)
    Requirement already satisfied: importlib-resources>=3.0.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (5.4.0)
    Requirement already satisfied: croniter>=0.3.24 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.3.4)
    Requirement already satisfied: urllib3>=1.26.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.26.9)
    Requirement already satisfied: mypy-extensions>=0.4.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.4.3)
    Requirement already satisfied: pytz>=2018.7 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2022.1)
    Requirement already satisfied: msgpack>=0.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.0.3)
    Requirement already satisfied: tabulate>=0.8.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.8.9)
    Requirement already satisfied: python-dateutil>=2.7.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.8.2)
    Requirement already satisfied: click>=7.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (8.0.4)
    Requirement already satisfied: marshmallow-oneofschema>=2.0.0b2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (3.0.1)
    Requirement already satisfied: psutil>=5.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (5.9.0)
    Requirement already satisfied: tblib>=1.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (1.7.0)
    Requirement already satisfied: jinja2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (3.0.3)
    Requirement already satisfied: setuptools in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (60.10.0)
    Requirement already satisfied: zict>=0.1.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (2.1.0)
    Requirement already satisfied: tornado>=6.0.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (6.1)
    Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (2.4.0)
    Requirement already satisfied: websocket-client>=0.32.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from docker>=3.4.1->prefect) (1.3.1)
    Requirement already satisfied: zipp>=3.1.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from importlib-resources>=3.0.0->prefect) (3.7.0)
    Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from packaging>=20.0->dask) (3.0.7)
    Requirement already satisfied: locket in ~/miniconda3/envs/test/lib/python3.9/site-packages (from partd>=0.3.10->dask) (0.2.1)
    Requirement already satisfied: pytzdata>=2020.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from pendulum>=2.0.4->prefect) (2020.1)
    Requirement already satisfied: six>=1.5 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from python-dateutil>=2.7.0->prefect) (1.16.0)
    Requirement already satisfied: text-unidecode>=1.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from python-slugify>=1.2.6->prefect) (1.3)
    Requirement already satisfied: certifi>=2017.4.17 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (2021.10.8)
    Requirement already satisfied: charset-normalizer~=2.0.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (2.0.12)
    Requirement already satisfied: idna<4,>=2.5 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (3.3)
    Requirement already satisfied: heapdict in ~/miniconda3/envs/test/lib/python3.9/site-packages (from zict>=0.1.3->distributed>=2.17.0->prefect) (1.0.1)
    Requirement already satisfied: MarkupSafe>=2.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from jinja2->distributed>=2.17.0->prefect) (2.1.1)
    

    Expected behavior

    dask should be upgraded to 2022.3.0, but due to some interaction with prefect it isn't. I've tested with other packages, and dask seems to be the only one affected.

    Additional context

    I'll need to dig in a bit more, but I'm thinking this isn't an edgetest issue but something to do with prefect? Appreciate any thoughts or insights either of you might have.

    bug needs triage 
    opened by fdosani 4
  • added buttons to select all or remove all columns in UI table

    closes: #61


    What

    • Introduced buttons to select all columns or hide all columns in Rubicon UI experiment tables

    How to Test

    • When select all columns button is clicked, all columns appear in the table
    • When clear all columns button is clicked, no columns appear in the table
    opened by shania-m 4
  • Logging MultiIndex Dataframes Fails

    Describe the bug

    It appears that internally, rubicon's .log_dataframe() always converts pandas dataframes to dask dataframes. This causes failures when the dataframe uses a feature dask does not support, such as a MultiIndex.

    Steps/Code to reproduce bug

    import pandas as pd
    from rubicon.client import Rubicon
    # Create sample data
    df = pd.DataFrame([[0,1,'a'],[1,1,'b'],[2,2,'c'],[3,2,'d']], columns=['a', 'b', 'c'])
    df = df.set_index(['b', 'a']) # Set multiindex
    df
         c
    b a   
    1 0  a
      1  b
    2 2  c
      3  d
    
    # Log dataframe to rubicon
    rubicon = Rubicon(persistence="memory")
    project = rubicon.get_or_create_project("test")
    exp = project.log_experiment('test_exp')
    exp.log_dataframe(df)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/client/mixin.py", line 251, in log_dataframe
        self.repository.create_dataframe(dataframe, df, project_name, experiment_id=experiment_id)
      File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/repository/base.py", line 426, in create_dataframe
        data = self._convert_to_dask_dataframe(data)
      File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/repository/base.py", line 396, in _convert_to_dask_dataframe
        return dd.from_pandas(df, npartitions=1)
      File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/dask/dataframe/io/io.py", line 202, in from_pandas
        raise NotImplementedError("Dask does not support MultiIndex Dataframes.")
    NotImplementedError: Dask does not support MultiIndex Dataframes.
    

    Additional context

    I'm not sure why pandas dataframes need to be converted to dask dataframes every time during logging, but a fix would involve skipping the conversion, since dask does not support MultiIndex dataframes.

    bug 
    opened by Lazea 4
  • Automatic sklearn pipeline logging

    Is your feature request related to a problem? Please describe

    One way to log training data to Rubicon would be to extend the scikit-learn.pipeline so information could be logged before and/or after each step. We could extend the class and override the fit and predict methods to add optional hooks before and after.

    Describe the solution you'd like

    Something like...

    from sklearn.pipeline import Pipeline


    class RubiconPipeline(Pipeline):
        def before_fit(self, X, y=None, **fit_params):
            # logs info from self.steps
            ...

        def after_fit(self, X, y=None, **fit_params):
            # logs info from self.steps after fitting
            ...

        def fit(self, X, y=None, **fit_params):
            # run the optional hooks around the regular Pipeline.fit
            self.before_fit(X, y)
            retval = super().fit(X, y=y, **fit_params)
            self.after_fit(X, y)
            return retval
    

    Additional context

    Three cases to consider:

    1. Inferred logging from inspecting X's, y's and estimator object
    2. Logging through an extended common Rubicon/SKLearn API (optionally call .rubicon_log methods on estimators)
    3. Logging through user defined functions (UDFs) optionally provided to RubiconPipeline.__init__
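
To make the third case more concrete, here is a minimal sketch of hooks supplied as user defined functions. It deliberately uses a small stand-in class rather than sklearn's Pipeline, and HookedPipeline, MeanEstimator, and the two hook functions are all hypothetical names; the logged list stands in for whatever Rubicon logging calls a real hook would make.

```python
class HookedPipeline:
    """Stand-in for a pipeline that calls optional user hooks around fit."""

    def __init__(self, steps, before_fit=None, after_fit=None):
        self.steps = steps  # list of (name, estimator) pairs, as in sklearn
        self._before_fit = before_fit
        self._after_fit = after_fit

    def fit(self, X, y=None):
        if self._before_fit is not None:
            self._before_fit(self, X, y)
        for _, step in self.steps:
            step.fit(X, y)
        if self._after_fit is not None:
            self._after_fit(self, X, y)
        return self


class MeanEstimator:
    """Toy estimator that records the mean of its training targets."""

    def fit(self, X, y=None):
        self.mean_ = sum(y) / len(y)
        return self


logged = []  # stands in for calls such as experiment.log_parameter(...)


def log_step_names(pipeline, X, y):
    # Hook run before fitting: record which steps the pipeline contains.
    logged.append(("steps", [name for name, _ in pipeline.steps]))


def log_fitted_mean(pipeline, X, y):
    # Hook run after fitting: record a value learned by the last step.
    logged.append(("mean", pipeline.steps[-1][1].mean_))


pipeline = HookedPipeline(
    [("mean", MeanEstimator())],
    before_fit=log_step_names,
    after_fit=log_fitted_mean,
)
pipeline.fit([[1], [2], [3]], [2.0, 4.0, 6.0])
print(logged)  # [('steps', ['mean']), ('mean', 4.0)]
```

Passing the hooks to the constructor keeps the pipeline itself unaware of what gets logged, which is the main appeal of this option over the other two.
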
    development feature 
    opened by joe-wolfe21 4
  • reorganize the existing docs

    Is your documentation request related to a problem? Please describe

    we would like to update the rubicon-ml docs to follow the Diataxis framework

    Describe the solution you'd like

    once #207 is completed and we have a plan for reorganizing the docs, this issue will track the actual reorganization work

    documentation 
    opened by ryanSoley 3
  • `jupyter-dash` proof-of-concept

    I've been thinking about how we could get a live example of the dashboard hosted for users for a while now. I saw how the dask examples use JupyterLab through binder to show off their task graphs and stuff, so I thought it'd be great if we had a way to run the UI in JupyterLab. Of course we can launch it from lab, but that runs it on a localhost port which may not always be accessible.

    Then I came across this blog and thought if we could just use this it'd solve it.

    https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e

    I was gonna raise this as an issue but it ended up being super easy to implement, so here it is.

    What

    • uses jupyter-dash to instantiate the dashboard app
      • the default option, "external", runs it exactly the same as dash.Dash would
      • now there are options for "jupyterlab" and "inline" which display the dashboard in a new JLab window or inline in a notebook respectively

    How to Test

    • run through the added notebook (should we even keep it?)
    opened by ryanSoley 3
  • return rubicon objects with proper parents

    Is your enhancement request related to a problem? Please describe

    the rubicon objects that can optionally be returned by RubiconJSON are currently utilizing a NoOpParent instead of the proper experiment/project parent

    https://github.com/capitalone/rubicon-ml/blob/jsonpath-poc/rubicon_ml/client/rubicon_json.py#L24

    this means that any operations that need to actually reach out to the filesystem will not be possible, essentially making the objects read-only

    Describe the solution you'd like

    we should try to associate the proper parents with each returned object so that they will be fully functional rubicon objects

    this will likely require the inspection of the match.value objects returned from each query. the JSON in match.value should have an experiment_id (feature, metric, parameter), or a parent_id (experiment, artifact, dataframe)

    projects are the exception - they require a config object rather than a parent. ‼️ this is actually incorrect in the current implementation and I didn't catch it before merging to the integration branch, so we'll have to address it too ‼️

    there are a few steps that'll be required in the process:

    • for each match in the results object:
      • extract the experiment_id or parent_id from the result
      • if the queried object is not a project:
        • if RubiconJSON is instantiated with experiments as an input and the parent of the queried object is an experiment, the parent should be in that list
        • if RubiconJSON is instantiated with projects as an input and the parent of the queried object is a project, the parent should be in that list
        • if RubiconJSON is instantiated with projects as an input and the parent of the queried object is an experiment, the parent should be retrievable from one of those projects using project.experiment(id=...
        • if RubiconJSON is instantiated with top level rubicon objects as an input, we'll need to leverage get_project(id=... and experiment(id=... to retrieve the proper parents whether they be a project or experiment
      • if the queried object is a project:
        • we must be retrieving it from a top level rubicon input to RubiconJSON - so we can just take the config object off that top level rubicon object and dump it into the new project
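
The lookup outlined in these steps might be sketched as follows. This is only an illustration over plain dicts: parent_id_of and resolve_parent are hypothetical helpers, and the projects and experiments arguments are assumed to be simple id-to-object mappings rather than real rubicon objects.

```python
def parent_id_of(value):
    """Return the parent's id from a matched JSON value, or None for a project."""
    # Features, metrics, and parameters carry experiment_id; experiments,
    # artifacts, and dataframes carry parent_id; projects carry neither.
    return value.get("experiment_id") or value.get("parent_id")


def resolve_parent(value, projects, experiments):
    """Look up the parent object among the known inputs, keyed by id."""
    parent_id = parent_id_of(value)
    if parent_id is None:
        return None  # a project: it needs a config object, not a parent
    if parent_id in experiments:
        return experiments[parent_id]
    # Falls back to the projects mapping; returns None if the id is unknown.
    return projects.get(parent_id)


projects = {"p1": "project-1"}
experiments = {"e1": "experiment-1"}

metric = {"name": "accuracy", "experiment_id": "e1"}
experiment = {"name": "run", "parent_id": "p1"}
project = {"name": "Hello World"}

print(resolve_parent(metric, projects, experiments))      # experiment-1
print(resolve_parent(experiment, projects, experiments))  # project-1
print(resolve_parent(project, projects, experiments))     # None
```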

    Describe alternatives you've considered

    if this ends up being infeasible, it's fine to just leave the NoOpParents

    enhancement development 
    opened by ryanSoley 0
  • add example & docs for `RubiconJSON`

    Is your documentation request related to a problem? Please describe

    since this is a completely new feature, we'll need a new section in the docs for it

    Describe the solution you'd like

    design a notebook showing how the new RubiconJSON class works (maybe just adapt the poc one we're basing this all off) for the documentation. add said notebook to the docs. add any new, public python methods to the API reference

    documentation example 
    opened by ryanSoley 0
  • add python 3.11 to test matrix

    What

    • adds python 3.11 to test matrix in the testing workflow

    How to Test

    • make sure the CI on this branch runs tests for four python versions, 3.8-3.11
    enhancement 
    opened by ryanSoley 1
  • add example showing rubicon w/ DataProfiler

    Is your feature request related to a problem? Please describe

    after seeing the DataProfiler demo at PyCon, we decided we could show an example of rubicon tracking data profiles over a project/experiments' lifetime

    Describe the solution you'd like

    • create an example that shows experiments profiling each incoming dataset and logging the profiles to rubicon
    • use rubicon to illustrate a change in data profiles over experiments
    • reference new data profiler integration example in "logging training metadata" example, as it is basically an extension of logging training metadata
    • title the new example "Integrate with DataProfiler" and add it to the "How to..." section of the docs
    documentation example 
    opened by ryanSoley 0
  • Validate fsspec backends work with Rubicon

    Rubicon-ml leverages fsspec for persistence. This issue includes:

    1. Determine which fsspec backends apply to rubicon
    2. Validate each backend works with rubicon-ml
    3. add working backends to docs
    documentation discussion 
    opened by shania-m 1
  • add test for the value error in `project`, `metric`, `experiment` and the other getters

    Describe the solution you'd like

    Create a test function that verifies each getter raises a ValueError when neither name nor id is passed as a parameter.

    Describe alternatives you've considered

    Alternatively, a separate test function could be written for each getter, but a single test shared by all getters would most likely be more efficient.
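
    A minimal sketch of the single shared check this issue asks for. It does not use the real rubicon-ml classes: `FakeEntity`, its getters, the error message, and `getters_missing_value_error` are all hypothetical stand-ins, assuming only that each getter accepts optional `name` and `id` arguments and should raise a ValueError when both are missing.

```python
class FakeEntity:
    """Hypothetical stand-in for an object exposing name-or-id getters."""

    def _get(self, kind, name=None, id=None):
        # Assumed behavior: a getter needs at least one of `name` or `id`.
        if name is None and id is None:
            raise ValueError(f"`name` OR `id` required to get a {kind}.")
        return {"kind": kind, "name": name, "id": id}

    def experiment(self, name=None, id=None):
        return self._get("experiment", name=name, id=id)

    def metric(self, name=None, id=None):
        return self._get("metric", name=name, id=id)

    def artifact(self, name=None, id=None):
        return self._get("artifact", name=name, id=id)


def getters_missing_value_error(entity, getter_names):
    """Return the getters that did NOT raise ValueError when called with no arguments."""
    failures = []
    for getter_name in getter_names:
        getter = getattr(entity, getter_name)
        try:
            getter()
        except ValueError:
            continue  # expected: neither name nor id was given
        failures.append(getter_name)
    return failures


# One check covers every getter instead of one test function per getter.
GETTERS = ["experiment", "metric", "artifact"]
failures = getters_missing_value_error(FakeEntity(), GETTERS)
```

    An empty `failures` list means every listed getter rejected the call. In a real test suite the same loop could be expressed as one parametrized test over the getter names.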

    enhancement development 
    opened by andreafehrman 1
Releases(0.4.3)
  • 0.4.3(Dec 14, 2022)

    changelog

    • s3fs dependency now optional and installed via the s3 extra (#326)
    • renamed ui extra to viz to reflect module name change (#326)
    • dependency updates via edgetest
  • 0.4.2(Dec 12, 2022)

    changelog

    • json encode/decode numpy objects (#321)
    • dependency updates via edgetest (#320)

    bugfixes

    • fix tag display in experiments table (#318)
  • 0.4.1(Dec 2, 2022)

    changelog

    • added type checking for tags (#304)
    • update existing intake catalogs (#308)
    • log python objects as artifacts directly (#310)
    • dependency updates via edgetest (#305)

    bugfixes

    • update setup.cfg (#316)
  • 0.4.0(Nov 16, 2022)

    changelog

    • address existing deprecations (#286)
    • deprecate async submodule (#287)
    • add new examples & example cleanup (#292, #293, #295)
    • add failure modes (#301)
    • dependency updates via edgetest (#283, #289, #291, #296, #297, #300)

    bugfixes

    • fix Binder examples (#284)
    • fix tag removal bug (#298)
  • 0.3.10(Sep 29, 2022)

    changelog

    • added tagging to features (#278)
    • added tagging to parameters (#280)
    • dependency updates via edgetest (#279, #281)

    bugfixes

    • fixes a bug where add_tags and remove_tags did not work properly on entities with names with underscores in them (#280)
  • 0.3.9(Sep 23, 2022)

    changelog

    • added tagging to metrics (#273, #276)
    • dependency updates via edgetest (#267, #274)

    bugfixes

    • artifacts can now be retrieved by tags (#275)
  • 0.3.8(Sep 16, 2022)

    changelog

    • tags can now be applied to artifacts (#268)
    • dependency updates via edgetest (#257, #259, #260, #264, #265)

    bugfixes

    • fixes duplicate source registration error from newest intake release (#262)
  • 0.3.7(Jul 6, 2022)

    changelog

    • get edgetest working (#225, #230)
    • dependency updates via edgetest (#231, #238, #239, #241, #242, #245, #248, #250)
    • documentation and example updates (#218, #222, #234, #235, #246, #249)
  • 0.3.6(Apr 12, 2022)

  • 0.3.5(Mar 29, 2022)

  • 0.3.4(Mar 17, 2022)

    changelog

    • Added edgetest action for up-to-date requirements (#195)

    bugfixes

    • Update intake dependency to include msgpack when using pip (#199)
  • 0.3.3(Mar 17, 2022)

    changelog

    • Added make_pipeline to rubicon_ml.sklearn.pipeline to create pipelines (#185)
    • RubiconPipeline constructor now also takes memory and verbose arguments without **kwargs (#185)
    • Added multiple scores and fits to pipelines in rubicon_ml.sklearn.pipeline (#186)
    • Support score_samples on pipelines in rubicon_ml.sklearn.pipeline (#192)
    • Add pipeline slicing on pipelines in rubicon_ml.sklearn.pipeline (#194)

    bugfixes

    • Support NoneType values in correlation plot (#189)
  • 0.3.2(Feb 16, 2022)

  • 0.3.1(Jan 24, 2022)

  • 0.3.0(Jan 21, 2022)

    changelog

    • adds new viz module to visualize logged data (#149)
      • more info in our docs
      • deprecates ui module in favor of viz
    • removes old rubicon module that was deprecated in favor of rubicon_ml in #93 (#169)
  • 0.2.11(Nov 29, 2021)

    changelog

    • add ability to instantiate dashboard with Rubicon object (#119)
    • support Dash 2.0.0 (#121)
    • preserve logging order on fetches (#129)
    • add ability to get all rubicon-ml entities by name and ID (#128, #131, #133, #135, #141, #152, #153)
    • add storage_options passthru to prefect task (#155)

    bugfixes

    • fix local dataframe logging (#156)
  • 0.2.10(Aug 24, 2021)

    changelog

    • add passthrough for dash.Dash keyword arguments to the Dashboard (#117)

    bugfixes

    • get dashboard working behind Jupyter proxies (#116)
  • 0.2.9(Aug 19, 2021)

  • 0.2.8(Jul 22, 2021)

  • 0.2.7(Jul 8, 2021)

    changelog

    • log estimator parameters passed to fit in the Scikit-learn pipeline (#111)

    bugfixes

    • properly serialize date types when logging (#108)
    • properly serialize datetime types with null fields (#111)
  • 0.2.6(Jun 9, 2021)

    changelog

    • add runnable binder example (#99)

    bugfixes

    • check for root_dir before initializing in-memory filesystem (#104)
    • address whitesource vulnerability (#100)
  • 0.2.5(May 21, 2021)

  • 0.2.4(May 19, 2021)

  • 0.2.3(May 17, 2021)

  • 0.2.2(Apr 29, 2021)

    changelog

    • adds test suite for example notebooks (#90)

    bugfixes

    • ignore non-rubicon files within data directories (#84)
    • ignore non-rubicon files in async repo (#91)
    • fix pytest warnings (#92)
  • 0.2.1(Apr 19, 2021)

    bugfixes

    • revisit examples (#79)
      • ensures all examples in the notebooks directory are working with the latest version of rubicon
      • the asynchronous S3 client can now read data back
      • the dashboard now works with an in-memory filesystem with a default root_dir
  • 0.2.0(Apr 12, 2021)

  • 0.1.8(Apr 7, 2021)

  • 0.1.7(Apr 5, 2021)

  • 0.1.6(Apr 1, 2021)

    changelog

    • support hiding cols within experiment table and comparison plot (#60)

    bugfixes

    • fixes a bug related to dataframe plotting using hvplot (#62)
Owner
Capital One
We’re an open source-first organization — actively using, contributing to and managing open source software projects.